Feature chill

Not a whole lot remains before the release of CorgiDS. In fact, I plan to release it relatively soon for the curious.

I’ve stopped adding on to the emulator itself, and instead, I’ve been focusing on optimization. I’ve separated the GUI and emulation into separate threads, which has produced a massive speedup. Every 2D game now goes well above 60 FPS, and 3D games now reach a playable state. I don’t have a good way to actually measure FPS yet as the emulation thread has no frame limiter, but that’s on the list of tasks I’m performing. Some code cleanup has also helped with speed, but nothing drastic.

Unfortunately, the code cleanup has given birth to another stupid bug. Tetris DS is overflowing the position matrix stack on the GPU when it wasn’t doing that before. I can’t just ignore it because it produces a hideous flicker effect. I think the problem lies in the CPU, but I’m still hunting for it. That needs to get fixed as I don’t know if any other games are affected.

What else? I’ve added a framework for HLE BIOS emulation: in the future, this means you won’t have to dump the BIOS and firmware from a DS in order to play games on CorgiDS. While it has a few functions already implemented, I’m keeping it turned off for v0.1 so that it can’t be misused. I’m also trying to figure out a way to handle fatal errors in the course of emulation without having to crash the program… it’s difficult to figure out something that keeps the emulator core and frontend decoupled. I still need to add a window for configuring the save size of a game as well. Last but not least, I need to upload the source code to GitHub and find someone to help create builds for platforms other than OS X.

It’s not quite a feature freeze, but things are cooling down for the holiday season. 🙂



Holiday plans

Happy December to all of you reading this!

At the moment, I’m taking a break from CorgiDS… by working on another emulator project called DaneBoy (named after the Great Dane). Don’t fear though; I’m not doing this merely for diversion, but rather, for practice.

The UI for CorgiDS is written using Qt. While Qt makes C++ UI development easy, it does so by making itself extremely bloated. One effect of this is that Qt does a lot of work on the main thread, which is not good for emulators and any other projects that require lots of processing time. Since I’d rather not switch UI libraries, the solution is, of course, moving emulation over to a separate thread. Assuming negligible overhead from thread synchronization once a frame, this would boost FPS by 20-100%, depending on the game.
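A minimal sketch of what a once-per-frame hand-off between an emulation thread and a UI thread could look like, using only the C++ standard library. Everything here (FrameExchange, publish, fetch) is a hypothetical name chosen for illustration, not CorgiDS or Qt code, and it assumes a single 256x192 screen for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical names throughout; this is not CorgiDS's actual design.
constexpr int SCREEN_W = 256;
constexpr int SCREEN_H = 192;
using Frame = std::vector<uint32_t>;

class FrameExchange
{
    public:
        // Called by the emulation thread once per finished frame.
        void publish(const Frame& finished)
        {
            std::lock_guard<std::mutex> lock(mutex);
            latest = finished;
            frame_count++;
        }

        // Called by the UI thread when it repaints. Copies the most recent
        // frame into 'out' and returns how many frames were published so far.
        uint64_t fetch(Frame& out)
        {
            std::lock_guard<std::mutex> lock(mutex);
            out = latest;
            return frame_count;
        }
    private:
        std::mutex mutex;
        Frame latest;
        uint64_t frame_count = 0;
};
```

The lock is only held for the duration of one copy, so the two threads synchronize once per frame. The running frame count could also feed an FPS display by comparing it against a wall-clock timer.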

The catch? I’m not well-versed in multithreading, so I’m afraid to touch the CorgiDS codebase and risk introducing nasty bugs due to my lack of knowledge. Instead, DaneBoy will bear the brunt of my learning experiences. I’ll incorporate what I’ve learned into CorgiDS once I believe I’m not going to deadlock everything.

Before I started DaneBoy, I also created a little program that automatically generates a jump table for ARMv5 instructions. Unlike the Z80-like CPU in the Game Boy, ARM CPUs aren’t easily decoded using a giant switch block. The two options are either using an ugly mess of if-statements or creating a jump table with 4096 separate elements. CorgiDS used the former for a while, but I finally got around to rewriting that behemoth of the codebase a couple of days ago. I don’t know of any similar decoding programs, but if there aren’t any, I’ll release mine along with CorgiDS so that anyone else developing an emulator that uses an ARM CPU can benefit from it.
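As a rough illustration of the jump-table idea (not the generator's actual output), the sketch below packs bits 27-20 and 7-4 of an ARM instruction into a 12-bit index, which is the usual way to arrive at 4096 entries; I'm assuming that is what the figure above refers to. The InstrKind categories and the classify function are hypothetical and only recognize two patterns, where a real table would hold one handler per instruction form.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical coarse categories, for illustration only.
enum class InstrKind { Branch, DataProcessingImm, Other };

// Pack bits 27-20 and bits 7-4 into a 12-bit index (0..4095).
constexpr uint32_t decode_index(uint32_t instr)
{
    return ((instr >> 16) & 0xFF0) | ((instr >> 4) & 0xF);
}

// Decide what a given table slot holds. The top three index bits are
// instruction bits 27-25.
InstrKind classify(uint32_t index)
{
    uint32_t bits_27_25 = index >> 9;
    if (bits_27_25 == 5)    // 101: B/BL
        return InstrKind::Branch;
    if (bits_27_25 == 1)    // 001: data processing with an immediate operand
        return InstrKind::DataProcessingImm;  // (MSR encodings share this space; ignored here)
    return InstrKind::Other;
}

// Build the table once at startup; decoding is then a single array lookup.
std::array<InstrKind, 4096> build_table()
{
    std::array<InstrKind, 4096> table{};
    for (uint32_t i = 0; i < table.size(); i++)
        table[i] = classify(i);
    return table;
}
```

At run time, decoding an instruction becomes table[decode_index(instr)] instead of a chain of if-statements.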

I don’t plan on doing anything noteworthy with DaneBoy, nor do I even want to release it. This is just a practice project; CorgiDS is what I’m really concerned about. Either way, it’s actually pretty fun to work on a much simpler system than the DS. 🙂

Ignorance, or Why You Should Research Whatever You’re Emulating Before You Start Coding

Apologies for the radio silence over the past couple of weeks. To summarize, I’ve run into a nasty problem that will require me to rewrite a significant portion of the 3D code in CorgiDS, and I’m trying to figure out how I want to do that.

When I started writing code for the DS GPU, I had no clue at all what I was doing. I had never worked with a 3D graphics library like OpenGL before, yet here I was attempting to create a software renderer in an emulator. After proverbially bashing my head against the keyboard over the course of several weeks, I finally started getting tangible results: getting polygons on the screen at all was cause for celebration. I gradually added more features such as textures, alpha blending, and lighting over time. Finally, after many, many weeks of work, CorgiDS almost meets my standards for a software renderer.


Take a look at this image from Final Fantasy IV in CorgiDS:


There are quite a few issues in this image, but let’s focus on one in particular: half of Cecil’s body on the left is entirely missing. If you look carefully, some polygons on the soldiers’ lower bodies are also missing.

What’s happening? I can’t conclusively say what the problem is, but I’ve managed to narrow it down to something going wrong with the clipping code. Clipping is how GPUs deal with polygons that extend beyond at least one of six planes of the viewing frustum: left, right, top, bottom, near, and far. The GPU “clips” the vertex outside of the frustum and creates two new vertices that lie directly on the plane.
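To make the idea concrete, here is a sketch of clipping a polygon against a single plane (the far plane, taken here as z <= w in clip space) using the textbook Sutherland-Hodgman approach. It uses floats for readability, whereas the DS works in fixed-point, and the names Vertex, inside_far, intersect_far, and clip_far are all made up for this example. It is not the CorgiDS or melonDS clipper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z, w; };

// A vertex is inside the far plane when z <= w (assumed convention).
static bool inside_far(const Vertex& v) { return v.z <= v.w; }

// Find the point where edge a->b crosses the far plane.
// Only called when exactly one endpoint is inside, so da - db is nonzero.
static Vertex intersect_far(const Vertex& a, const Vertex& b)
{
    float da = a.w - a.z;
    float db = b.w - b.z;
    float t = da / (da - db);
    return { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y),
             a.z + t * (b.z - a.z), a.w + t * (b.w - a.w) };
}

// Walk every edge; keep inside vertices and add a new vertex wherever an
// edge crosses the plane. One outside vertex is replaced by two new ones.
std::vector<Vertex> clip_far(const std::vector<Vertex>& poly)
{
    std::vector<Vertex> out;
    for (size_t i = 0; i < poly.size(); i++)
    {
        const Vertex& cur = poly[i];
        const Vertex& next = poly[(i + 1) % poly.size()];
        if (inside_far(cur))
        {
            out.push_back(cur);
            if (!inside_far(next))
                out.push_back(intersect_far(cur, next));
        }
        else if (inside_far(next))
            out.push_back(intersect_far(cur, next));
    }
    return out;
}
```

A triangle with one vertex beyond the far plane comes out as a quad: the offending vertex is dropped and the two crossing points take its place. A full clipper would repeat this for all six planes.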

The polygons are missing because one of their vertices is getting clipped on the far plane (the direction away from the camera), and the game is set to not render any polygons intersecting the far plane. I don’t exactly know why this is happening, but I have two thoughts:

  • A precision error causes vertices that should not be clipped to become clipped.
  • Vertices are being clipped multiple times.

I ruled out the first one, as there doesn’t seem to be anything wrong with my matrix multiplication code. Furthermore, the DS uses fixed-point arithmetic rather than floating-point, so a precision error is far less likely. That brings us to the second thought.

Because of my unfamiliarity with 3D graphics, I have used melonDS’s software renderer as a reference for creating my own. Out of a desire to learn things on my own and not outright copy someone else’s work, I have added parts bit-by-bit to my code. This organic process has led to CorgiDS’s software renderer being the messiest part of the codebase, founded upon faulty assumptions and unclear ideas. While it all remains solvable, there is one fundamental issue: CorgiDS does not re-use vertices in polygon strips.

The code for polygons so far looks like this:

[Image: “Screen Shot 2017-11-26 at 1.13.16 PM”, a screenshot of the polygon code]

Note the “vert_index” variable, which points to the first vertex used. This code makes the assumption that vertices remain contiguous within RAM. While true for 90% of cases, this completely falls apart when polygon strips are involved. The DS GPU can allow two polygons to re-use the same vertices under special conditions, meaning that vertex lists are no longer contiguous. melonDS indicates that re-used vertices don’t get clipped again, but there are other rules that I don’t quite understand…
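One way to picture the alternative: instead of a polygon storing a single starting index and assuming its vertices follow contiguously, each polygon stores one index per corner, so two polygons in a strip can point at the same vertices. The sketch below does this for a plain triangle strip. Triangle and triangles_from_strip are hypothetical names, and the corner ordering for odd triangles is the generic triangle-strip convention rather than anything verified against DS hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical polygon record: one vertex index per corner, so vertices
// may be shared between polygons and need not be contiguous.
struct Triangle { uint16_t v0, v1, v2; };

// Turn a run of strip vertices into triangles. Each new vertex forms a
// triangle with the two vertices before it.
std::vector<Triangle> triangles_from_strip(uint16_t first_vertex, size_t vertex_count)
{
    std::vector<Triangle> tris;
    for (size_t i = 0; i + 2 < vertex_count; i++)
    {
        uint16_t a = static_cast<uint16_t>(first_vertex + i);
        uint16_t b = static_cast<uint16_t>(first_vertex + i + 1);
        uint16_t c = static_cast<uint16_t>(first_vertex + i + 2);
        // Swap two corners on every other triangle to keep the winding
        // consistent (generic strip convention, assumed here).
        if (i & 1)
            tris.push_back(Triangle{b, a, c});
        else
            tris.push_back(Triangle{a, b, c});
    }
    return tris;
}
```

With four strip vertices starting at index 10, this yields two triangles that both reference vertices 11 and 12, which is exactly the sharing that a single start index cannot express.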

Anyway, if this truly is the problem (and I don’t know what else could be), then a large portion of the renderer will need to be rewritten, a task that has little appeal to me. I might just ignore this problem entirely for the first release and focus my efforts elsewhere… not a whole lot of games are affected by this. Decisions, decisions…

Legend of FIFOIRQ and saving part 2

I figured out why none of the other 3D games were working, and it was a stupid bug.

As mentioned before, the GPU has a FIFO that holds commands waiting to be executed. The GPU can send an interrupt request to the ARM9 depending on whether the FIFO is empty or half-empty. However… a peculiar feature of this FIFOIRQ is that the bit associated with its request will always remain set as long as the condition is true, even if the ARM9 tries to clear that register. Because I wasn’t emulating this, games would get stuck in infinite loops waiting for a FIFOIRQ that would never come. I didn’t catch this problem before for the following two reasons:
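The sticky behaviour described above can be sketched as follows. The register layout is simplified to a single pending-flags word, the names (InterruptFlags, set_gxfifo_condition, acknowledge) are invented for this example, and the bit position is my assumption based on GBATEK's IRQ list.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit position of the geometry FIFO IRQ in the ARM9's IF register.
constexpr uint32_t IRQ_GXFIFO = 1u << 21;

struct InterruptFlags
{
    uint32_t pending = 0;           // stand-in for the IF register
    bool gxfifo_condition = false;  // FIFO currently empty / less than half full

    // Called by the GPU whenever the FIFO state changes.
    void set_gxfifo_condition(bool active)
    {
        gxfifo_condition = active;
        if (active)
            pending |= IRQ_GXFIFO;
    }

    // Called when the CPU writes to IF: writing a 1 clears that bit...
    void acknowledge(uint32_t mask)
    {
        pending &= ~mask;
        // ...but the FIFO bit comes straight back while the condition holds.
        if (gxfifo_condition)
            pending |= IRQ_GXFIFO;
    }
};
```

A game that acknowledges the IRQ while the FIFO is still empty will therefore see the bit remain set, which is what the stuck games were waiting for.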

  • Super Mario 64 DS, the only game that worked well enough for me to do extensive 3D testing, is badly programmed and never uses the FIFOIRQ in-game; rather, it sits in a busy loop checking to see if GXSTAT says the FIFO is empty, which I emulated correctly.
  • I had not implemented FLASH saving, which several 3D games use. Therefore, I could not test that many games in the first place.

On that note, every 3D game in my library can at least get to the title screen now:


There are many issues, of course (not visible: MKDS hangs in-game, and Pokemon Diamond doesn’t display any sprites in-game, like the character and TVs), but they aren’t relevant for this article.

Now, since I implemented FLASH saving, there’s a huge problem that has to be addressed before the release.

DS games can have radically different save sizes, ranging from 512 bytes to 8 megabytes. Furthermore, there are three different protocols games use depending on the size: 512-byte games use “tiny” EEPROM, 8-64K games use regular EEPROM, and 256K+ games use FLASH. Each protocol has different commands, and while the protocol can be inferred from the save size, there is no 100% accurate way to figure out the save size.
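The size-to-protocol mapping described above is simple enough to write down directly. This sketch uses only the ranges given in the text; SaveProtocol and protocol_for_size are hypothetical names, and sizes that fall outside the listed ranges are reported as Unknown rather than guessed.

```cpp
#include <cassert>
#include <cstddef>

enum class SaveProtocol { TinyEeprom, Eeprom, Flash, Unknown };

// Infer the save protocol from the save size in bytes.
SaveProtocol protocol_for_size(size_t bytes)
{
    const size_t KB = 1024;
    const size_t MB = 1024 * KB;
    if (bytes == 512)
        return SaveProtocol::TinyEeprom;   // 512-byte "tiny" EEPROM
    if (bytes >= 8 * KB && bytes <= 64 * KB)
        return SaveProtocol::Eeprom;       // 8-64K regular EEPROM
    if (bytes >= 256 * KB && bytes <= 8 * MB)
        return SaveProtocol::Flash;        // 256K and up, to the 8 MB maximum
    return SaveProtocol::Unknown;          // not covered by the ranges above
}
```

The hard part, as noted, is not this mapping but finding out the size in the first place.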

It is possible to use heuristics to determine what could be the proper save size. This is because each save size has a maximum possible transfer length: for instance, you can only read up to 32 bytes at a time with 8K EEPROM. melonDS and medusa both use this heuristic for certain, and the other emulators likely make use of it as well. (Someone correct me if that’s not the case.)

However, there are ways this can be defeated:

  • A game could use a transfer length that’s lower than the maximum supported for this size. If an emulator forces the save size to be a certain length, like melonDS, the save file will be garbage.
  • A game could write out-of-bounds and check if the write works, presenting a “save error” screen if it does. This is often done intentionally to prevent piracy; when the DS was still relevant, pirates would stick as much save memory as possible into flashcarts in order to save (haha) on costs.

In either case, heuristics are not 100% reliable, and so I wish to avoid them in CorgiDS. Thus, I want to implement the following:

  • If a save file exists, get the save size from that and we’re done.
  • If there is no save file, ask the user to select a save size. If the initial size that is selected doesn’t work, then the user can modify it until everything works.
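The two rules above could be sketched like this. resolve_save_size is a hypothetical helper, and the ask_user callback stands in for whatever dialog the frontend would show; keeping it as a callback means the core never needs to know about the UI.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Decide the save size for a game.
//  - save_exists / file_size describe the save file on disk, if any.
//  - ask_user is only invoked when there is no save file to go by.
size_t resolve_save_size(bool save_exists, size_t file_size,
                         const std::function<size_t()>& ask_user)
{
    if (save_exists)
        return file_size;   // rule 1: trust the existing file
    return ask_user();      // rule 2: let the user pick
}
```

Letting the user change the size later if the first choice doesn't work is not modelled here; it would just mean calling the prompt again and resizing the file.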

Making the user select the save size isn’t terribly convenient, however. Thus, at some point in the future, I want CorgiDS to be able to read from a database that stores all of this information and other stuff as well, somewhat akin to the GameINI files in Dolphin. However, I do not plan on doing this for v0.1; instead I will use the above solution directly until I have the time to extend it further.

For the time being, I’m going to figure out why my games are being derpy. 🙂

Let there be light

Two months ago, I made my first 3D-related post. Since then, GPU emulation for CorgiDS has advanced by leaps and bounds. Alpha blending, lighting, clipping… things are looking a lot better.


…Unfortunately that’s about all I can show you for now. I rewrote the scheduling system in CorgiDS, and as a result, the other 3D games in my library get stuck in infinite loops. There’s likely just something I forgot to implement, so I’ll be looking to fix that.

Aside from making those games run, here is what remains before v0.1.

  • Fix other 3D glitches, such as the shadow under Yoshi in the images above.
  • Rewrite the instruction decoding in the CPUs. I want to move from a mess of if-statements to a lookup table, which should improve FPS by quite a bit. I could also try experimenting with a cached interpreter, but I don’t know if that will actually improve anything by a whole lot.
  • Rewrite memory accesses to VRAM to increase speed (and also support bank mirroring)
  • Rewrite the 2D rendering pipeline to support an intermediate 6-bit color stage. I’ll need this for finer control of the final output.

(Notice that a lot of it involves major rewrites… that’s what happens when you write an emulator while having no clue what’s going on at first.)

There are some other things I want to address as well, such as bitmap sprites, but they are not as high-priority as the list above. I will take care of those before my self-imposed deadline if possible, however.

Let’s see if we can get this out by Christmas. 🙂

v0.1 by this year!

I’ll keep it short and simple: My goal is to release CorgiDS by the end of this year.

Now, setting deadlines is a dangerous thing, especially when there’s still a lot of work to be done. However, I think I’m in a good enough position to start wrapping the rest of the emulator’s issues up. The main items are better 3D rendering, better speed, and other miscellaneous fixes.

Along the way I’m going to perform some major renovations to the blog as well as get a new domain. Keep an eye out for both of those!

Fixing touchscreen issues

Emulators are no different from regular programs when it comes to debugging. The only added difficulty is that one must be able to debug the emulator itself by figuring out what’s going wrong in the system being emulated. This can be quite challenging indeed when one does not have access to the source code of a game and must look through a disassembly of the compiled ROM in order to figure out what’s going on. Nevertheless, debugging is very much possible with a great deal of perseverance.

A bug I fixed today as of writing this article seemed quite simple: certain commercial games were failing to read touchscreen output. The problem is that the emulated touchscreen does work; for example, the firmware and the Digimon games read it perfectly fine. For some reason though, games like Harvest Moon DS and Super Mario 64 DS weren’t even attempting to read from the touchscreen. In fact, code that was being called according to NO$GBA was being completely skipped over by CorgiDS! Over the course of several weeks, I halfheartedly attempted to fix this problem to no avail. It wasn’t a large priority until I tested Pokemon Diamond, which hung in an infinite loop just before reaching the title screen. Thinking that the touchscreen issue was related to this, I finally decided to put in some effort. (Spoiler: It wasn’t! Pokemon Diamond still remains to be fixed.)

I first suspected an issue with the IPCFIFO. The FIFO is a pair of queues that the ARM9 and ARM7 use to communicate with each other. For example, when the ARM7 reads touchscreen input, it can then send this information to the ARM9’s FIFO, which will trigger an IRQ (interrupt request) for it to handle. In the case of something like Harvest Moon DS, no FIFO communication was happening at all once the intro screens were pulled up, which is when the game is supposed to start reading touchscreen input. Inspecting the code, however, didn’t reveal any issues, and I was back to square one.
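For readers who haven't met the IPCFIFO, here is a simplified model of the arrangement: one queue per direction, with an IRQ raised on the receiving side when its queue goes from empty to non-empty. The names (Channel, IpcFifo) are invented, the 16-word capacity is from my reading of GBATEK, and the real hardware's control register (enable bits, error flags) is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>

constexpr size_t FIFO_CAPACITY = 16;  // assumed: 16 words per direction

// One direction of the FIFO.
struct Channel
{
    std::queue<uint32_t> words;
    bool receiver_irq = false;  // "receive FIFO not empty" IRQ pending

    // Sender side. Returns false if the queue is full.
    bool send(uint32_t word)
    {
        if (words.size() >= FIFO_CAPACITY)
            return false;
        bool was_empty = words.empty();
        words.push(word);
        if (was_empty)
            receiver_irq = true;  // wake the other CPU
        return true;
    }

    // Receiver side. Precondition: the queue is not empty.
    uint32_t receive()
    {
        assert(!words.empty());
        uint32_t word = words.front();
        words.pop();
        return word;
    }
};

// The "pair of queues": one for each direction.
struct IpcFifo
{
    Channel arm7_to_arm9;
    Channel arm9_to_arm7;
};
```

In the touchscreen case, the ARM7 would call arm7_to_arm9.send(...) with the sampled data and the ARM9 would pick it up in its IRQ handler.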

After a while, my next idea was to figure out where exactly the touchscreen code was in the ARM7 binary and work my way backwards to see which functions called it. Using the stack trace in NO$GBA and strategically-placed breakpoints, I backtracked all the way to the main loop in the ARM7 binary. I saw that the loop consists of calls to many different functions throughout the binary, so I placed breakpoints on each of them to determine which one would lead to the touchscreen code.

I found something odd: it looked like every single function would call the touchscreen code in NO$GBA! From a DS programming perspective, this doesn’t make much sense; user input only needs to be polled once every frame. It didn’t take long for me to figure out what was happening: the touchscreen code was being called within an IRQ handler. This makes more sense because an IRQ can fire at any point during code execution, assuming IRQs are enabled of course.

The DS has a plethora of IRQ options available. Some of them are vital, such as V-Blank, which signals that the game can start accessing VRAM without interfering with the graphics engine. Others are almost never used by games, such as real-time clock IRQs. Regardless, every game will make use of several kinds of IRQs, and not processing them properly can lead to severe issues. I decided to place a breakpoint in the BIOS code responsible for jumping to whatever IRQ handler the game has so that I could check which IRQs were being called. I saw that one IRQ in particular, V-Counter Match, was indeed calling the touchscreen code. V-Counter Match is simple: if the current scanline that the graphics engine is drawing matches a variable called V-Counter, an IRQ is requested. On the ARM9 side, this can be used for special mid-frame graphical effects that require extra timing precision. I was surprised to see that the ARM7 touchscreen code relied on it, however.

I looked in my GPU code and saw that V-Counter was being set correctly to a value of 0 (so the IRQ should be triggered at the very beginning of each frame). Then I looked at the code responsible for calling the V-Counter Match IRQ, and I facepalmed hard. CorgiDS was incrementing V-Count (the current scanline position) before checking if the IRQ could be called. Because the check only ever saw V-Count after it had been incremented, it never saw a value of 0, which meant the IRQ would never be called if V-Counter was set to 0, as certain games do. Making V-Count increment AFTER the IRQ check fixed all the touchscreen issues I was having.
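The ordering problem is easy to reproduce in isolation. The sketch below is my guess at what the two orderings looked like, not the actual CorgiDS code; Lcd, scanline_buggy, and scanline_fixed are hypothetical names, and 263 is the total number of scanlines in a DS frame (192 visible plus V-Blank).

```cpp
#include <cassert>

constexpr int SCANLINES_PER_FRAME = 263;

struct Lcd
{
    int vcount = 0;  // current scanline ("V-Count")
    int vmatch = 0;  // scanline to match ("V-Counter" in the text)
    int irqs = 0;    // number of V-Counter Match IRQs requested
};

// Increment first, then check: the check sees 1..263 and never 0.
void scanline_buggy(Lcd& lcd)
{
    lcd.vcount++;
    if (lcd.vcount == lcd.vmatch)
        lcd.irqs++;
    if (lcd.vcount == SCANLINES_PER_FRAME)
        lcd.vcount = 0;
}

// Check first, then increment: the check sees 0..262.
void scanline_fixed(Lcd& lcd)
{
    if (lcd.vcount == lcd.vmatch)
        lcd.irqs++;
    lcd.vcount++;
    if (lcd.vcount == SCANLINES_PER_FRAME)
        lcd.vcount = 0;
}
```

With the match value at 0, the first version requests no IRQs at all no matter how many frames run, while the second requests exactly one per frame. For any match value from 1 to 262 the two behave the same, which is why only some games were affected.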

What a ride that was. Hope you enjoyed reading about my struggles!

Work in progress

Terribly sorry about the lack of updates. There isn’t anything revolutionary to show, but there has been incremental progress.


While I’ve mostly fixed the issues with 3D title screens, there’s still not much in the way of graphical progress as far as going in-game is concerned. SM64DS is the only game in my library that actually displays stuff coherently; FFIV and others fail to render the models at all! Because the polygons themselves are clipped correctly, the problem seems to be with texture lookup. It’s super weird, as the code for texture rendering seems to be otherwise correct… Ah well, I’ll have plenty of time to fix these glitches.

I don’t have anything particularly interesting to write about this time. This is just a status update letting you guys know this project is still alive.

Getting somewhere!

With color interpolation out of the way, it wasn’t long before I was able to implement texture mapping. This is nearly enough to make the 3D games in my library playable:


While the inclusion of textures is obvious, I have also implemented z-buffering as well as z- and w-interpolation. The framework I have right now isn’t perfect, but it certainly gets the job (almost) done.

A lot of DS games like to use “3D-as-2D” graphics; e.g., the Tetris DS title screen, where the bottom screen is entirely 3D despite being a menu. This is mainly because it’s far easier to perform special effects using the GPU instead of the comparatively primitive 2D engines. The GPU offers lighting effects, limitless rotation and scaling, stretching textures, per-pixel alpha blending, and more, all with a total of 2048 polygons per frame. In comparison, each 2D engine (one per screen) only has 128 sprites total, and only 32 of those can be rotscaled with less control than the GPU offers.

If you’ve read my previous article on color interpolation, textures largely act in the same manner. However, there is some extra complexity associated with them as you might guess. Each vertex in a polygon is capable of storing two texture coordinates s and t, which correspond to x and y respectively. These must be interpolated in the same manner as vertex colors, and combining s and t gives one “texel”. The meaning of the texel varies depending on the texture format; the GPU allows you to choose between direct color, palette-based with or without alpha coefficient, and compressed. Once the pixel color has been retrieved, it is combined with the color interpolation value and displayed on screen. To add to the fun, textures can be repeated on a polygon and flipped, and the vertex texture coordinates can be transformed by a special matrix if a game wishes to do so.
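As an example of the repeat and flip handling, here is one way to map an interpolated texture coordinate back into the texture. It assumes the texture size is a power of two (which holds for DS textures) and that coordinates are already in whole texels. wrap_texcoord is a hypothetical helper, not taken from any emulator, and it relies on two's-complement behaviour for negative coordinates.

```cpp
#include <cassert>

// Map a texel coordinate into [0, size).
//  - repeat off: clamp to the edge.
//  - repeat on:  tile the texture.
//  - repeat and flip on: mirror every other tile.
int wrap_texcoord(int coord, int size, bool repeat, bool flip)
{
    assert(size > 0 && (size & (size - 1)) == 0);  // power of two only
    if (!repeat)
    {
        if (coord < 0) return 0;
        if (coord >= size) return size - 1;
        return coord;
    }
    int wrapped = coord & (size - 1);
    // Odd-numbered tiles have the 'size' bit set; mirror those when flipping.
    if (flip && (coord & size))
        wrapped = (size - 1) - wrapped;
    return wrapped;
}
```

The same function would be applied to s and t independently, since each axis has its own repeat and flip settings.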

At this point, somewhere between 60% and 70% of the GPU is fully emulated. There are, of course, plenty more things to address before 3D games can be enjoyed:

  • Polygon strips. These are used to connect polygons together in order to save space on vertex lists as well as CPU time. Every detailed 3D model will use these, such as the Mario head in SM64DS.
  • Proper clipping of vertices outside the view volume, or in layman’s terms, vertices not shown on the camera. When a vertex leaves one of the six clipping planes (left, right, top, bottom, near, and far), the GPU will replace it with two extra vertices that intersect the clipping plane and polygon edge. This allows for polygons to go off- and onscreen seamlessly.
  • Interpolation precision. FFIV’s title screen looks ugly because of too much precision being lost. Some of you may be able to notice other quirks in the images shown.
  • Some other special effects, such as lighting and translucent polygons.

I’m hoping to get polygon strips and vertex clipping working within this week. Extra stuff like lighting will have to wait; some of the issues you see above are actually due to a combination of my imperfect 2D engine and even worse system timing, and those things definitely need fixing as well. Nevertheless, lots of progress over the last several days!


Disclaimer: this article is meant to describe the precision that the DS uses for interpolation as well as how interpolation works. It is not meant to deride DeSmuME (which uses lower precision on default settings), nor proclaim that CorgiDS is better (it isn’t). This is wholly a technical article, not a contest. I apologize if the tone of the article indicates otherwise. 

CorgiDS now supports color interpolation! Observe these shiny screenshots:


Astute observers may notice that the output here differs from something like DeSmuME. This is because, along with melonDS, CorgiDS uses the same color precision that the DS uses! StapleButter, the developer of melonDS, has helped immensely in getting interpolation working, as well as giving me information about the quirks of the DS GPU.

For comparison, here’s the output from DeSmuME using default settings on my Mac:


How does interpolation work? If you’re unfamiliar with the term, interpolation simply means finding any number of values within the boundaries of two known values. Consider the following: say you have a function f(x). You are given the values for, say, f(0) and f(10). Interpolation would be finding the values between f(0) and f(10), such as f(1), f(2), and so on.

The DS uses interpolation for both vertex colors (shown in the screenshots) and textures, the latter of which I’ve yet to implement. Both methods use the following formula:

((p - a) * u1 * w2 + a * u2 * w1) / ((p - a) * w2 + a * w1)

This formula outputs the interpolated attribute of pixel a. There’s a lot of stuff here, but the formula is simpler than it seems:

  • a is the pixel number of the line on which interpolation is to take place, ranging from 0 to p.
  • p is the total number of pixels on the line.
  • u1 and u2 are the attributes of the boundaries of the line. With color interpolation, these are the colors, and with texture interpolation, these are the texture coordinates.
  • w1 and w2 are the w-values of the boundaries. If you don’t recall, the DS uses four-dimensional matrices to project polygons from a 3D representation to a 2D image. Vertices keep this fourth dimension, known as the w-axis. The w-values are included in the formula to perform perspective correction.
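Written out in code, with the first factor in each term taken to be (p - a), the formula looks like the sketch below. It uses doubles for clarity rather than the DS's fixed-point math, and the function name interpolate is made up for this example.

```cpp
#include <cassert>

// Perspective-correct interpolation of an attribute along a line.
//   a      : pixel position on the line, 0..p
//   p      : total number of pixels on the line
//   u1, u2 : attribute values at the two ends
//   w1, w2 : w-values at the two ends (must not make the denominator zero)
double interpolate(double a, double p, double u1, double u2, double w1, double w2)
{
    double numerator = (p - a) * u1 * w2 + a * u2 * w1;
    double denominator = (p - a) * w2 + a * w1;
    assert(denominator != 0.0);
    return numerator / denominator;
}
```

When w1 equals w2 the w terms cancel and this reduces to plain linear interpolation: halfway between 0 and 100 gives 50. With w1 = 1 and w2 = 3, the same halfway point gives 25 instead, because the end with the smaller w is nearer the camera and takes up more of the screen-space line.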

Using this formula, color interpolation is deceptively simple. The left and right edges of a polygon are interpolated, and then the DS uses the result of the interpolation to interpolate the interior of the polygon. That’s all it takes!

…Well, no. In reality, as it tends to be with the DS GPU, things are more complicated. I mentioned earlier how CorgiDS and melonDS have more color precision than the other emulators. Vertex colors are 15-bit, meaning that each RGB value ranges from 0 to 0x1F (31), a measly amount considering that modern displays use at least 24-bit color. StapleButter discovered that the DS gets around this limitation by extending color precision to 27 bits during interpolation, so as to allow for a wider range of values, and then reducing it to 18 bits for the display.

Furthermore, the above formula isn’t entirely correct. While it is what a modern GPU would use, the DS GPU is a lazy bastard and takes shortcuts. The GPU sets u1 and u2 to 0 and 1 respectively, giving the actual formula:

(a * w1) / ((p - a) * w2 + a * w1)

This formula gives a “perspective correction factor” that the DS uses to linearly interpolate colors and textures, which as you might guess, loses precision unnecessarily.
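A sketch of the two-step approach in integer math: substituting u1 = 0 and u2 = 1 into the general formula leaves a * w1 in the numerator, giving a factor between 0 and 1, which is then used for an ordinary linear interpolation. The 9 fractional bits chosen for the factor are an assumption made for this example, not a documented hardware value, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr int FACTOR_BITS = 9;  // assumed fixed-point precision of the factor

// Perspective correction factor for pixel a of p, scaled by 2^FACTOR_BITS.
int32_t perspective_factor(int32_t a, int32_t p, int32_t w1, int32_t w2)
{
    int64_t numerator = (static_cast<int64_t>(a) * w1) << FACTOR_BITS;
    int64_t denominator = static_cast<int64_t>(p - a) * w2 + static_cast<int64_t>(a) * w1;
    assert(denominator != 0);
    return static_cast<int32_t>(numerator / denominator);
}

// Linear interpolation between u1 and u2 using the precomputed factor.
int32_t lerp_with_factor(int32_t u1, int32_t u2, int32_t factor)
{
    int64_t delta = static_cast<int64_t>(u2 - u1) * factor;
    return u1 + static_cast<int32_t>(delta >> FACTOR_BITS);
}
```

The precision loss comes from the division being rounded to FACTOR_BITS once, after which every attribute (each color channel, s, t) is scaled by that already-rounded factor instead of getting its own full-precision division.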

Another weird quirk: w-values are normalized to 16-bit precision using shift increments of 4. If a w-value is 12 bits long, for instance, the DS will extend it to 16 bits. However, if the w-value is 20 bits long, it is reduced to 16 bits, greatly reducing precision. Why it couldn’t have done things like normal GPUs is beyond us emudevs… Nevertheless, documenting these quirks (and the many others) is necessary for pixel-perfect accuracy. Admittedly, I’m not aiming for 100% accuracy, but I’d still like to have some standards for accuracy myself.
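For a single value, the normalization described above could be sketched like this. It treats each w-value on its own, which is a simplification on my part (the hardware may pick one shift for all vertices of a polygon), and normalize_w is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Bring a w-value into 16-bit range in steps of 4 bits.
uint32_t normalize_w(uint32_t w)
{
    if (w == 0)
        return 0;           // nothing to normalize
    while (w > 0xFFFF)
        w >>= 4;            // too wide: drop the low 4 bits (loses precision)
    while (w <= 0x0FFF)
        w <<= 4;            // too narrow: scale up by 4 bits
    return w;
}
```

A 12-bit value such as 0xFFF becomes 0xFFF0 with nothing lost, while a 20-bit value such as 0xFFFFF becomes 0xFFFF and its lowest four bits are gone for good.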

If you haven’t guessed by now, next up is textures! Color interpolation is nice and all, but commercial games don’t run on vertex colors. With my newfound knowledge of interpolation, textures shouldn’t be a hard task at all.

Many thanks to StapleButter, who helped me understand many of the technical aspects of 3D graphics.