Legend of FIFOIRQ and saving part 2

I figured out why none of the other 3D games were working, and it was a stupid bug.

As mentioned before, the GPU has a FIFO that holds commands waiting to be executed. The GPU can send an interrupt request to the ARM9 depending on whether the FIFO is empty or half-empty. However… a peculiar feature of this FIFOIRQ is that its bit in the interrupt flags register will always remain set as long as the condition is true, even if the ARM9 tries to acknowledge it by writing to that register. Because I wasn’t emulating this, games would get stuck in infinite loops waiting for a FIFOIRQ that would never come. I didn’t catch this problem before for the following two reasons:

  • Super Mario 64 DS, the only game that worked well enough for me to do extensive 3D testing, is badly programmed and never uses the FIFOIRQ in-game; rather, it sits in a busy loop checking to see if GXSTAT says the FIFO is empty, which I emulated correctly.
  • I had not implemented FLASH saving, which several 3D games use. Therefore, I could not test that many games in the first place.
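To make the quirk concrete, here's a minimal Python sketch of a level-triggered FIFO IRQ flag (the names and structure are mine, not CorgiDS's actual code; bit 21 is the geometry FIFO bit in the ARM9's IF register):

```python
IRQ_GXFIFO = 1 << 21  # geometry FIFO bit in the ARM9's IF register

class InterruptFlags:
    def __init__(self):
        self.IF = 0
        self.fifo_entries = 0     # current GXFIFO occupancy
        self.irq_on_empty = True  # False would mean "IRQ when half-empty"

    def fifo_condition(self):
        if self.irq_on_empty:
            return self.fifo_entries == 0
        return self.fifo_entries < 128  # less than half of the 256 entries

    def refresh(self):
        # Level-triggered: the bit is set whenever the condition holds.
        if self.fifo_condition():
            self.IF |= IRQ_GXFIFO

    def write_IF(self, value):
        self.IF &= ~value  # writing 1 normally acknowledges (clears) a bit...
        self.refresh()     # ...but the GXFIFO bit immediately comes back
```

An acknowledge while the FIFO condition still holds is effectively a no-op, which is exactly what games rely on when they sleep until the next FIFOIRQ.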

On that note, every 3D game in my library can at least get to the title screen now:


There are many issues, of course (not visible: MKDS hangs in-game, and Pokemon Diamond doesn’t display in-game sprites such as the player character and TVs), but they aren’t relevant to this article.

Now, since I implemented FLASH saving, there’s a huge problem that has to be addressed before the release.

DS games can have radically different save sizes, ranging from 512 bytes to 8 megabytes. Furthermore, there are three different protocols games use depending on the size: 512-byte games use “tiny” EEPROM, 8-64K games use regular EEPROM, and 256K+ games use FLASH. Each protocol has different commands, and while the protocol can be inferred from the save size, there is no 100% accurate way to figure out the save size.
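Mapping a known size to its protocol is straightforward; here's a sketch using the thresholds from the ranges above (the hard part, as discussed below, is knowing the size in the first place):

```python
def save_protocol(size):
    """Infer the backup protocol from a save size in bytes."""
    if size == 512:
        return "tiny EEPROM"
    if 8 * 1024 <= size <= 64 * 1024:
        return "EEPROM"
    if size >= 256 * 1024:
        return "FLASH"
    raise ValueError("not a known DS save size: %d" % size)
```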

It is possible to use heuristics to determine the likely save size. This works because each save size has a maximum possible transfer length: for instance, you can only read up to 32 bytes at a time with 8K EEPROM. melonDS and medusa are both known to use this heuristic, and the other emulators likely make use of it as well. (Someone correct me if that’s not the case.)

However, there are ways this can be defeated:

  • A game could use a transfer length that’s lower than the maximum supported for its size. If an emulator forces the save size to a certain length based on that maximum, as melonDS does, the save file will be garbage.
  • A game could write out-of-bounds and check if the write works, presenting a “save error” screen if it does. This is often done intentionally to prevent piracy; when the DS was still relevant, pirates would stick as much save memory as possible into flashcarts in order to save (haha) on costs.

In either case, heuristics are not 100% reliable, and so I wish to avoid them in CorgiDS. Thus, I want to implement the following:

  • If a save file exists, get the save size from that and we’re done.
  • If there is no save file, ask the user to select a save size. If the initial size that is selected doesn’t work, then the user can modify it until everything works.
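In code, the plan amounts to something like this hypothetical sketch (`ask_user` stands in for whatever size-selection UI CorgiDS ends up with):

```python
import os

def resolve_save_size(save_path, ask_user):
    """Return the save size in bytes, following the two rules above."""
    if os.path.exists(save_path):
        return os.path.getsize(save_path)  # an existing save settles it
    return ask_user()  # no file yet: the user picks (and can re-pick) a size
```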

Making the user select the save size isn’t terribly convenient, however. Thus, at some point in the future, I want CorgiDS to be able to read from a database that stores all of this information and other stuff as well, somewhat akin to the GameINI files in Dolphin. However, I do not plan on doing this for v0.1; instead I will use the above solution directly until I have the time to extend it further.

For the time being, I’m going to figure out why my games are being derpy. 🙂


Let there be light

Two months ago, I made my first 3D-related post. Since then, GPU emulation for CorgiDS has made leaps and bounds. Alpha blending, lighting, clipping… things are looking a lot better.


…Unfortunately that’s about all I can show you for now. I rewrote the scheduling system in CorgiDS, and as a result, the other 3D games in my library get stuck in infinite loops. There’s likely just something I forgot to implement, so I’ll be looking to fix that.

Aside from making those games run, here is what remains before v0.1.

  • Fix other 3D glitches, such as the shadow under Yoshi in the images above.
  • Rewrite the instruction decoding in the CPUs. I want to move from a mess of if-statements to a lookup table, which should improve FPS by quite a bit. I could also try experimenting with a cached interpreter, but I don’t know if that will actually improve anything by a whole lot.
  • Rewrite memory accesses to VRAM to increase speed (and also support bank mirroring).
  • Rewrite the 2D rendering pipeline to support an intermediate 6-bit color stage. I’ll need this for finer control of the final output.

(Notice that a lot of it involves major rewrites… that’s what happens when you write an emulator while having no clue what’s going on at first.)

There are some other things I want to address as well, such as bitmap sprites, but they are not as high-priority as the list above. I will take care of them before my self-imposed deadline if possible.

Let’s see if we can get this out by Christmas. 🙂

v0.1 by this year!

I’ll keep it short and simple: My goal is to release CorgiDS by the end of this year.

Now, setting deadlines is a dangerous thing, especially when there’s still a lot of work to be done. However, I think I’m in a good enough position to start wrapping the rest of the emulator’s issues up. Mainly, better 3D rendering, better speed, and other errata.

Along the way I’m going to perform some major renovations to the blog as well as get a new domain. Keep an eye out for both of those!

Fixing touchscreen issues

Emulators are no different from regular programs when it comes to debugging. The only added difficulty is that one must be able to debug the emulator itself by figuring out what’s going wrong in the system being emulated. This can be quite challenging indeed when one does not have access to the source code of a game and must look through a disassembly of the compiled ROM in order to figure out what’s going on. Nevertheless, debugging is very much possible with a great deal of perseverance.

A bug I fixed today (as of writing this article) seemed quite simple: certain commercial games were failing to read touchscreen output. The odd thing is that the emulated touchscreen does work; for example, the firmware and the Digimon games read it perfectly fine. For some reason, though, games like Harvest Moon DS and Super Mario 64 DS weren’t even attempting to read from the touchscreen. In fact, code that was being called according to NO$GBA was being completely skipped over by CorgiDS! Over the course of several weeks, I halfheartedly attempted to fix this problem to no avail. It wasn’t a high priority until I tested Pokemon Diamond, which hung in an infinite loop just before reaching the title screen. Thinking that the touchscreen issue was related to this, I finally decided to put in some effort. (Spoiler: it wasn’t! Pokemon Diamond still remains to be fixed.)

I first suspected an issue with the IPCFIFO, a pair of queues that the ARM9 and ARM7 use to communicate with each other. For example, when the ARM7 reads touchscreen input, it can send this information to the ARM9’s FIFO, which will trigger an IRQ (interrupt request) for the ARM9 to handle. In the case of something like Harvest Moon DS, no FIFO communication was happening at all once the intro screens came up, which is when the game is supposed to start reading touchscreen input. Inspecting the code, however, didn’t reveal any issues, and I was back to square one.
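For reference, one direction of the IPCFIFO can be sketched roughly like this (a simplification with made-up names; the real FIFOs are driven through the IPCFIFOCNT and IPCFIFOSEND/IPCFIFORECV registers):

```python
from collections import deque

class IpcFifo:
    """One direction of the IPCFIFO (e.g., ARM7 -> ARM9). A rough sketch."""
    CAPACITY = 16  # each queue holds up to 16 32-bit words

    def __init__(self):
        self.queue = deque()
        self.irq_pending = False  # "receive FIFO not empty" IRQ on the receiver

    def send(self, word):
        if len(self.queue) >= self.CAPACITY:
            return False  # full; the real hardware sets an error flag instead
        going_nonempty = not self.queue
        self.queue.append(word)
        if going_nonempty:
            self.irq_pending = True  # FIFO went non-empty: wake the other CPU
        return True
```

In Harvest Moon DS's case, nothing was calling `send` at all once the intro came up, so no IRQ could ever fire on the ARM9 side.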

After a while, my next idea was to figure out where exactly the touchscreen code was in the ARM7 binary and work my way backwards to see which functions called it. Using the stack trace in NO$GBA and strategically-placed breakpoints, I backtracked all the way to the main loop in the ARM7 binary. I saw that the loop consists of calls to many different functions throughout the binary, so I placed breakpoints on each of them to determine which one would lead to the touchscreen code.

I found something odd: in NO$GBA, it looked like every single function would call the touchscreen code! From a DS programming perspective, this doesn’t make much sense; user input only needs to be polled once every frame. It didn’t take long for me to figure out what was happening: the touchscreen code was being called within an IRQ handler. This makes more sense, because an IRQ handler can be invoked at any point during code execution, assuming IRQs are enabled of course.

The DS has a plethora of IRQ options available. Some of them are vital, such as V-Blank, which signals that the game can start accessing VRAM without interfering with the graphics engine. Others are almost never used by games, such as real-time clock IRQs. Regardless, every game will make use of several kinds of IRQs, and not processing them properly can lead to severe issues. I decided to place a breakpoint in the BIOS code responsible for jumping to whatever IRQ handler the game has so that I could check which IRQs were being called. I saw that one IRQ in particular, V-Counter Match, was indeed calling the touchscreen code. V-Counter Match is simple: if the current scanline that the graphics engine is drawing matches a variable called V-Counter, an IRQ is requested. On the ARM9 side, this can be used for special mid-frame graphical effects that require extra timing precision. I was surprised to see that the ARM7 touchscreen code relied on it, however.

I looked in my GPU code and saw that V-Counter was being set correctly to a value of 0 (meaning the IRQ should trigger at the very beginning of each frame). Then I looked at the code responsible for calling the V-Counter Match IRQ, and I facepalmed hard. CorgiDS was incrementing V-Count (the current scanline position) before checking if the IRQ could be called. Because V-Count starts at 0 and was incremented before the check, the check never saw a value of 0, meaning the IRQ would never fire when V-Counter was set to 0, which certain games do. Making V-Count increment AFTER the IRQ check fixed all the touchscreen issues I was having.
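The bug in miniature (a Python sketch; I'm guessing at the exact shape of the loop, but the ordering mistake is the real one):

```python
TOTAL_LINES = 263  # the DS steps through 263 scanlines per frame

def scanline_step_buggy(vcount, vcounter_setting):
    vcount += 1                         # BUG: increment before the check...
    irq = (vcount == vcounter_setting)  # ...so this sees 1..263, never 0
    if vcount == TOTAL_LINES:
        vcount = 0
    return vcount, irq

def scanline_step_fixed(vcount, vcounter_setting):
    irq = (vcount == vcounter_setting)  # check the line we're actually on
    vcount += 1
    if vcount == TOTAL_LINES:
        vcount = 0
    return vcount, irq
```

Run over a whole frame with V-Counter set to 0, the buggy version never fires the IRQ, while the fixed version fires exactly once, at the start of the frame.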

What a ride that was. Hope you enjoyed reading about my struggles!

Work in progress

Terribly sorry about the lack of updates. There isn’t anything revolutionary to show, but there has been incremental progress.


While I’ve mostly fixed the issues with 3D title screens, there’s still not much in the way of graphical progress as far as going in-game is concerned. SM64DS is the only game in my library that actually displays stuff coherently; FFIV and others fail to render the models at all! Because the polygons themselves are clipped correctly, the problem seems to be with texture lookup. It’s super weird, as the code for texture rendering seems to be otherwise correct… Ah well, I’ll have plenty of time to fix these glitches.

I don’t have anything particularly interesting to write about this time. This is just a status update letting you guys know this project is still alive.

Getting somewhere!

With color interpolation out of the way, it wasn’t long before I was able to implement texture mapping. This is nearly enough to make the 3D games in my library playable:


While the inclusion of textures is obvious, I have also implemented z-buffering as well as z- and w-interpolation. The framework I have right now isn’t perfect, but it certainly gets the job (almost) done.

A lot of DS games like to use “3D-as-2D” graphics; e.g., the Tetris DS title screen, where the bottom screen is entirely 3D despite being a menu. This is mainly because it’s far easier to perform special effects using the GPU than with the comparatively primitive 2D engines: lighting effects, limitless rotation and scaling, stretched textures, per-pixel alpha blending, and more, all with a total of 2048 polygons per frame. In comparison, each 2D engine (one per screen) only has 128 sprites total, and only 32 of those can be rotscaled, with less control than the GPU offers.

If you’ve read my previous article on color interpolation, textures largely act in the same manner. However, as you might guess, there is some extra complexity associated with them. Each vertex in a polygon stores two texture coordinates, s and t, which correspond to x and y respectively. These must be interpolated in the same manner as vertex colors, and the interpolated (s, t) pair selects one “texel”. How the texel is interpreted varies depending on the texture format; the GPU allows you to choose between direct color, palette-based with or without an alpha coefficient, and compressed. Once the texel color has been retrieved, it is combined with the interpolated vertex color and displayed on screen. To add to the fun, textures can be repeated on a polygon and flipped, and the vertex texture coordinates can be transformed by a special matrix if a game wishes to do so.

At this point, somewhere between 60-70% of the GPU is fully emulated. There are, of course, plenty more things to address before 3D games can be enjoyed:

  • Polygon strips. These are used to connect polygons together in order to save space on vertex lists as well as CPU time. Every detailed 3D model will use these, such as the Mario head in SM64DS.
  • Proper clipping of vertices outside the view volume, or, in layman’s terms, vertices not visible to the camera. When a vertex crosses one of the six clipping planes (left, right, top, bottom, near, and far), the GPU replaces it with two extra vertices at the points where the polygon’s edges intersect the clipping plane. This allows polygons to go off- and onscreen seamlessly.
  • Interpolation precision. FFIV’s title screen looks ugly because of too much precision being lost. Some of you may be able to notice other quirks in the images shown.
  • Some other special effects, such as lighting and translucent polygons.

I’m hoping to get polygon strips and vertex clipping working within this week. Extra stuff like lighting will have to wait; some of the issues you see above are actually due to a combination of my imperfect 2D engine and even worse system timing, and those things definitely need fixing as well. Nevertheless, lots of progress over the last several days!


Disclaimer: this article is meant to describe the precision that the DS uses for interpolation as well as how interpolation works. It is not meant to deride DeSmuME (which uses lower precision on default settings), nor proclaim that CorgiDS is better (it isn’t). This is wholly a technical article, not a contest. I apologize if the tone of the article indicates otherwise. 

CorgiDS now supports color interpolation! Observe these shiny screenshots:


Astute observers may notice that the output here differs from something like DeSmuME. This is because, along with melonDS, CorgiDS uses the same color precision that the DS uses! StapleButter, the developer of melonDS, has helped immensely in getting interpolation working, as well as giving me information about the quirks of the DS GPU.

For comparison, here’s the output from DeSmuME using default settings on my Mac:


How does interpolation work? If you’re unfamiliar with the term, interpolation simply means finding values that lie between two known values. Consider the following: say you have a function f(x), and you are given the values of f(0) and f(10). Interpolation would be finding the values between f(0) and f(10), such as f(1), f(2), and so on.

The DS uses interpolation for both vertex colors (shown in the screenshots) and textures, the latter of which I’ve yet to implement. Both methods use the following formula:

((p − a)·u1·w2 + a·u2·w1) / ((p − a)·w2 + a·w1)

This formula outputs the interpolated attribute of pixel a. There’s a lot of stuff here, but the formula is simpler than it seems:

  • a is the pixel number of the line on which interpolation is to take place, ranging from 0 to p.
  • p is the total number of pixels on the line.
  • u1 and u2 are the attributes of the boundaries of the line. With color interpolation, these are the colors, and with texture interpolation, these are the texture coordinates.
  • w1 and w2 are the w-values of the boundaries. If you don’t recall, the DS uses four-dimensional matrices to project polygons from a 3D representation to a 2D image, and vertices keep this fourth dimension, known as the w-axis. The w-values are included in the formula to perform perspective correction.

Using this formula, color interpolation is deceptively simple. The left and right edges of a polygon are interpolated, and then the DS uses the result of the interpolation to interpolate the interior of the polygon. That’s all it takes!

…Well, no. In reality, as it tends to be with the DS GPU, things are more complicated. I mentioned earlier how CorgiDS and melonDS have more color precision than the other emulators. Vertex colors are 15-bit, meaning that each RGB value ranges from 0 to 0x1F (31), a measly amount considering that modern displays use at least 24-bit color. StapleButter discovered that the DS gets around this limitation by extending color precision to 27 bits during interpolation, so as to allow for a wider range of values, and then reducing it to 18 bits for the display.

Furthermore, the above formula isn’t entirely correct. While it is what a modern GPU would use, the DS GPU is a lazy bastard and takes shortcuts. The GPU sets u1 and u2 to 0 and 1 respectively, giving the actual formula:

a·w1 / ((p − a)·w2 + a·w1)

This formula gives a “perspective correction factor” that the DS uses to linearly interpolate colors and textures, which, as you might guess, loses precision unnecessarily.
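Both versions can be sketched in Python using exact rationals. Note that with exact arithmetic the shortcut is algebraically identical to the full formula; the precision loss only appears on hardware because the factor is rounded to a limited number of fixed-point bits, which this sketch deliberately ignores:

```python
from fractions import Fraction

def interp_full(a, p, u1, u2, w1, w2):
    """Perspective-correct interpolation of an attribute along a span."""
    num = (p - a) * u1 * w2 + a * u2 * w1
    den = (p - a) * w2 + a * w1
    return Fraction(num, den)

def interp_ds(a, p, u1, u2, w1, w2):
    """The DS shortcut: one factor (u1=0, u2=1), then linear interpolation."""
    factor = Fraction(a * w1, (p - a) * w2 + a * w1)
    return u1 + (u2 - u1) * factor
```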

Another weird quirk: w-values are normalized to 16-bit precision in shift increments of 4. If a w-value is 12 bits long, for instance, the DS will extend it to 16 bits. However, if the w-value is 20 bits long, it is reduced down to 16 bits, greatly reducing precision. Why it couldn’t have done things like normal GPUs is beyond us emudevs… Nevertheless, documenting these quirks (and the many others) is necessary for pixel-perfect accuracy. Admittedly, I’m not aiming for 100% accuracy, but I’d still like to hold myself to some standard of accuracy.
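Here's my reading of that normalization as a sketch; the exact shift rules are inferred from the description above, so treat them as an assumption:

```python
def normalize_w(w):
    """Normalize w toward 16 bits in 4-bit steps; returns (w, shift applied)."""
    shift = 0
    while w >= (1 << 16):       # wider than 16 bits: shrink, losing low bits
        w >>= 4
        shift -= 4
    while w and w < (1 << 12):  # 12 bits or narrower: widen losslessly
        w <<= 4
        shift += 4
    return w, shift
```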

If you haven’t guessed by now, next up is textures! Color interpolation is nice and all, but commercial games don’t run on vertex colors. With my newfound knowledge of interpolation, textures shouldn’t be a hard task at all.

Many thanks to StapleButter, who helped me understand many of the technical aspects of 3D graphics.


As the title indicates, I have added basic saving support to CorgiDS. The games that were stuck on save error screens are stuck no more, and now boot to the title screen! Examples include Super Princess Peach, Tetris DS, Final Fantasy IV DS, and more. It’s worth noting that the save support is *really* basic: it just allocates a 64K block of EEPROM and prays that the game doesn’t use flash memory and doesn’t care about the size. Aside from that, it works perfectly, and I haven’t encountered any issues with my library so far.

Not much else to say… I’ve been hunting the bug in Harvest Moon DS that prevents the game from reading from the touchscreen (making it impossible to get past the title screen). Since I haven’t had success with this, I think I’m going to stop worrying about it for now. Getting save support to work has also uncovered some graphical glitches in the games that have started booting that I’ll need to fix eventually. In particular, a lot of games use 3D textures for 2D images such as backgrounds or title logos. It’s something I’ll need to take step by step, because things are becoming too complex to spread my efforts all over the place. Regardless, I’m looking forward to the challenge!

A three-dimensional perspective

After a week of hard work, CorgiDS now renders wireframe polygons, both triangles and quads:


A whole lot is going on under the hood here. Mainly, I replaced my hackish “run commands instantly” design with a proper GXFIFO implementation. This allows for far more sophisticated programs that can draw a whole lot more than a single primitive. If you’re curious, the GXFIFO is a 256-entry command queue whose sole purpose is to provide a buffer for when the program overwhelms the tiny pipeline the GPU has. The GXFIFO has a lot of interesting properties: for instance, it can request an IRQ when half-empty or empty, and it allows for automatic DMA transfers when the queue becomes half-empty. While programs can directly feed commands and parameters into the GXFIFO using I/O transfers, the aforementioned DMA transfers are the most commonly used method for filling it. Because DMA transfers are, on average, faster than the time the GPU takes to execute its commands, care must be taken to make sure that the GXFIFO doesn’t overflow. Otherwise, the CPUs and DMA are frozen for up to several seconds until the GPU has enough space for new commands! Of course… certain games don’t take that precaution and blindly fill up the FIFO as they please. CorgiDS currently just executes commands instantly if the GXFIFO overflows, as emulating that “feature” properly is too costly for the time being.
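A rough sketch of those GXFIFO properties (heavily simplified; real commands carry a variable number of parameters, which this ignores):

```python
from collections import deque

GXFIFO_SIZE = 256

class GxFifo:
    def __init__(self):
        self.queue = deque()

    def push(self, command):
        if len(self.queue) >= GXFIFO_SIZE:
            # Real hardware stalls the CPUs/DMA here until space frees up;
            # CorgiDS currently sidesteps this by executing commands instantly.
            raise RuntimeError("GXFIFO overflow")
        self.queue.append(command)

    def pop(self):
        return self.queue.popleft()  # the GPU drains commands from the front

    def irq_empty(self):
        return not self.queue

    def irq_half_empty(self):
        # Fewer than half of the 256 entries used; this same condition
        # kicks off the automatic GXFIFO DMA refills mentioned above.
        return len(self.queue) < GXFIFO_SIZE // 2
```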

Obviously, the next step for CorgiDS is to be able to fill in polygons. After that, however, an equally important step is being able to draw polygon strips. Currently CorgiDS can only handle separate polygons, which is great for test homebrew, but games will use strips in order to save time and space on vertex commands. Color interpolation also needs to be added, as well as fixing a lot of the quirks in the renderer. This needs to be handled one step at a time, so it will still be a while before I’m able to run Pokemon and the like. All the fancy stuff in 3D rendering like textures and lighting will come later, I promise. (The technical article I’m also writing has been delayed due to a combination of 3D work and real life work, but that’s also coming eventually.)

To wrap this up, let’s see how CorgiDS does with rendering the Utah Teapot:


Better luck next time…

Road to 3D

After getting rotscale sprites to finally work (which was a pain in the ass), I grew bored of working on the 2D engine and save support. Thus, I decided to venture into the world of 3D!

No fancy pictures yet, unfortunately. While the geometry engine, which handles all of the matrix math and 3D representations, is mostly good enough to start drawing shapes, there’s still a bug somewhere that ultimately causes vertices to not have the right dimensions. In the interest of getting everything correct before rushing out into the great unknown, I have yet to begin work on the rendering engine.

The GPU is actually simpler than I expected it to be. The steps developers must follow to get stuff on the screen are as follows:

  • First, set up the projection and position matrices, the latter representing the camera. (It’s worth noting that on the GPU, matrices are 4×4, storing the three spatial dimensions as well as an extra w-dimension. Having the matrices be four-dimensional is useful for translation, as the relevant matrix only needs to be multiplied by a translation matrix; with 3×3 matrices, a separate addition would also be required.)
  • After optionally configuring some other properties, start sending vertex lists. The DS provides four primitive types: separate triangles/quads and triangular/quadrilateral strips. An arbitrary number of vertices can be defined in a list, under the conditions that they don’t overflow vertex/polygon RAM and don’t incompletely define a polygon; however, all polygons defined by a given list share the same properties: alpha blending, fog, texture, etc. The tiniest change in a polygon’s properties requires a new vertex list to be sent.
  • Finally, once all vertex lists have been sent, swap the geometry and rendering engine buffers. This lets the GPU start drawing all defined polygons and clears the geometry engine’s buffer to be refilled as needed.

The final image is seen as background 0 by the 2D display engine and can mostly be treated as such.
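To illustrate the point about 4×4 matrices: the translation sits in the matrix's last column, so translating a vertex is just one multiply. A quick sketch with plain Python lists (the real hardware works in 20.12 fixed-point, not arbitrary numbers):

```python
def translation_matrix(tx, ty, tz):
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def transform(m, v):
    """Multiply a 4x4 matrix by a vertex (x, y, z, w)."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))
```

With w = 1, one multiply moves the point by (tx, ty, tz); a 3×3 matrix couldn't express that without a separate vector addition.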

The progress I have made, despite my lack of visible results, is quite encouraging for me. CorgiDS is getting close to reaching a significant milestone! Still no guarantees can be made for the release date, but there’s not a whole lot of actual features left to implement afterwards. It’s been a rough ride getting this far, but compared to when I first started this blog, I feel as though I’m able to see the light at the end of the tunnel, as far away as it may be. 🙂