What’s left for v0.2, and the plan for v0.3

v0.2 has been an accuracy-focused update. So far, I’ve added audio, fixed tons of bugs, and improved the graphics overall.

Here’s an example of what I’m talking about:

[Image: Mario Kart DS in CorgiDS v0.1 (left) and v0.2 (right)]

On the left is v0.1, and on the right is v0.2. The differences may not seem like a whole lot, but getting to this point required a lot of work on the 2D engine. First, sprites weren’t showing up at all because of a bug with VRAM accesses. Next, I had to implement windows. There are two fixed-size windows as well as an “object window” whose shape comes from sprites, and backgrounds and sprites can be enabled or disabled individually inside and outside each window. Mario Kart disables sprites outside of the windows and uses the two fixed-size windows to display the “1st” placement and the lap number on the top screen. The top-left corner holds an object window, so sprites are visible only within that black square. This lets the game perform a slot machine effect without the items showing up outside of the window. I also fixed a nasty display capture bug that caused games that store code in VRAM to freeze.
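To make the window rules concrete, here is a minimal C++ sketch of how a renderer might decide which layers are visible at one pixel. The names (`WindowRect`, `WindowState`, `layers_at`) are hypothetical, not CorgiDS code. The layer bits are assumed to follow GBATEK (bits 0-3 for BG0-BG3, bit 4 for sprites), and so is the priority order: window 0, then window 1, then the object window, then "outside".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical window rectangle; x2/y2 are exclusive. Assumes x1 <= x2 and
// y1 <= y2 (hardware wraparound for x1 > x2 is not modeled).
struct WindowRect {
    int x1, x2, y1, y2;
    bool contains(int x, int y) const {
        return x >= x1 && x < x2 && y >= y1 && y < y2;
    }
};

// Layer masks: bits 0-3 = BG0-BG3, bit 4 = OBJ (sprites). Assumed layout.
constexpr uint8_t kAllLayers = 0x1F;

struct WindowState {
    bool win0_on = false, win1_on = false, objwin_on = false;
    WindowRect win0{}, win1{};
    uint8_t win0_layers = 0, win1_layers = 0;
    uint8_t objwin_layers = 0, outside_layers = 0;
};

// Picks the layer mask for one pixel. obj_window_pixel is true when a sprite
// in "object window" mode covers this pixel.
uint8_t layers_at(const WindowState& w, int x, int y, bool obj_window_pixel) {
    if (!w.win0_on && !w.win1_on && !w.objwin_on)
        return kAllLayers;  // no windows enabled: everything visible (assumed)
    if (w.win0_on && w.win0.contains(x, y)) return w.win0_layers;
    if (w.win1_on && w.win1.contains(x, y)) return w.win1_layers;
    if (w.objwin_on && obj_window_pixel) return w.objwin_layers;
    return w.outside_layers;
}
```

In the Mario Kart case, the object window's mask would include the OBJ bit while the outside mask would not, which is what keeps the item slot machine confined to its square.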

The 3D engine has also received some love. Fog has been implemented, and I’m in the process of adding dynamic shadows and edge-marking. The Spiky Polygon Syndrome afflicting many games, such as Final Fantasy IV, New Super Mario Bros, and more, has mostly been fixed, aside from a few edge cases. The problem was not vertex-sharing as I initially believed; it was actually a bug with clip matrix reads. The issue lingers in games like Sims Castaway, but I need to do more debugging for that.
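The exact clip matrix bug isn't described here, so the sketch below only shows what such a read is expected to return. It assumes GBATEK's description: the clip matrix is the product of the current position matrix and projection matrix, with entries in signed 20.12 fixed point. `Mat4`, `multiply`, and `read_clip_matrix` are hypothetical names, and the rounding of the fixed-point product is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// 4x4 matrix of signed 20.12 fixed-point values, row-major.
using Mat4 = std::array<int32_t, 16>;

// Fixed-point product a * b: terms are summed in 64 bits, then the sum is
// shifted back down by the 12 fractional bits (rounding details assumed).
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            int64_t sum = 0;
            for (int k = 0; k < 4; ++k)
                sum += int64_t(a[row * 4 + k]) * b[k * 4 + col];
            out[row * 4 + col] = int32_t(sum >> 12);
        }
    }
    return out;
}

// Hypothetical clip matrix register read: derived from both matrices, so it
// must reflect any change to either one.
int32_t read_clip_matrix(const Mat4& position, const Mat4& projection, int index) {
    return multiply(position, projection)[index];
}
```

Recomputing on every read is simple but slow; caching the product and marking it dirty whenever either matrix changes is the obvious refinement.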

Finally, I fixed a bug with DMA transfers that caused many games not to boot. The ARM7 DMAs have a 16-bit length field, and a length of zero is interpreted as the maximum length. The ARM9 DMAs follow the same zero-means-maximum rule but have a 21-bit length field. I wasn’t reading the upper 5 bits of the ARM9 length, yet I was still using the ARM9 maximum length, so games would accidentally overwrite critical memory.
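A simple way to keep the field width and the zero-means-maximum rule in sync is to derive both from one parameter. This is a minimal sketch; `dma_length` is a hypothetical helper, and the widths are the ones given above. It also shows the failure mode: an ARM9 length of 0x10000 read through a 16-bit mask becomes 0, which the ARM9 rule would then turn into 0x200000.

```cpp
#include <cassert>
#include <cstdint>

// Extracts a DMA transfer length from the low `bits` bits of a count/control
// word (16 for the ARM7, 21 for the ARM9, as described above). Bits above the
// field are control bits and are dropped. Zero means the maximum, 1 << bits.
// Assumes bits < 32.
uint32_t dma_length(uint32_t control, unsigned bits) {
    const uint32_t mask = (1u << bits) - 1;
    const uint32_t length = control & mask;
    return length == 0 ? mask + 1 : length;
}
```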

Once I’m done with shadows and edge-marking on the GPU, my remaining plan is to improve compatibility with broken or glitchy games. Some ideas I have are fixing save problems with the Pokemon games and defeating the infamous “no EXP” anti-piracy. I thought about implementing cache emulation, but I don’t think it’s worth it at this stage, as the emulator is still immature. I want CorgiDS v0.2 to be out by the end of this month.

That brings us to my plans for v0.3.

Because v0.2 primarily focused on accuracy and overall compatibility, I want v0.3 to focus on optimization and quality-of-life.

My first goal is to completely high-level emulate (HLE) the NDS BIOS and firmware. This should offer some minor speedups on games that rely on the BIOS, but more importantly, it removes the need to have dumped the BIOS and firmware in order to play games. HLEing the BIOS means implementing all software interrupts that games use, and HLEing the firmware only means storing pre-determined values into memory (like DeSmuME does). The option to provide your own images will still be available for improved accuracy and the ability to boot from the firmware.
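As a rough illustration of what "implementing all software interrupts" involves, here is a minimal sketch of an HLE dispatcher with two calls. The SWI numbers (0x09 for Div, 0x0D for Sqrt), the comment-field positions, and the register conventions are assumptions based on GBATEK and worth double-checking; `Cpu`, `hle_swi`, and the other names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Minimal register file for the sketch; a real core has its own CPU type.
struct Cpu {
    std::array<uint32_t, 16> r{};
};

// SWI number from the instruction's comment field (assumed): bits 16-23 in
// ARM state, bits 0-7 in THUMB state.
uint8_t swi_number(uint32_t opcode, bool thumb) {
    return thumb ? uint8_t(opcode & 0xFF) : uint8_t((opcode >> 16) & 0xFF);
}

// Div: signed r0 / r1 -> r0 = quotient, r1 = remainder, r3 = |quotient|.
void hle_div(Cpu& cpu) {
    const int32_t num = int32_t(cpu.r[0]);
    const int32_t den = int32_t(cpu.r[1]);
    // Real BIOS results for /0 and INT32_MIN / -1 are not modeled here.
    if (den == 0 || (num == INT32_MIN && den == -1)) return;
    const int32_t quot = num / den;
    cpu.r[0] = uint32_t(quot);
    cpu.r[1] = uint32_t(num % den);
    cpu.r[3] = quot < 0 ? 0u - uint32_t(quot) : uint32_t(quot);
}

// Sqrt: r0 = floor(sqrt(r0)), computed with the bitwise integer method.
void hle_sqrt(Cpu& cpu) {
    uint32_t n = cpu.r[0];
    uint32_t root = 0;
    for (uint32_t bit = 1u << 30; bit != 0; bit >>= 2) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
    }
    cpu.r[0] = root;
}

// Returns false for SWIs this sketch doesn't implement, so the caller can
// fall back to a real BIOS image or report the gap.
bool hle_swi(Cpu& cpu, uint8_t number) {
    switch (number) {
        case 0x09: hle_div(cpu); return true;
        case 0x0D: hle_sqrt(cpu); return true;
        default: return false;
    }
}
```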

My second goal, far loftier, is adding a dynarec.

A dynarec (short for “dynamic recompiler”) translates machine code from one CPU architecture to another at runtime. In this case, the CorgiDS dynarec would recompile ARM machine code into x86 machine code. The benefits are twofold. First, it would eliminate the overhead of the interpreter having to decode every opcode again each time it runs. Second, a dynarec offers optimization opportunities that wouldn’t be possible with an interpreter. An intelligent dynarec can take advantage of the target machine’s architecture and produce code tailored to it, allowing for major gains in speed. Even in the most 3D-intensive games, the CPU code is still a large bottleneck. The dynarec, if designed correctly, would alleviate this, allowing CorgiDS to run on less powerful computers.

So what’s the catch? A dynarec requires a thorough understanding of the assembly language of the target processor. A naive implementation of a dynarec is already more complex than an interpreter, and an optimizing dynarec is far more complex. While slow, the interpreter can at least be debugged more easily. I’ve also never written a dynarec before, so only time will tell how long it takes for v0.3 to come out. Nevertheless, I’m not one for backing down from a difficult challenge.
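A full dynarec is far beyond a short example, but the structure around the code generator can be sketched. Below, a hypothetical `BlockCache` translates a block of guest code the first time its address is reached and reuses the result afterwards. Instead of emitting x86, each block here is just a list of pre-decoded C++ operations. Strictly speaking, that makes this a cached interpreter, not a dynarec; the lookup-by-PC and invalidation logic is what the two share.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in guest state; only the PC and one register matter here.
struct Guest {
    uint32_t pc = 0;
    uint32_t r0 = 0;
};

// A translated block. A real dynarec would point at emitted host code.
using Op = std::function<void(Guest&)>;
struct Block {
    std::vector<Op> ops;
    uint32_t end_pc = 0;  // address after the block's last guest instruction
};

class BlockCache {
public:
    // `translate` is a hypothetical front end that decodes guest code starting
    // at a PC; it only runs on a cache miss.
    explicit BlockCache(std::function<Block(uint32_t)> translate)
        : translate_(std::move(translate)) {}

    void run_block(Guest& g) {
        auto it = cache_.find(g.pc);
        if (it == cache_.end()) {
            it = cache_.emplace(g.pc, translate_(g.pc)).first;
            ++translations;
        }
        const Block& block = it->second;
        for (const Op& op : block.ops) op(g);
        g.pc = block.end_pc;  // branches are not modeled in this sketch
    }

    // Self-modifying code must drop stale blocks; this sketch flushes all.
    void invalidate_all() { cache_.clear(); }

    int translations = 0;

private:
    std::function<Block(uint32_t)> translate_;
    std::unordered_map<uint32_t, Block> cache_;
};
```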

My final goal, if time permits, is to add GBA functionality to CorgiDS. Because the NDS uses most of the GBA hardware, this would be done by re-using code already in place. I would have to create new scheduler, sound, and graphics code, but everything else shouldn’t be so bad. Re-using the NDS hardware also opens up the possibility of booting from the firmware and loading a GBA game like you would on a real DS. Of course, if the work above takes a while, this may have to wait for a future update.

As always, thanks for your support!


Teaching CorgiDS how to bark

Sound can make or break a video game’s success. If you don’t have a memorable melody in the soundtrack, you lose a hook to convince people to buy a sequel. If you don’t have good sound design, you’ll find it more difficult to immerse yourself in the world that’s been created.

If you don’t have sound at all… you’re likely dealing with the initial release of an emulator.

After v0.1, I fixed some small bugs here and there, but I soon decided to attempt (what seemed like) a more complex task: finally adding sound to the emulator. And it works, somewhat. If you want to test it yourself, you can clone the v0.2 branch on GitHub and compile it using qmake.

But how does the DS sound processing unit (SPU) work?

The SPU has sixteen channels and two sound capture units. All of the channels can play sound in PCM8, PCM16, and IMA-ADPCM; the first two are raw sound data, and the last is PCM16 compressed to 4 bits per sample, similar to what you’d find in some .wav files. Channels 8-13 can also play PSG (square waves, like on the Game Boy), and channels 14-15 can play white noise. The SPU runs on its own clock at ~16 MHz, exactly half the ARM7’s clock rate. To play music on a channel, you give it an address in memory and a frequency… and you’re done. The channels are more configurable than that, of course, but it really is that simple. The capture units just take samples from channels 0-3 and put them in RAM, where the CPU can perform fancy sound effects before re-outputting them.
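For the curious, here is a minimal sketch of decoding IMA-ADPCM as the DS stores it: a 32-bit header followed by 4-bit codes, low nibble first. The step and index tables are the standard IMA-ADPCM ones. The header layout (initial sample in bits 0-15, initial index in bits 16-22) and the clamp to ±0x7FFF are assumptions based on GBATEK, and `AdpcmState`, `decode_nibble`, and the other names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Standard IMA-ADPCM step table (89 entries).
constexpr std::array<int32_t, 89> kStepTable = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767};

// Index adjustment for the magnitude bits (0-2) of each code.
constexpr std::array<int32_t, 8> kIndexTable = {-1, -1, -1, -1, 2, 4, 6, 8};

struct AdpcmState {
    int32_t sample = 0;  // last decoded PCM16 value
    int32_t index = 0;   // position in kStepTable, 0-88
};

// Header: bits 0-15 = initial sample, bits 16-22 = initial index (assumed).
AdpcmState from_header(uint32_t header) {
    AdpcmState s;
    s.sample = int16_t(header & 0xFFFF);
    s.index = std::min<int32_t>((header >> 16) & 0x7F, 88);
    return s;
}

// Decodes one 4-bit code; bit 3 is the sign.
int16_t decode_nibble(AdpcmState& s, uint8_t nibble) {
    const int32_t step = kStepTable[s.index];
    int32_t diff = step >> 3;
    if (nibble & 1) diff += step >> 2;
    if (nibble & 2) diff += step >> 1;
    if (nibble & 4) diff += step;
    if (nibble & 8)
        s.sample = std::max(s.sample - diff, -0x7FFF);
    else
        s.sample = std::min(s.sample + diff, 0x7FFF);
    s.index = std::clamp(s.index + kIndexTable[nibble & 7], 0, 88);
    return int16_t(s.sample);
}

// Each data byte holds two samples, low nibble first.
void decode_byte(AdpcmState& s, uint8_t byte, int16_t out[2]) {
    out[0] = decode_nibble(s, byte & 0xF);
    out[1] = decode_nibble(s, byte >> 4);
}
```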

Implementing all of the above wasn’t that bad. I had some difficulties getting ADPCM to work, but all of the issues were resolved in a matter of hours. The hardest part was making things compatible with the Qt frontend. On Windows and Linux, Qt sound automatically shuts itself off when it detects no sound data. Because the SPU doesn’t output any sound right when a game starts, well… there wouldn’t be any sound. I resolved this by creating an intermediate buffer that stores old sound data, which is played back whenever the SPU doesn’t have anything new. It doesn’t sound great, but it gets the job done for now.
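The workaround can be sketched as a small buffer that replays the last chunk it handed out whenever the SPU hasn't produced enough samples. `FallbackAudioBuffer` and its methods are hypothetical; in a frontend, the emulation side would push samples and the audio output side would pull fixed-size chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical buffer between the SPU and the audio output. When there are
// not enough new samples, the previous chunk is replayed so the backend never
// sees an empty stream.
class FallbackAudioBuffer {
public:
    void push(int16_t sample) { pending_.push_back(sample); }

    // Returns exactly `count` samples for the audio backend.
    std::vector<int16_t> pull(std::size_t count) {
        if (pending_.size() >= count) {
            auto end = pending_.begin() + static_cast<std::ptrdiff_t>(count);
            last_chunk_.assign(pending_.begin(), end);
            pending_.erase(pending_.begin(), end);
        } else if (last_chunk_.size() != count) {
            last_chunk_.assign(count, 0);  // nothing to replay yet: silence
        }
        return last_chunk_;  // new data, or a repeat of the previous chunk
    }

private:
    std::deque<int16_t> pending_;
    std::vector<int16_t> last_chunk_;
};
```

Replaying stale audio is why this approach doesn't sound great: it keeps the backend from going idle at the cost of audible repeats.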

If you’re familiar with the GBA, you’ll see the DS SPU is a vast improvement over it. If not, let’s do a comparison. The GBA has six channels, four of which are just copies of the Game Boy’s sound registers. The other two take PCM8 data, but not automatically. If you want to use both channels, two of the four DMA (Direct Memory Access) units must be reserved for them. Furthermore, you must also reserve two of the four timer units, as the channels do not have their own clock. The GBA CPU is paused during the DMA transfers, so that means you have to make a tradeoff between CPU time and sound quality. The DS suffers from none of these flaws; all you need to do is supply the data, and the SPU works on its own.

The SPU implementation in CorgiDS is incomplete for the moment. Neither the capture units nor the sound FIFOs are implemented. Sound is synchronous, meaning that if CorgiDS isn’t running at full speed, it sounds terrible. Even at full speed, things sound a bit off. Still, the sound was good enough that I got a bit sidetracked and played Mario and Luigi: Partners in Time for a couple of hours. 🙂

CorgiDS v0.1 is officially out!

https://github.com/PSI-Rockin/CorgiDS/releases/tag/v0.1

32-bit binaries are available for Windows. I’m working on getting OS X binaries deployed as we speak, but for now, you’ll have to compile things yourself.

Click here to see how to set up CorgiDS. The link gives you information on the BIOS/firmware you need, controls, and save files.

Please report any major bugs on the GitHub issue tracker, such as games not booting or graphical glitches that make a game unplayable. The nature of a v0.1 release means there will be numerous minor bugs that don’t affect playability; please use discretion in reporting those, as too many can distract from the more serious issues. But do test as many games as possible!

Have fun with the corgi, and happy holidays! 🙂

CorgiDS v0.1, looking for help with Windows/Linux builds

The time has come! CorgiDS is now ready to face public scrutiny. This means I have released the source code as well (but no builds yet; read below for more details).

[Images: six screenshots of CorgiDS v0.1]

Here’s some of what CorgiDS v0.1 has to offer:

  • A mostly-complete 2D engine
  • Software 3D rendering: missing many features, but works okay for many games
  • Ability to toggle frameskip and framelimiter on/off
  • Booting games directly or from the firmware
  • Reading from the AKAIO save database

An OS X build is ready to go. Unfortunately, I don’t have access to either a Windows or Linux machine. Therefore, I am seeking assistance with providing builds for both (or at the very least, Windows). If you have Qt installed and are willing to assist the project, please contact me through GitHub (linked below). I will help with any compilation errors, although it shouldn’t be very hard to get things working.

I will withhold the OS X build for now until I have something for Windows, so the actual release may not happen for another couple of days. In the meantime, here’s the project GitHub!

Feature chill

Not a whole lot remains until the release for CorgiDS. In fact, I plan to release it relatively soon for the curious.

I’ve stopped adding features to the emulator itself and have instead been focusing on optimization. I’ve moved the GUI and emulation onto separate threads, which has produced a massive speedup. Every 2D game now runs well above 60 FPS, and 3D games are now playable. I don’t have a good way to actually measure FPS yet, as the emulation thread has no frame limiter, but that’s on my task list. Some code cleanup has also helped with speed, but nothing drastic.
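A frame limiter for the emulation thread can be as simple as sleeping until a per-frame deadline. This sketch assumes a period of about 16.715 ms (roughly the DS's ~59.8 Hz refresh rate); `FrameLimiter` and `next_deadline` are hypothetical names. Keeping the deadline arithmetic in a pure function makes it easy to test without actually sleeping.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

class FrameLimiter {
public:
    explicit FrameLimiter(Clock::duration period) : period_(period) {}

    // Next deadline after `deadline`. If emulation has fallen so far behind
    // that this deadline is already past, the schedule restarts from `now`
    // instead of racing to catch up.
    static Clock::time_point next_deadline(Clock::time_point deadline,
                                           Clock::time_point now,
                                           Clock::duration period) {
        const Clock::time_point next = deadline + period;
        return next < now ? now + period : next;
    }

    // Call once per emulated frame, from the emulation thread.
    void wait() {
        deadline_ = next_deadline(deadline_, Clock::now(), period_);
        std::this_thread::sleep_until(deadline_);
    }

private:
    Clock::duration period_;
    Clock::time_point deadline_ = Clock::now();
};
```

Usage would be `FrameLimiter limiter(std::chrono::microseconds(16715));` followed by `limiter.wait();` after each emulated frame. Counting `wait()` calls per second would also give a rough FPS figure.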

Unfortunately, the code cleanup has given birth to another stupid bug. Tetris DS is overflowing the position matrix stack on the GPU when it wasn’t doing that before. I can’t just ignore it because it produces a hideous flicker effect. I think the problem lies in the CPU, but I’m still hunting for it. That needs to get fixed as I don’t know if any other games are affected.

What else? I’ve added a framework for HLE BIOS emulation: in the future, this means you won’t have to dump the BIOS and firmware from a DS in order to play games on CorgiDS. While it has a few functions already implemented, I’m keeping it turned off for v0.1 so that it can’t be misused. I’m also trying to figure out a way to handle fatal errors during emulation without crashing the program… it’s difficult to come up with something that keeps the emulator core and frontend decoupled. I still need to add a window for configuring the save size of a game as well. Last but not least, I need to upload the source code to GitHub and find someone to help create builds outside of OS X.
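One common pattern for this is to have the core report fatal errors through a callback installed by the frontend, then stop emulating instead of aborting the process. The sketch below is a hypothetical design, not CorgiDS's: `EmulatorCore`, `fatal`, and `set_error_handler` are made-up names. With emulation on its own thread, a Qt frontend would likely forward the message to the GUI thread, for example through a queued signal, before showing a dialog.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// The core never talks to Qt directly; it only knows about this callback.
class EmulatorCore {
public:
    using ErrorHandler = std::function<void(const std::string&)>;

    void set_error_handler(ErrorHandler handler) { on_error_ = std::move(handler); }

    // Called from anywhere in the core when emulation cannot continue.
    void fatal(const std::string& message) {
        running_ = false;
        if (on_error_) on_error_(message);
    }

    // Runs one frame unless a fatal error has already stopped the core.
    bool run_frame() {
        if (!running_) return false;
        // ... emulate one frame here ...
        return running_;
    }

private:
    ErrorHandler on_error_;
    bool running_ = true;
};
```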

It’s not quite a feature freeze, but things are cooling down for the holiday season. 🙂

 

Holiday plans

Happy December to all of you reading this!

At the moment, I’m taking a break from CorgiDS… by working on another emulator project called DaneBoy (named after the Great Dane). Don’t fear though; I’m not doing this merely for diversion, but rather, for practice.

The UI for CorgiDS is written using Qt. While Qt makes C++ UI development easy, it does so by making itself extremely bloated. One effect of this is that Qt does a lot of work on the main thread, which is not good for emulators or any other projects that require lots of processing time. Since I’d rather not switch UI libraries, the solution is, of course, moving emulation over to a separate thread. Assuming negligible overhead from synchronizing the threads once a frame, this should boost FPS by 20-100%, depending on the game.

The catch? I’m not experienced with multithreading, so I’m afraid to touch the CorgiDS codebase and risk introducing nasty bugs due to my lack of knowledge. Instead, DaneBoy will bear the brunt of my learning experiences. I’ll incorporate what I’ve learned into CorgiDS once I believe I’m not going to deadlock everything.
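One way to keep synchronization down to once a frame is a single hand-off point between the two threads. In this sketch (all names hypothetical), the emulation thread publishes each finished frame and the GUI thread takes the newest one when it repaints; frames the GUI misses are simply dropped. It's one design among several, not necessarily how CorgiDS ends up doing it.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical hand-off point between the emulation and GUI threads.
class FrameMailbox {
public:
    // Emulation thread: publish a finished frame, replacing any frame the GUI
    // has not picked up yet (a dropped frame rather than a stall).
    void publish(std::vector<uint32_t> frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = std::move(frame);
        has_frame_ = true;
    }

    // GUI thread: take the newest frame, if any. The lock is only held for a
    // move, so neither thread waits on the other's real work.
    bool try_take(std::vector<uint32_t>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!has_frame_) return false;
        out = std::move(latest_);
        has_frame_ = false;
        return true;
    }

private:
    std::mutex mutex_;
    std::vector<uint32_t> latest_;
    bool has_frame_ = false;
};
```

A Qt frontend might call `try_take` from a timer or after a queued signal from the emulation thread.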

Before I started DaneBoy, I also created a little program that automatically generates a jump table for ARMv5 instructions. Unlike the Z80-like CPU in the Game Boy, ARM CPUs aren’t easily decoded using a giant switch block. The two options are either using an ugly mess of if-statements or creating a jump table with 4096 separate elements. CorgiDS used the former for a while, but I finally got around to rewriting that behemoth of the codebase a couple of days ago. I don’t know of any similar decoding programs, but if there aren’t any, I’ll release mine along with CorgiDS so that anyone else developing an emulator with an ARM CPU can benefit from it.
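The usual trick for a 4096-entry table is to index it with opcode bits 27-20 and 7-4, which together distinguish most ARM instruction classes. The sketch below generates such a table but only recognizes a few coarse classes. Everything else, including data processing, halfword transfers, PSR transfers, and the cond=1111 space used by ARMv5's BLX, falls into `Other` and would need finer decoding. The names are hypothetical; this is not the generator described above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Coarse instruction classes; a real table would store handler pointers.
enum class ArmClass : uint8_t { Branch, BlockTransfer, SingleTransfer, Multiply, Other };

// 12-bit index: opcode bits 27-20 become index bits 11-4, and opcode bits
// 7-4 become index bits 3-0. The condition field is ignored.
constexpr uint32_t table_index(uint32_t opcode) {
    return ((opcode >> 16) & 0xFF0) | ((opcode >> 4) & 0xF);
}

constexpr ArmClass classify(uint32_t i) {
    if (((i >> 9) & 7) == 5) return ArmClass::Branch;          // bits 27-25 = 101
    if (((i >> 9) & 7) == 4) return ArmClass::BlockTransfer;   // bits 27-25 = 100
    if (((i >> 10) & 3) == 1) {                                 // bits 27-26 = 01
        // Register-offset form (bit 25) with bit 4 set is the undefined space.
        const bool reg_offset = (i >> 9) & 1;
        return (reg_offset && (i & 1)) ? ArmClass::Other : ArmClass::SingleTransfer;
    }
    // MUL/MLA: bits 27-22 = 000000 and bits 7-4 = 1001.
    if ((i >> 6) == 0 && (i & 0xF) == 9) return ArmClass::Multiply;
    return ArmClass::Other;
}

// Generated once at startup, or offline as a source file.
std::array<ArmClass, 4096> build_table() {
    std::array<ArmClass, 4096> table{};
    for (uint32_t i = 0; i < table.size(); ++i) table[i] = classify(i);
    return table;
}
```

Decoding then becomes a single lookup, `table[table_index(opcode)]`, instead of a chain of if-statements.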

I don’t plan on doing anything noteworthy with DaneBoy, nor do I even want to release it. This is just a practice project; CorgiDS is what I’m really concerned about. Either way, it’s actually pretty fun to work on a much simpler system than the DS. 🙂

Ignorance, or Why You Should Research Whatever You’re Emulating Before You Start Coding

Apologies for the radio silence over the past couple of weeks. To summarize, I’ve run into a nasty problem that will require me to rewrite a significant portion of the 3D code in CorgiDS, and I’m trying to figure out how I want to do that.

When I started writing code for the DS GPU, I had no clue at all what I was doing. I had never worked with a 3D graphics library like OpenGL before, yet here I was attempting to create a software renderer in an emulator. After proverbially bashing my head against the keyboard over the course of several weeks, I finally started getting tangible results: getting polygons on the screen at all was cause for celebration. I gradually added more features such as textures, alpha blending, and lighting over time. Finally, after many, many weeks of work, CorgiDS almost meets my standards for a software renderer.

Almost.

Take a look at this image from Final Fantasy IV in CorgiDS:

[Image: Final Fantasy IV in CorgiDS, with polygons missing due to clipping]

There are quite a few issues in this image, but let’s focus on one in particular: half of Cecil’s body on the left is entirely missing. If you look carefully, some polygons on the soldiers’ lower bodies are also missing.

What’s happening? I can’t conclusively say what the problem is, but I’ve managed to narrow it down to something going wrong with the clipping code. Clipping is how GPUs deal with polygons that extend beyond one or more of the six planes of the viewing frustum: left, right, top, bottom, near, and far. For a triangle with one vertex outside the frustum, the GPU “clips” that vertex away and creates two new vertices that lie directly on the plane.
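Clipping against a single plane can be sketched with the classic Sutherland-Hodgman pass; a full clipper runs one pass per frustum plane. This sketch handles only the far plane (inside when z <= w in clip space) and uses doubles for readability, whereas the DS works in fixed point. `Vertex` and `clip_far` are hypothetical names, and only position is interpolated, not colors or texture coordinates.

```cpp
#include <cassert>
#include <vector>

// Clip-space vertex.
struct Vertex {
    double x, y, z, w;
};

// Signed distance to the far plane (z = w): >= 0 means inside.
static double far_distance(const Vertex& v) { return v.w - v.z; }

static Vertex lerp(const Vertex& a, const Vertex& b, double t) {
    return {a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t,
            a.z + (b.z - a.z) * t, a.w + (b.w - a.w) * t};
}

// One Sutherland-Hodgman pass against the far plane. Each edge that crosses
// the plane contributes one new vertex lying exactly on it.
std::vector<Vertex> clip_far(const std::vector<Vertex>& poly) {
    std::vector<Vertex> out;
    if (poly.empty()) return out;
    Vertex prev = poly.back();
    for (const Vertex& cur : poly) {
        const double dp = far_distance(prev), dc = far_distance(cur);
        if ((dp >= 0) != (dc >= 0))  // the edge crosses the plane
            out.push_back(lerp(prev, cur, dp / (dp - dc)));
        if (dc >= 0) out.push_back(cur);
        prev = cur;
    }
    return out;
}
```

A triangle with one vertex beyond the far plane comes out as a quad: the two inside vertices plus the two new ones on the plane.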

The polygons are missing because one of their vertices is getting clipped on the far plane (the direction away from the camera), and the game is set to not render any polygons intersecting the far plane. I don’t exactly know why this is happening, but I have two thoughts:

  • A precision error causes vertices that should not be clipped to become clipped.
  • Vertices are being clipped multiple times.

I ruled out the first one, as there doesn’t seem to be anything wrong with my matrix multiplication code. Furthermore, the DS uses fixed-point arithmetic rather than floating-point, so a precision error is far less likely. That brings us to the second thought.

Because of my unfamiliarity with 3D graphics, I have used melonDS’s software renderer as a reference for creating my own. Out of a desire to learn things on my own and not outright copy someone else’s work, I have added parts bit-by-bit to my code. This organic process has led to CorgiDS’s software renderer being the messiest part of the codebase, founded upon faulty assumptions and unclear ideas. While it all remains solvable, there is one fundamental issue: CorgiDS does not re-use vertices in polygon strips.

The code for polygons so far looks like this:

[Image: screenshot of the CorgiDS polygon code]

Note the “vert_index” variable, which points to the first vertex used. This code assumes that a polygon’s vertices are contiguous in RAM. While true in 90% of cases, this completely falls apart when polygon strips are involved. The DS GPU allows two polygons to re-use the same vertices under special conditions, meaning that vertex lists are no longer contiguous. melonDS indicates that re-used vertices don’t get clipped again, but there are other rules that I don’t quite understand…
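One way to drop the contiguity assumption is to have each polygon store indices into a shared vertex list, so a vertex re-used by a strip is stored, and transformed, only once. The sketch below (hypothetical names) builds triangle-strip polygons that way. It uses the OpenGL winding convention for odd triangles, and it's an assumption that the DS matches; quad strips and the DS's rules for when re-used vertices get clipped are not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// A polygon refers to its vertices by index into a shared vertex list.
using Triangle = std::array<std::size_t, 3>;

// Builds the triangles of a strip over vertices [first, first + count).
// Odd triangles swap their first two indices to keep a consistent winding.
std::vector<Triangle> triangle_strip(std::size_t first, std::size_t count) {
    std::vector<Triangle> tris;
    for (std::size_t i = 0; i + 2 < count; ++i) {
        const std::size_t a = first + i, b = first + i + 1, c = first + i + 2;
        tris.push_back(i % 2 == 0 ? Triangle{a, b, c} : Triangle{b, a, c});
    }
    return tris;
}
```

With five vertices, this yields three triangles that share vertices 1 through 3, each stored once in the vertex list.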

Anyway, if this truly is the problem (and I don’t know what else could be), then a large portion of the renderer will need to be rewritten, a task that has little appeal to me. I might just ignore this problem entirely for the first release and focus my efforts elsewhere… not a whole lot of games are affected by this. Decisions, decisions…

Legend of FIFOIRQ and saving part 2

I figured out why none of the other 3D games were working, and it was a stupid bug.

As mentioned before, the GPU has a FIFO that holds commands waiting to be executed. The GPU can send an interrupt request to the ARM9 depending on whether the FIFO is empty or half-empty. However… a peculiar feature of this FIFOIRQ is that its request bit always remains set as long as the condition is true, even if the ARM9 tries to clear that bit. Because I wasn’t emulating this, games would get stuck in infinite loops waiting for a FIFOIRQ that would never come. I didn’t catch this problem before for the following two reasons:

  • Super Mario 64 DS, the only game that worked well enough for me to do extensive 3D testing, is badly programmed and never uses the FIFOIRQ in-game; rather, it sits in a busy loop checking to see if GXSTAT says the FIFO is empty, which I emulated correctly.
  • I had not implemented FLASH saving, which several 3D games use. Therefore, I could not test that many games in the first place.
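The fix amounts to treating the geometry FIFO interrupt as level-triggered: whenever the ARM9 acknowledges it, re-check the condition and set the bit again if it still holds. In this sketch, the bit positions (IF bit 21, and GXSTAT bits 30-31 selecting never / less than half full / empty) and the 256-entry capacity are assumptions based on GBATEK, and `Arm9Interrupts` is a hypothetical struct.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kIrqGxFifo = 1u << 21;  // assumed IF bit for the GXFIFO IRQ
constexpr int kFifoCapacity = 256;         // assumed geometry FIFO size

struct Arm9Interrupts {
    uint32_t if_reg = 0;  // interrupt request flags
    uint32_t gxstat = 0;  // only the IRQ mode bits (30-31) matter here
    int fifo_count = 0;   // commands waiting in the geometry FIFO

    bool gxfifo_condition() const {
        switch ((gxstat >> 30) & 3) {
            case 1: return fifo_count < kFifoCapacity / 2;  // less than half full
            case 2: return fifo_count == 0;                 // empty
            default: return false;                          // never
        }
    }

    // Re-evaluate whenever the FIFO changes or IF is written.
    void update_gxfifo_irq() {
        if (gxfifo_condition()) if_reg |= kIrqGxFifo;
    }

    // Writing 1 to an IF bit acknowledges it, but the GXFIFO bit comes right
    // back while its condition still holds.
    void write_if(uint32_t value) {
        if_reg &= ~value;
        update_gxfifo_irq();
    }
};
```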

On that note, every 3D game in my library can at least get to the title screen now:

[Images: Mario Kart DS (title screen and a 3D model), Pokemon Diamond, and Animal Crossing running in CorgiDS]

There are many issues, of course (not visible: MKDS hangs in-game, and Pokemon Diamond doesn’t display any sprites in-game, such as the character and TVs), but they aren’t relevant for this article.

Now, since I implemented FLASH saving, there’s a huge problem that has to be addressed before the release.

DS games can have radically different save sizes, ranging from 512 bytes to 8 megabytes. Furthermore, there are three different protocols games use depending on the size: 512-byte games use “tiny” EEPROM, 8-64K games use regular EEPROM, and 256K+ games use FLASH. Each protocol has different commands, and while the protocol can be inferred from the save size, there is no 100% accurate way to figure out the save size.
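The size-to-protocol mapping itself is the easy part; the hard part is knowing the size. Here is a minimal sketch of the mapping using only the sizes given above (`SaveProtocol` and `protocol_for_size` are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>

enum class SaveProtocol { TinyEeprom, Eeprom, Flash, Unknown };

// 512 bytes -> "tiny" EEPROM, 8-64K -> regular EEPROM, 256K-8M -> FLASH.
SaveProtocol protocol_for_size(std::size_t bytes) {
    if (bytes == 512) return SaveProtocol::TinyEeprom;
    if (bytes >= 8 * 1024 && bytes <= 64 * 1024) return SaveProtocol::Eeprom;
    if (bytes >= 256 * 1024 && bytes <= 8 * 1024 * 1024) return SaveProtocol::Flash;
    return SaveProtocol::Unknown;  // sizes outside the ranges above
}
```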

It is possible to use heuristics to determine what could be the proper save size. This is because each save size has a maximum possible transfer length: for instance, you can only read up to 32 bytes at a time with 8K EEPROM. melonDS and medusa both use this heuristic for certain, and the other emulators likely make use of it as well. (Someone correct me if that’s not the case.)

However, there are ways this can be defeated:

  • A game could use a transfer length that’s lower than the maximum supported for its save size. If an emulator such as melonDS locks in a save size based on that length, the save file will be garbage.
  • A game could write out-of-bounds and check if the write works, presenting a “save error” screen if it does. This is often done intentionally to prevent piracy; when the DS was still relevant, pirates would stick as much save memory as possible into flashcarts in order to save (haha) on costs.

In either case, heuristics are not 100% reliable, and so I wish to avoid them in CorgiDS. Thus, I want to implement the following:

  • If a save file exists, get the save size from that and we’re done.
  • If there is no save file, ask the user to select a save size. If the initial size that is selected doesn’t work, then the user can modify it until everything works.
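The two rules translate almost directly into code. In this sketch, `resolve_save_size` and the `ask_user` callback are hypothetical stand-ins for the frontend's save-size dialog, and treating an empty file as missing is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <functional>
#include <string>
#include <system_error>

// Trust an existing save file's size; otherwise ask the user.
std::uintmax_t resolve_save_size(const std::string& save_path,
                                 const std::function<std::uintmax_t()>& ask_user) {
    std::error_code ec;
    const std::uintmax_t size = std::filesystem::file_size(save_path, ec);
    if (!ec && size > 0) return size;  // existing, non-empty save file
    return ask_user();                 // missing or empty: let the user choose
}
```

If the chosen size turns out to be wrong, the frontend can simply call the dialog again and resize the file.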

Making the user select the save size isn’t terribly convenient, however. Thus, at some point in the future, I want CorgiDS to be able to read from a database that stores all of this information and other stuff as well, somewhat akin to the GameINI files in Dolphin. However, I do not plan on doing this for v0.1; instead I will use the above solution directly until I have the time to extend it further.

For the time being, I’m going to figure out why my games are being derpy. 🙂

Let there be light

Two months ago, I made my first 3D-related post. Since then, GPU emulation for CorgiDS has advanced by leaps and bounds. Alpha blending, lighting, clipping… things are looking a lot better.

[Images: a teapot test and two lighting tests rendered by CorgiDS]

…Unfortunately that’s about all I can show you for now. I rewrote the scheduling system in CorgiDS, and as a result, the other 3D games in my library get stuck in infinite loops. There’s likely just something I forgot to implement, so I’ll be looking to fix that.

Aside from making those games run, here is what remains before v0.1.

  • Fix other 3D glitches, such as the shadow under Yoshi in the images above.
  • Rewrite the instruction decoding in the CPUs. I want to move from a mess of if-statements to a lookup table, which should improve FPS by quite a bit. I could also try experimenting with a cached interpreter, but I don’t know if that will actually improve anything by a whole lot.
  • Rewrite memory accesses to VRAM to increase speed (and also support bank mirroring)
  • Rewrite the 2D rendering pipeline to support an intermediate 6-bit color stage. I’ll need this for finer control of the final output.
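As an illustration of the color stage in the last item, here is a minimal sketch of expanding the DS's 15-bit BGR555 colors (red in bits 0-4, green in 5-9, blue in 10-14) to 6 bits per channel, then to 8 bits for the frontend. Effects such as blending could then operate on the 6-bit values. Both expansions use bit replication, and how the DS actually fills the extra low bit is an assumption; `Rgb6`, `to_rgb6`, and `to_argb8888` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// One pixel in the intermediate stage: 6 bits per channel (0-63).
struct Rgb6 {
    uint8_t r, g, b;
};

// BGR555 -> 6 bits per channel, replicating the top bit into the new low bit.
Rgb6 to_rgb6(uint16_t bgr555) {
    auto expand = [](uint8_t c5) { return uint8_t((c5 << 1) | (c5 >> 4)); };
    return {expand(bgr555 & 0x1F), expand((bgr555 >> 5) & 0x1F),
            expand((bgr555 >> 10) & 0x1F)};
}

// Final conversion to 8 bits per channel for display, also by replication.
uint32_t to_argb8888(Rgb6 c) {
    auto expand = [](uint8_t c6) { return uint32_t((c6 << 2) | (c6 >> 4)); };
    return 0xFF000000u | (expand(c.r) << 16) | (expand(c.g) << 8) | expand(c.b);
}
```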

(Notice that a lot of it involves major rewrites… that’s what happens when you write an emulator while having no clue what’s going on at first.)

There are some other things I want to address as well, such as bitmap sprites, but they are not as high-priority as the list above. I will take care of those before my self-imposed deadline if possible, however.

Let’s see if we can get this out by Christmas. 🙂