So it’d be great to say we’re done - but there are still a couple of things to iron out.

Test ROM

The test ROM by corax89 is really handy - when run it will validate instructions and assuming you have the disp instruction working (0xdxyn) it will output one of ok or no for each instruction type.

I took it for a spin and was greeted by 3 things. The first was that it completed before I could make any sense of what happened - so I added a SDL_Delay(200); after each iteration to slow things down (told you that timing thing would come back to haunt us…). The second was that every status code is displayed in the middle of the screen instead of in a grid format (woops). The third was a bunch of no statuses for my “set register to register” and subtraction operations:

display

A quick glance at the code showed that I had swapped registers x and y around:

diff --git a/src/mylib.c b/src/mylib.c
index 8901755..0740405 100644
--- a/src/mylib.c
+++ b/src/mylib.c
@@ -102,7 +102,7 @@ void jumpIfRegNotEqualToReg(State *state, uint8_t reg1, uint8_t reg2)
 }
 void setRegisterToRegister(State *state, uint8_t reg1, uint8_t reg2)
 {
-    state->registers[reg2] = state->registers[reg1];
+    state->registers[reg1] = state->registers[reg2];
     state->pc += 2;
 }
 void setRegisterToBitwiseOr(State *state, uint8_t reg1, uint8_t reg2)

For subtraction I messed up the mod 255 operation:

@@ -143,15 +143,14 @@ void addRegisters(State *state, uint8_t reg1, uint8_t reg2)
 void subtractRegisters(State *state, uint8_t reg1, uint8_t reg2)
 {
     bool needBorrow = state->registers[reg2] > state->registers[reg1];
-    state->registers[reg1] = (state->registers[reg1] - state->registers[reg2]) + (needBorrow ? 0xff : 0);
-    // if we need a borrow, this is set  0
+    state->registers[reg1] = (state->registers[reg1] - state->registers[reg2]) % 0xff;
     state->registers[0xf] = needBorrow ? 0 : 1;
     state->pc += 2;
 }
 void subtractRightFromLeft(State *state, uint8_t reg1, uint8_t reg2)
 {
     bool needBorrow = state->registers[reg1] > state->registers[reg2];
-    state->registers[reg1] = (state->registers[reg2] - state->registers[reg1]) + (needBorrow ? 0xff : 0);
+    state->registers[reg1] = (state->registers[reg2] - state->registers[reg1]) % 0xff;
     // if we need a borrow, this is set  0
     state->registers[0xf] = needBorrow ? 0 : 1;
     state->pc += 2;

Fixing those and running the test ROM again gave me the all clear. But I couldn’t readily tell what was wrong with the display.

Fixing the display

Controlling execution

One thing that I wanted was the ability to suspend execution - to be able to log some of the state (e.g. dump the registers, show the memory, …), and resume/step through instructions. So I changed the main loop to this such that pressing space would suspend/resume execution, and pressing return would step through the next instruction (before suspending again).

    bool running = true;
    bool step = false;
    while (!state.quit)
    {
        if (running || step)
        {
            processOp(&state, memory);
            if (state.draw)
            {
                updateScreen(renderer, texture, memory, pixels);
                state.draw = false;
            }
            processInput(&state, keyStates);

            step = false;
        }
        if (keyStates[SDL_SCANCODE_SPACE])
        {
            SDL_Log("Pause toggled");
            // so we don't untoggle too fast
            SDL_Delay(1000);
            running = running ? false : true;
        }
        if (keyStates[SDL_SCANCODE_RETURN])
        {
            SDL_Log("Stepping through");
            step = true;
        }
        // more for quit than anything else?
        while (SDL_PollEvent(&event))
        {
            processEvent(&state, &event);
        }
        SDL_Delay(100);
    }

We can see that working through the logs:

INFO: Pause toggled
INFO: Stepping through
INFO: Decoding a202 (A:a, B:2, C:0, D:2)
INFO: Stepping through
INFO: Decoding dab4 (A:d, B:a, C:b, D:4)
INFO: Stepping through
INFO: Decoding 6b10 (A:6, B:b, C:1, D:0)

Animating a sprite

The aim here is to write a small ROM (in assembly) that animates a sprite. This will hopefully show us what we might be doing wrong.

Picking the right sprite was the hardest bit. Thankfully there are free tools like pixilart that make this easy. I’m not much of an artist but this looked alien-like enough for me. Note the use of an asymmetric pattern to catch bit-flipping:

 #    #   01000010
  #  # #  00100101
 ######   01111110
## ## ##  11011011
########  11111111
 ######   01111110
  #  #    00100100
  ####    00111100

As I started to look through this I realised I made 2 mistakes. The first is that I took x,y to be absolute coordinates whereas the specs specify those are registers - that’s the easy fix. The second mistake was to assume you always wrote pixels at 8-bit boundaries. Consider the top row of the sprite:

Step 1 - (0,0): 0100000010 (0,8): 00000000
Step 2 - (0,0): 0010000001 (0,8): 00000000
Step 3 - (0,0): 0001000000 (0,8): 10000000

Shifting the bit pattern doesn’t happen in multiples of 8. If we update a pattern at (0,3) it will cause 2 memory addresses to change:

Current pixels: (0,0): 01100111 (0,8): 01100001
New pixels:     (0,0):    10100 (0,8): 110
End result:     (0,0): 01110100 (0,8): 11000001

Now because this implementation stores the state of the pixels in memory, we store them in groups of 8 bits - so updating pixels at places other than boundaries is a little trickier. I tried my hand at doing this but I couldn’t justify the complexity of the code (also, whilst it appeared to work, I couldn’t convince myself it was correct).

Pragmatism took over - instead of relying on writing the display to the “main” memory I added a bool array representing the state of each pixel to the state:

    bool pixels[SCREEN_WIDTH*SCREEN_HEIGHT];

Meaning that my setPixel implementation was condensed to about 15 lines instead of the monstruosity it was before:

void setPixels(State *state, uint8_t xReg, uint8_t yReg, uint8_t height, uint8_t memory[])
{
    state->registers[0xf] = 0;
    for (int h=0; h<height;h++) {
        uint8_t y = (state->registers[yReg] + h) % SCREEN_HEIGHT;
        uint8_t sprite = memory[state->i+h];
        for (int shift=7;shift>=0;shift--) {
            uint8_t x = (state->registers[xReg] + 7-shift) % SCREEN_WIDTH;
            uint8_t bit = (sprite >> shift) & 0x1;
            state->registers[0xf] |= state->pixels[x+y*SCREEN_WIDTH] & bit;
            state->pixels[x+y*SCREEN_WIDTH] ^= bit;
        }
    }
    state->draw = true;
    state->pc += 2;
}

Whilst doing this I also re-read the original Byte magazine CHIP-8 instructions to sort out how the 0xf register gets set (hint - it’s when you try to turn on a pixel that is already on, not when you change the state of a pixel, meaning it’s actually different from the draw flag!).

The following test ROM will slide the alien’s face down the screen every second (based on the timer):

        0xa2, 0x04, // set i to x204
        0x12, 0x0c, // jump to the start of the program
        0x42, 0x24, // alien sprite start
        0x7e, 0xdb,
        0xff, 0x7e,
        0x24, 0x3c, // alien sprite end
        0x61, 0x0,  // r1=0
        0x62, 0x0,  // r2=0
        0x0, 0xe0,  // DISP: clear display
        0xd1, 0x28, // display sprite
        0x71, 0x01, // r1 += 1
        0x72, 0x01, // r2 += 1
        0x65, 0x3c, // r5 = 60
        0xf5, 0x15, // set the timer to r5
        0xf5, 0x07, // CHECK_TIMER: r5 = delay timer value
        0x35, 0x00, // if r5 (the timer value) is 0, skip the next instructions
        0x12, 0x1c, // jump back to CHECK_TIMER
        0x12, 0x10, // jump back to DISP

Once satisfied it worked, re-running the test ROM gives the expected output:

display

Success!

Input capture

It turns out there’s a much simpler way of capture input. The below creates an array that stores the underlying state of each key:

const uint8_t *keyStates = SDL_GetKeyboardState(NULL);

The state of this is updated when various SDL functions are called but the one that does this with minimal side effects is SDL_PumpEvents - we just need to call it more or less every time we want to process input. Speaking of which, the latter is no longer a massive case statement based on an SDL_Event:

void processInput(State *state, const uint8_t keyStates[])
{
    state->input[0x1] = keyStates[SDL_SCANCODE_1];
    state->input[0x2] = keyStates[SDL_SCANCODE_2];
    state->input[0x3] = keyStates[SDL_SCANCODE_3];
    state->input[0xc] = keyStates[SDL_SCANCODE_4];
    state->input[0x4] = keyStates[SDL_SCANCODE_Q];
    state->input[0x5] = keyStates[SDL_SCANCODE_W];
    state->input[0x6] = keyStates[SDL_SCANCODE_E];
    state->input[0xd] = keyStates[SDL_SCANCODE_R];
    state->input[0x7] = keyStates[SDL_SCANCODE_A];
    state->input[0x8] = keyStates[SDL_SCANCODE_S];
    state->input[0x9] = keyStates[SDL_SCANCODE_D];
    state->input[0xe] = keyStates[SDL_SCANCODE_F];
    state->input[0xa] = keyStates[SDL_SCANCODE_Z];
    state->input[0x0] = keyStates[SDL_SCANCODE_X];
    state->input[0xb] = keyStates[SDL_SCANCODE_C];
    state->input[0xf] = keyStates[SDL_SCANCODE_V];
}

Tidy heh?

Timers & timings

I left this till the end because… I wasn’t sure how I wanted to deal with it. After some (light) reading, I think the approach below should do the trick.

To start, let’s note that CHIP-8 timers (both the delay and the sound) run at 60Hz - this roughly translates to a tick every 0.0167 seconds. In other words, the timers should be decremented every 0.0167 seconds.

Leaving that aside, how many opcodes should we process per time step? The answer is a bit more complicated in that it really is “as many as we can but no more than”. The CHIP-8 virtual CPU runs at around 500Hz, or 500 cycles per second. We can make the assumption that each opcode takes one cycle to process (it doesn’t really - some might take more where e.g. memory access is required).

Combining both together, the pseudo-code for the loop goes like this:

clockSpeed = 500
timerDelta = 1/60.0
accumulator = 0
currTime = now()
while (running)
{
  newNow = now()
  elapsedTime = newNow - currTime
  numCycles = int(elapsedTime*clockSpeed)
  if (numCycles > 1)
  {
    currTime = newNow
    timePerCycle = elapsedTime/numCycles
    while (numCycles > 1)
    {
      procesInput
      processOp
      updateDisplay
      accumulator += timePerCycle
      while (accumulator > timerDelta)
      {
        if (delayTimer > 0) delayTimer -= 1
        if (soundTimer > 0) soundTimer -= 1
        accumulator -= timerDelta
      }
      maybeDelayIfTooFast
    }
  }
}

This places a lower bound on the execution - if we process operations faster than 500Hz (500 cycles per second), we’ll just wait until enough time has elapsed that we need to process something. Spoiler alert, this is exactly what happened when I tried this on my moderately modern laptop. It would execute a handful of cycles super fast and then wait till it accumulated enough “ticks” to process some instructions. This gave games a very noticeable stop/start delay. The solution was to add a fixed delay after each cycle of timePerCycle. If anything I was wondering whether I was going too slow - but this clearly wasn’t the case.

The implementation is practically the same as the pseudo-code:

    while (!state.quit)
    {
        newTick = SDL_GetTicks();
        elapsedTicks = newTick - currTick;
        numCycles = elapsedTicks / 1000 * clockSpeed;
        totalCycles += numCycles;
        if (numCycles > 0)
        {
            currTick = newTick;
            timePerCycle = elapsedTicks / numCycles;
            while (numCycles > 1)
            {

                SDL_PumpEvents(); // this is needed to populate the keyboard state array
                processInput(&state, keyStates);
                processOp(&state, memory);
                accumulator += timePerCycle;
                if (keyStates[SDL_SCANCODE_SPACE])
                {
                    SDL_Log("Space pressed, will exit");
                    state.quit = true;
                    break;
                }
                if (state.draw)
                {
                    updateScreen2(renderer, texture, state.pixels, pixels);
                    state.draw = false;
                }
                while (accumulator > timerDelta)
                {
                    if (state.delay_timer > 0)
                        state.delay_timer--;
                    if (state.sound_timer > 0)
                        state.sound_timer--;
                    accumulator -= timerDelta;
                }
                numCycles--;
                SDL_Delay(timePerCycle);
            }
        }
    }

Any pause/stepping code was removed for clarity.

Conclusion

So now that we’re here, what next? Some might have noticed I didn’t wire up the sound (yet). I should really do this to call this project “complete”. Also the tests need a bit more TLC - there was a lot of experimentation towards the end and whilst they provided a solid foundation, they didn’t necessarily grow alongside the codebase.

Finally perhaps, debugging capabilities are minimal at best. A cool feature to add would be the ability to “save state” - which should be as simple as serialising the State object - it contains all the information pertaining to the execution and loading should just be a case of restoring this.

All in all though it’s been a bag of fun - learnt tons on the way, and still am. I’d be interested to revisit this in another language down the line and see whether my approach would change.