But first, an addendum

In my previous post in this series I covered a few steps needed to get an XServer up and running. I started looking at what was needed for audio. In a nutshell it meant running a Pulseaudio server (which is separate from your XServer used to display stuff - this one is used to bridge audio). After a couple of fruitless tries to get this up and running I came across WSLg - which is WSL with integrated graphics and sound support. Yep - you read that right. You no longer need to run a local XServer (unless you really want to), set up your environment etc… You can simply “plug and play”. The WSLg GitHub pages contains all the necessary information for a seamless upgrade. I’ll be using that so I can cut down on the preliminaries and start writing more code!

Sound

The CHIP8 sound spec is quite simple - there’s a single sound (a buzzer) and that’s it. It is controlled by a timer - if it has a value greater than zero we should buzz, and stop whenever it is zero. That timer is decremented at a rate of 60Hz, so a value of 60 would make the buzzer sound for a second (which let’s face it, is as much time as anyone would really want to hear a buzzer for).

Testing sound output

If this is the first time you’re trying to play sound via WSLg, you’ll likely need to install pulseaudio via sudo apt install libpulse0 pulseaudio mplayer. The latter (mplayer) is a command-line music player we’ll use to, well, play a .wav file.

Once everything has been installed, check that the following environment variable has been set:

❯ env | grep PULSE
PULSE_SERVER=/mnt/wslg/PulseServer

If you have just updated WSL via --update, you’ll need to issue a --shutdown and start it up again (which will happen automatically when you run say Windows Terminal Preview). That environment variable should already be set for you.

Grab a sample wav file from e.g. here, and try to play it:

❯ curl https://www.soundjay.com/misc/fail-buzzer-01.wav -o /tmp/buzz.wav
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  282k  100  282k    0     0  54138      0  0:00:05  0:00:05 --:--:-- 67559
❯ mplayer /tmp/buzz.wav
MPlayer 1.3.0 (Debian), built with gcc-9 (C) 2000-2016 MPlayer Team
do_connect: could not connect to socket
connect: No such file or directory
Failed to open LIRC support. You will not be able to use your remote control.

Playing /tmp/buzz.wav.
libavformat version 58.29.100 (external)
libavformat file format detected.
[lavf] stream 0: audio (pcm_s24le), -aid 0
Clip info:
 encoded_by: Pro Tools
 originator_reference: 6HfHPHSVCUhaaaGk
 date: 2010-09-21
 creation_time: 14:31:39
 time_reference: 0
Load subtitles in /tmp/
==========================================================================
Opening audio decoder: [pcm] Uncompressed PCM audio decoder
AUDIO: 48000 Hz, 2 ch, s24le, 2304.0 kbit/100.00% (ratio: 288000->288000)
Selected audio codec: [pcm] afm: pcm (Uncompressed PCM)
==========================================================================
AO: [pulse] 48000Hz 2ch s16le (2 bytes per sample)
Video: no video
Starting playback...
A:   0.4 (00.4) of 1.0 (01.0)  0.0%
Invalid return value 0 for stream protocol
Invalid return value 0 for stream protocol
A:   0.9 (00.9) of 1.0 (01.0)  0.0%


Exiting... (End of file)

Make sure that works before you proceed with the next step!

Audio with SDL2

To play our buzzer in SDL2 we need to do 3 things. First, we need to load the “buzz” sound. This being in a .wav format the sound file can be decoded internally by SDL2 itself. The second part is to open the audio device.

    SDL_AudioSpec audio_spec;
    uint8_t *audio_buffer;
    uint32_t audio_length;
    SDL_LoadWAV("/tmp/buzz.wav", &audio_spec, &audio_buffer, &audio_length);
    SDL_AudioDeviceID deviceId = SDL_OpenAudioDevice(NULL, 0, &audio_spec, NULL, 0);

At that point we have all we need to start playing our .wav file - all that’s left is queuing the audio:

    SDL_QueueAudio(deviceId, audio_buffer, audio_length);
    SDL_PauseAudioDevice(deviceId, 0);

Okay so I sneaked that last line in. The way this works with SDL is that the audio “player” has a buffer - and we can enqueue our audio, and when we “unpause” the audio device it starts sending the contents of that buffer to the audio device.

What that does mean however is that the contents of that buffer gets exhausted after being sent out. If there’s nothing left in the buffer SDL will “play” silence. We can toggle this by pausing/unpausing the audio device, but we still face the fact that once it’s played, it’s gone. We can’t exactly loop the sound automagically.

From what I can tell there are 2 ways around this. We can enqueue audio periodically - maybe by keeping track of how long the audio device has been playing for. E.g. if we know our sample is 4s long and we’ve been playing audio for 3s, we should call SDL_QueueAudio again to fill up the buffer.

The other way is to use callbacks - SDL provides a mechanism through which when the audio buffer needs more data, a user-defined function is called. This comes in the form of the SDL_AudioSpec struct which has callback and userdata fields - essentially the latter gets passed as an argument to the former. We could pass in the location of audio_buffer and its audio_length as our custom data and memcpy accordingly - essentially mimicking a circular buffer that never gets empty (sort of).

I haven’t made up my mind yet as to which I’d want to use, so here’s the skeleton for the callback version in case we end up needing it later.

We start by defining our userdata struct along with our callback function:

struct Buzz
{
    uint8_t *audio_buffer;
    uint32_t audio_length;
};

void cb(void *userdata,
        unsigned char *stream,
        int len)
{

    struct Buzz *buzz = (struct Buzz *)userdata;
    memcpy(stream, buzz->audio_buffer, buzz->audio_length > len ? len : buzz->audio_length);
}

Registring the callback is pretty simple - the first 4 lines are the same as the enqueue version,

    SDL_AudioSpec audio_spec;
    uint8_t *audio_buffer;
    uint32_t audio_length;
    SDL_LoadWAV("/tmp/buzz.wav", &audio_spec, &audio_buffer, &audio_length);
    struct Buzz buzz = {.audio_buffer = audio_buffer, .audio_length = audio_length};
    audio_spec.userdata = &buzz;
    audio_spec.callback = &cb;
    SDL_AudioDeviceID deviceId = SDL_OpenAudioDevice(NULL, 0, &audio_spec, NULL, 0);

Buzz is our userdata, and cb the callback function. Whenever the buffer becomes emtpy, SDL will call our callback function for more data. Note the buffer won’t be pushed to the audio device until such a time that it is unpaused.

For testing this, I added the following to the main switch statement along with a bool playing = false at the outset:

        case SDL_KEYDOWN:
        {
            SDL_KeyboardEvent key_event = event.key;
            SDL_Log("Key down registered: %d", key_event.keysym.sym);
            if (key_event.keysym.sym == key_p)
            {
                SDL_PauseAudioDevice(deviceId, playing ? 0 : 1);
                playing = !playing;
            }
        }

Caveats

From a queuing perspective we should technically be able to use SDL_GetQueuedAudioSize to see how much audio remainds to be sent to the audio device - and if that’s below a certain threshold we can enqueue again. However I couldn’t get this to behave properly when the audio device was playing (as in this only returned when the device was paused). If I could however, it’d be a matter of checking this at each iteration and enqueue accordingly.

On the callback side of things, note our function copies at most len bytes. That is, the buffer provided to the callback function is of fixed size which, in our case is smaller than the .wav sample. This means that every time we copy data, we only copy a portion of the sample. Given it’s a buzzing sound it doesn’t matter so much but ideally we’d be able to keep track of how much we copied and start from there at the next invocation.

Either of these issues aren’t blockers however, so let’s cross that bridge when we get to it.

References