In a previous post I outlined how to read the contents of the game's archive files. Here I'll describe how the audio files are stored.
Audio files have .sfx extensions, and there are vast numbers of them. They are compressed using GSM 06.10, the cell phone audio compression standard. Every audio file has a 20-byte header followed by a sequence of 33-byte frames of GSM data. Since this is a cell-phone audio standard, each frame carries a complete set of parameters and can be handed to the decoder on its own, although the decoder does keep some filter state from one frame to the next.
All the headers start with the same 12 bytes: 32 54 76 98 01 00 00 00 00 00 80 3F. After this come two 4-byte little-endian numbers: first the number of bytes of data following the header, which should be divisible by 33, and then the sample rate (22050 samples per second in the file I'm looking at).
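Here is a minimal sketch of reading that header in Python. It is not the code I used; the function and constant names are made up for illustration, and treating the two numbers as unsigned is an assumption.

```python
import struct

# The 12 bytes described above as the start of every .sfx header.
SFX_PREFIX = bytes.fromhex("32 54 76 98 01 00 00 00 00 00 80 3F")
HEADER_SIZE = 20  # 12-byte prefix + two 4-byte numbers
FRAME_SIZE = 33   # bytes per GSM 06.10 frame


def parse_sfx_header(data: bytes):
    """Return (data_size, sample_rate) read from the start of an .sfx file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file is shorter than the 20-byte header")
    if data[:12] != SFX_PREFIX:
        raise ValueError("unexpected header prefix")
    # Two 32-bit little-endian integers (assumed unsigned) follow the prefix.
    data_size, sample_rate = struct.unpack_from("<II", data, 12)
    if data_size % FRAME_SIZE != 0:
        raise ValueError("data size is not a whole number of 33-byte frames")
    return data_size, sample_rate


# Synthetic example: a header announcing two frames at 22050 samples per second.
example = SFX_PREFIX + struct.pack("<II", 2 * FRAME_SIZE, 22050) + bytes(2 * FRAME_SIZE)
print(parse_sfx_header(example))  # (66, 22050)
```

The divisibility check is just a cheap sanity test that the file really is laid out the way described here.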
Each 33-byte frame decodes to 160 16-bit samples, so at 22050 samples per second the compressed data rate is around 4.5 KB per second. I used code by Jutta Degener and Carsten Bormann to decode the frames.
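The arithmetic behind that figure, as a small Python sketch. The helper names are hypothetical; the only inputs are the frame size, the samples per frame, and the sample rate given above.

```python
FRAME_SIZE = 33          # compressed bytes per frame
SAMPLES_PER_FRAME = 160  # decoded 16-bit samples per frame


def compressed_bytes_per_second(sample_rate: int) -> float:
    """Bytes of frame data consumed per second of audio."""
    frames_per_second = sample_rate / SAMPLES_PER_FRAME
    return frames_per_second * FRAME_SIZE


def duration_seconds(data_size: int, sample_rate: int) -> float:
    """Playing time of a file whose header reports data_size bytes of frames."""
    frames = data_size // FRAME_SIZE
    return frames * SAMPLES_PER_FRAME / sample_rate


print(compressed_bytes_per_second(22050))  # 4547.8125
print(22050 * 2)                           # 44100 bytes per second uncompressed
```

So each frame turns 33 bytes into 320 bytes of samples, a bit under a tenfold reduction.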
Here's a random snip of dialogue from the game (ilott_found_naarn_g_7-9). Ilott has just learned of his brother's death at the hands of Kroax:
I have not had a chance to play with it, but I suspect that they get some of their “alien voice” sound by monkeying with the parameters of the linear predictive coding used to compress the audio, since it is essentially modeling a vocal tract.
Note: Another person requested the command-line tool for converting the SFX files to WAV format. I dusted it off, updated the project files to the latest version of Visual Studio, and put it up on GitHub: https://github.com/mcneja/sfx2wav. It's just a thin wrapper around the library mentioned above. It converts all of the files in a given directory and writes the output files to a specified output directory.
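For anyone who would rather script it, here is a rough Python sketch of the same directory-to-directory conversion. It is not the sfx2wav code. The GSM decoding itself is left to a hypothetical `make_decoder` argument that you would have to supply: calling it should return a fresh function that takes one 33-byte frame and returns 320 bytes of 16-bit little-endian samples. The sketch also skips header validation.

```python
import struct
import wave
from pathlib import Path

HEADER_SIZE = 20
FRAME_SIZE = 33


def convert_sfx_to_wav(sfx_path, wav_path, decode_frame):
    """Write one .sfx file out as a mono 16-bit WAV file."""
    raw = Path(sfx_path).read_bytes()
    # Data size and sample rate sit after the 12-byte fixed prefix.
    data_size, sample_rate = struct.unpack_from("<II", raw, 12)
    payload = raw[HEADER_SIZE:HEADER_SIZE + data_size]
    with wave.open(str(wav_path), "wb") as wav:
        wav.setnchannels(1)  # GSM 06.10 is a single-channel codec
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        # Feed whole frames to the decoder in order; ignore any partial tail.
        for offset in range(0, len(payload) - FRAME_SIZE + 1, FRAME_SIZE):
            wav.writeframes(decode_frame(payload[offset:offset + FRAME_SIZE]))


def convert_directory(in_dir, out_dir, make_decoder):
    """Convert every .sfx file in in_dir, writing .wav files to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for sfx_path in sorted(Path(in_dir).glob("*.sfx")):
        # A fresh decoder per file, so no state leaks between files.
        decode_frame = make_decoder()
        convert_sfx_to_wav(sfx_path, out_dir / (sfx_path.stem + ".wav"), decode_frame)
```

Passing a decoder factory rather than a single decoder is a design choice on my part: it keeps whatever state the decoder holds from carrying over from one file to the next.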