WAVEFORM AUDIO FILE FORMAT (WAV)

The WAVE file format is built on Microsoft's RIFF (Resource Interchange File
Format) specification for the storage of multimedia resource files. RIFF has
been used for various formats (AVI, PAL, RMI, ...), but this document focuses
on its use in WAV files specifically. Only the mandatory chunks are explained
here; there is no explanation of how the LIST or INFO chunks work, or of what
the "cue " subchunk should contain.

More succinct documents out there present the "canonical WAVE file format" (I
highly recommend Craig Stuart Sapp's page on the topic), which should be more
than enough for most people.


RIFF structure

  RIFF (Resource Interchange File Format) is the tagged file structure designed
  for multimedia resource files. The structure of a RIFF file is similar to the
  structure of an Electronic Arts IFF file. RIFF is not actually a file format
  itself (since it does not represent a specific kind of information), but
  rather an encapsulation scheme.


Chunks

  The basic building block of a RIFF file is called a chunk and is defined as
  follows:

      BYTE      ckID[4];           // Chunk type identifier
      DWORD     ckSize;            // Chunk size in bytes (size of ckData)
      BYTE      ckData[];          // Chunk data

  Two special types of chunks, identified by the ckID values "LIST" and "RIFF",
  may contain nested chunks, or subchunks.

  Here is a discussion of various fields:
      ckID
        A four-character code that identifies the representation of the chunk
        data. A program reading a RIFF file can skip over any chunk whose chunk
        ID it doesn't recognize; it simply skips the number of bytes specified
        by ckSize plus the pad byte, if present.

        In the case of WAV files, the ckID of the outermost chunk must be
        "RIFF".

      ckSize
        A 32-bit unsigned value, stored little-endian (least significant byte
        first), identifying the size of ckData. This size value does not
        include the size of the ckID or ckSize fields or the pad byte at the
        end of ckData.

      ckData
        Binary data of fixed or variable size. The start of ckData is
        word-aligned with respect to the start of the RIFF file. If ckSize is
        an odd number of bytes, a pad byte with value zero is written after
        ckData. Word aligning improves access speed (for chunks resident in
        memory) and maintains compatibility with EA IFF.
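  The rules above (an 8-byte header, a ckSize that excludes that header, and a
  pad byte after odd-sized data) are enough to step from one chunk to the next.
  Below is a minimal C sketch; the names ChunkHeader, read_chunk_header and
  next_chunk_offset are hypothetical, and it assumes the little-endian byte
  order RIFF uses for its integer fields.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of a chunk header. */
typedef struct {
    char     id[5];   /* four-character code plus a terminating NUL */
    uint32_t size;    /* ckSize: size of ckData, excluding header and pad */
} ChunkHeader;

/* Decode a little-endian 32-bit value regardless of host byte order. */
static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read the 8-byte header at buf[off]. Returns 0 on success, -1 if the
   buffer is too short to hold a full header at that offset. */
static int read_chunk_header(const uint8_t *buf, size_t len, size_t off,
                             ChunkHeader *out)
{
    if (off > len || len - off < 8)
        return -1;
    memcpy(out->id, buf + off, 4);
    out->id[4] = '\0';
    out->size = read_u32le(buf + off + 4);
    return 0;
}

/* Offset of the chunk following the one at `off`: skip the 8-byte header,
   ckSize bytes of data, and one pad byte when ckSize is odd. */
static size_t next_chunk_offset(size_t off, uint32_t ck_size)
{
    return off + 8 + (size_t)ck_size + (ck_size & 1u);
}
```

  A reader that does not recognize a chunk's ID simply jumps to
  next_chunk_offset(off, h.size) and continues from there.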


RIFF Forms

  A RIFF form is a chunk with a "RIFF" ckID. The first DWORD of ckData in the
  RIFF chunk is a four-character code value identifying the form type of the
  file. For WAV files, this four-character code is "WAVE".

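  Following the form-type code is a series of subchunks. Which subchunks are
  present depends on the form type. They are laid out exactly like chunks: a
  four-character code followed by a DWORD giving the size of the subchunk data
  in bytes. Note however that subchunk codes are not always lowercase: form-
  specific IDs such as "fmt " and "data" are, but a WAVE file may also contain
  an uppercase "LIST" subchunk, and IDs are compared case-sensitively.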
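  Putting the form type and the subchunk layout together, a reader can check
  the first 12 bytes and then walk the subchunks one by one. This C sketch
  assumes the whole file is already in memory. list_wave_subchunks is a
  hypothetical name, and the walk is bounded by both the RIFF ckSize and the
  buffer length rather than trusting either one alone.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a little-endian 32-bit value regardless of host byte order. */
static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copies up to max_ids subchunk IDs (as NUL-terminated strings) into ids
   and returns the number of subchunks found, or -1 if buf does not start
   with a RIFF chunk whose form type is "WAVE". */
static int list_wave_subchunks(const uint8_t *buf, size_t len,
                               char ids[][5], int max_ids)
{
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;

    /* The RIFF chunk's data ends 8 + ckSize bytes into the file; never
       walk past the end of the buffer even if ckSize claims more. */
    uint64_t end64 = 8 + (uint64_t)read_u32le(buf + 4);
    size_t end = end64 < len ? (size_t)end64 : len;

    int count = 0;
    size_t off = 12;               /* first subchunk follows the form type */
    while (off + 8 <= end) {
        uint32_t size = read_u32le(buf + off + 4);
        if (count < max_ids) {
            memcpy(ids[count], buf + off, 4);
            ids[count][4] = '\0';
        }
        count++;
        /* Skip header, data, and the pad byte after odd-sized data. */
        uint64_t next = (uint64_t)off + 8 + size + (size & 1u);
        if (next > end)
            break;                 /* truncated subchunk: stop here */
        off = (size_t)next;
    }
    return count;
}
```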


WAVE Format Chunk

  For WAV files, there are two mandatory subchunks, identified as "fmt " and
  "data". Other subchunks such as "cue ", "ltxt" or "file" are not covered.
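  A reader therefore has to locate both mandatory subchunks and may skip
  anything else. The sketch below (hypothetical names, whole file in memory)
  also enforces the rule, stated for "fmt " below, that it must occur before
  "data".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a little-endian 32-bit value regardless of host byte order. */
static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical result type: offsets point at each subchunk's data,
   just past its 8-byte header. */
typedef struct {
    size_t   fmt_off, data_off;
    uint32_t fmt_size, data_size;
} WaveLayout;

/* Returns 0 on success, -1 if buf is not a RIFF "WAVE" form, -2 if a
   mandatory subchunk is missing, -3 if "data" appears before "fmt ". */
static int find_mandatory_subchunks(const uint8_t *buf, size_t len,
                                    WaveLayout *out)
{
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;

    int have_fmt = 0, have_data = 0;
    size_t off = 12;
    while (off + 8 <= len && !have_data) {
        uint32_t size = read_u32le(buf + off + 4);
        if (!have_fmt && memcmp(buf + off, "fmt ", 4) == 0) {
            out->fmt_off = off + 8;
            out->fmt_size = size;
            have_fmt = 1;
        } else if (memcmp(buf + off, "data", 4) == 0) {
            if (!have_fmt)
                return -3;         /* "fmt " MUST precede "data" */
            out->data_off = off + 8;
            out->data_size = size;
            have_data = 1;
        }
        /* Any other subchunk is skipped, including its pad byte. */
        uint64_t next = (uint64_t)off + 8 + size + (size & 1u);
        if (next > len)
            break;
        off = (size_t)next;
    }
    return (have_fmt && have_data) ? 0 : -2;
}
```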
 
 
  Subchunk - "fmt "

  This subchunk specifies how the data must be interpreted, and MUST always
  occur before subchunk "data". It is defined as follows:

      WORD      wFormatTag;        // Format category.
      WORD      wChannels;         // Number of channels.
      DWORD     dwSamplesPerSec;   // Sampling rate.
      DWORD     dwAvgBytesPerSec;  // For buffer estimation.
      WORD      wBlockAlign;       // Data block size.
      CHAR      formatSpecific[];  // Format-specific fields.
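  The fields are little-endian, and a C compiler is free to pad structures,
  so decoding them byte by byte is more portable than copying raw bytes into a
  struct. Here is a minimal sketch that decodes the 14 bytes common to every
  "fmt " subchunk. WaveFmtCommon and parse_fmt_common are hypothetical names,
  and formatSpecific (starting at byte 14) is left for the caller to interpret.

```c
#include <stddef.h>
#include <stdint.h>

/* Field names mirror the "fmt " layout; the struct itself is hypothetical. */
typedef struct {
    uint16_t wFormatTag;
    uint16_t wChannels;
    uint32_t dwSamplesPerSec;
    uint32_t dwAvgBytesPerSec;
    uint16_t wBlockAlign;
} WaveFmtCommon;

static uint16_t read_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the common fields from the start of the "fmt " subchunk data.
   Returns 0 on success, -1 if fewer than 14 bytes are available. */
static int parse_fmt_common(const uint8_t *data, size_t size, WaveFmtCommon *f)
{
    if (size < 14)
        return -1;
    f->wFormatTag       = read_u16le(data);        /* bytes 0-1   */
    f->wChannels        = read_u16le(data + 2);    /* bytes 2-3   */
    f->dwSamplesPerSec  = read_u32le(data + 4);    /* bytes 4-7   */
    f->dwAvgBytesPerSec = read_u32le(data + 8);    /* bytes 8-11  */
    f->wBlockAlign      = read_u16le(data + 12);   /* bytes 12-13 */
    return 0;
}
```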

  Here is a discussion of various fields:
      wFormatTag
        A number indicating the WAVE format category of the file. The content
        of the formatSpecific portion of the "fmt " chunk, and the
        interpretation of the waveform data, depend on this value.

        The WAVE format categories defined in the original RIFF specification
        are:

          Value                       Format Category
          0x0001 (WAVE_FORMAT_PCM)    Microsoft Pulse Code Modulation (PCM).
          0x0101 (IBM_FORMAT_MULAW)   IBM mu-law.
          0x0102 (IBM_FORMAT_ALAW)    IBM a-law.
          0x0103 (IBM_FORMAT_ADPCM)   IBM AVC Adaptive Differential Pulse Code
                                      Modulation.

      wChannels
        The number of channels represented in the waveform data, such as 1 for
        mono or 2 for stereo.

      dwSamplesPerSec
        The sampling rate (in samples per second) at which each channel should
        be played.

      dwAvgBytesPerSec
        The average number of bytes per second at which the waveform data
        should be transferred. Playback software can estimate the buffer size
        using this value.

      wBlockAlign
        The block alignment (in bytes) of the waveform data. Playback software
        needs to process a multiple of wBlockAlign bytes of data at a time, so
        the value of wBlockAlign can be used for buffer alignment.

      formatSpecific
        The size and content of this field depend on the value of
        wFormatTag.

        If wFormatTag is 0x0001, then the waveform data consists of samples
        represented in Pulse Code Modulation (PCM) format, and this field
        only contains a WORD (wBitsPerSample) describing the size in bits of
        each sample of each channel (if there are multiple channels, the sample
        size is the same for each channel).

        For PCM data, each sample occupies the smallest whole number of bytes
        that holds wBitsPerSample bits, i.e. (wBitsPerSample / 8) rounded up
        to the next whole number. The wBlockAlign field should therefore be:

          wBlockAlign = wChannels * ceil(wBitsPerSample / 8)

        And the dwAvgBytesPerSec field should be:

          dwAvgBytesPerSec = dwSamplesPerSec * wBlockAlign
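  As a worked example, 16-bit stereo at 44100 Hz gives wBlockAlign = 2 * 2 = 4
  and dwAvgBytesPerSec = 44100 * 4 = 176400. The helpers below compute the same
  values. They assume each sample is stored in the smallest whole number of
  bytes that holds wBitsPerSample bits, as described for the "data" subchunk,
  and their names are hypothetical.

```c
#include <stdint.h>

/* Bytes used to store one sample: wBitsPerSample / 8, rounded up
   (e.g. 12 bits -> 2 bytes). */
static uint16_t pcm_bytes_per_sample(uint16_t bits_per_sample)
{
    return (uint16_t)((bits_per_sample + 7u) / 8u);
}

/* wBlockAlign: one sample for every channel. */
static uint16_t pcm_block_align(uint16_t channels, uint16_t bits_per_sample)
{
    return (uint16_t)(channels * pcm_bytes_per_sample(bits_per_sample));
}

/* dwAvgBytesPerSec: one block per sample frame, dwSamplesPerSec frames
   per second. */
static uint32_t pcm_avg_bytes_per_sec(uint16_t channels,
                                      uint32_t samples_per_sec,
                                      uint16_t bits_per_sample)
{
    return samples_per_sec * pcm_block_align(channels, bits_per_sample);
}
```

  Note that 12-bit stereo yields a block of 4 bytes, not 3, because each
  12-bit sample is padded to a full two-byte integer.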


  Subchunk - "data"

  In a single-channel WAVE file, samples are stored consecutively. In
  multiple-channel WAVE files, samples are interleaved: each sample frame holds
  one sample per channel, in channel order. For stereo WAVE files, channel 0
  represents the left channel, and channel 1 represents the right channel. The
  speaker position mapping for more than two channels is undefined in the
  original specification.
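  With interleaving, a sample's position is a simple function of its frame and
  channel index. A sketch, assuming PCM data and hypothetical helper names;
  offsets are relative to the start of the "data" subchunk's data.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of complete sample frames in data_size bytes of sample data. */
static size_t frame_count(uint32_t data_size, uint16_t block_align)
{
    return block_align ? (size_t)(data_size / block_align) : 0;
}

/* Byte offset of the sample for `channel` in frame `frame`. A frame is
   wBlockAlign bytes holding one sample per channel in channel order; for
   PCM each sample occupies (wBitsPerSample + 7) / 8 bytes. */
static size_t sample_offset(size_t frame, uint16_t channel,
                            uint16_t block_align, uint16_t bits_per_sample)
{
    size_t bytes_per_sample = ((size_t)bits_per_sample + 7) / 8;
    return frame * block_align + (size_t)channel * bytes_per_sample;
}
```

  For 16-bit stereo, for instance, frame 1 starts at byte 4: its left sample
  occupies bytes 4-5 and its right sample bytes 6-7.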

  Each sample is contained in an integer i. The size of i is the smallest
  number of bytes required to contain the specified sample size. The least
  significant byte is stored first. The bits that represent the sample
  amplitude are stored in the most significant bits of i, and the remaining
  bits are set to zero.

  For example, if the sample size (recorded in wBitsPerSample) is 12 bits, then
  each sample is stored in a two-byte integer. The least significant four bits
  of the first (least significant) byte are set to zero.
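  The 12-bit case can be checked by hand: the bytes 0x30 0x12 form the
  integer 0x1230, and discarding the four zero bits gives 0x123 (291). The C
  sketch below generalizes this. It relies on the Microsoft PCM convention
  that samples of 8 bits or fewer are unsigned (128 is silence for 8-bit data)
  while wider samples are two's-complement signed. decode_pcm_sample is a
  hypothetical name, and treating 2^(bits-1) as the midpoint for unsigned
  samples narrower than 8 bits is an assumption.

```c
#include <stdint.h>

/* Decode one PCM sample from nbytes little-endian bytes (1 to 4), where
   the top bits_per_sample bits hold the amplitude and the rest are zero.
   Assumes nbytes == (bits_per_sample + 7) / 8. Hypothetical helper. */
static int32_t decode_pcm_sample(const uint8_t *p, unsigned nbytes,
                                 unsigned bits_per_sample)
{
    uint32_t raw = 0;
    for (unsigned i = 0; i < nbytes; i++)
        raw |= (uint32_t)p[i] << (8 * i);   /* least significant byte first */

    /* Drop the zero bits stored below the amplitude. */
    raw >>= 8 * nbytes - bits_per_sample;

    if (bits_per_sample <= 8) {
        /* Unsigned; assumed midpoint 2^(bits-1), i.e. 128 for 8-bit data. */
        return (int32_t)raw - (int32_t)(1u << (bits_per_sample - 1));
    }

    /* Signed two's complement: sign-extend from bits_per_sample bits. */
    int64_t v = (int64_t)raw;
    if (raw & (1u << (bits_per_sample - 1)))
        v -= (int64_t)1 << bits_per_sample;
    return (int32_t)v;
}
```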


Canonical Microsoft PCM WAVE File Format

  This section describes the structure of a classic Microsoft Pulse Code
  Modulation (PCM) WAVE file and the value each field MUST have to match that
  specific file format.

  The file starts with the RIFF chunk header, which is defined as follows:

      BYTE      ckID[4];           // Chunk type identifier, MUST be "RIFF".
      DWORD     ckSize;            // Chunk size in bytes: file size - 8.

  Right after it comes the formType field:

      BYTE      formType[4];       // Form type, MUST be "WAVE".

  Then appears the "fmt " subchunk, which describes how the audio data must be
  interpreted. It is defined as follows:

      BYTE      sckID1[4];         // Subchunk id, MUST be "fmt ".
      DWORD     sckSize1;          // Subchunk size in bytes, MUST be 16.
      WORD      wFormatTag;        // Format category, MUST be 0x0001.
      WORD      wChannels;         // Number of channels.
      DWORD     dwSamplesPerSec;   // Sampling rate.
      DWORD     dwAvgBytesPerSec;  // For buffer estimation.
      WORD      wBlockAlign;       // Data block size.
      WORD      wBitsPerSample;    // Bits per sample.

  Then appears the "data" subchunk, which is defined as follows:

      BYTE      sckID2[4];         // Subchunk id, MUST be "data".
      DWORD     sckSize2;          // Subchunk size in bytes.
      BYTE      sckData[];         // Subchunk data.
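  Since every field apart from the channel count, sample rate, sample size and
  data length is fixed, the canonical header is always 44 bytes and can be
  filled in directly. A minimal sketch with hypothetical names; it sets ckSize
  to the file size minus 8, counting the pad byte that follows odd-sized
  sample data.

```c
#include <stdint.h>
#include <string.h>

static void put_u16le(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put_u32le(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Fill the 44-byte canonical PCM header for data_size bytes of samples.
   ckSize = "WAVE" (4) + "fmt " subchunk (8 + 16) + "data" header (8)
          + data_size + pad byte (if data_size is odd). */
static void write_canonical_wav_header(uint8_t hdr[44], uint16_t channels,
                                       uint32_t rate, uint16_t bits,
                                       uint32_t data_size)
{
    uint16_t block_align = (uint16_t)(channels * ((bits + 7u) / 8u));

    memcpy(hdr, "RIFF", 4);
    put_u32le(hdr + 4, 36 + data_size + (data_size & 1u));
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_u32le(hdr + 16, 16);                  /* sckSize1 */
    put_u16le(hdr + 20, 1);                   /* wFormatTag = PCM */
    put_u16le(hdr + 22, channels);            /* wChannels */
    put_u32le(hdr + 24, rate);                /* dwSamplesPerSec */
    put_u32le(hdr + 28, rate * block_align);  /* dwAvgBytesPerSec */
    put_u16le(hdr + 32, block_align);         /* wBlockAlign */
    put_u16le(hdr + 34, bits);                /* wBitsPerSample */
    memcpy(hdr + 36, "data", 4);
    put_u32le(hdr + 40, data_size);           /* sckSize2 */
}
```

  A writer would emit these 44 bytes, then the sample data, then a single zero
  byte if data_size is odd.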