Programming with DirectX : Sound in DirectX – XAudio2

XAudio2 is the DirectSound replacement for Windows developers and is an enhanced version of the XAudio API that Xbox developers have been enjoying for some time. In this article we will create a demo that plays a sound file once and then exits. This demo will show you how to get XAudio2 up and running to play sound inside any application.

XAudio2 does not have a way to detect and convert audio files between incompatible endian orders. This means that if you are working directly with XAudio2 on Xbox 360 and Windows, you must handle endian order carefully.
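
As a rough illustration of what handling endian order by hand can involve, the sketch below reverses the byte order of 16-bit PCM samples in place. The helper names SwapBytes16() and SwapPcm16Buffer() are made up for this example and are not part of XAudio2; the sketch also assumes the audio data is plain 16-bit PCM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an XAudio2 function): swap the two bytes
// of a single 16-bit sample.
inline std::uint16_t SwapBytes16(std::uint16_t value)
{
   return static_cast<std::uint16_t>((value << 8) | (value >> 8));
}

// Hypothetical helper: reverse the byte order of every 16-bit PCM
// sample in a buffer, in place. Assumes the data is 16-bit PCM.
inline void SwapPcm16Buffer(std::uint16_t *samples, std::size_t count)
{
   assert(samples != nullptr || count == 0);

   for(std::size_t i = 0; i < count; ++i)
      samples[i] = SwapBytes16(samples[i]);
}
```

A loader could call SwapPcm16Buffer() once after reading a file whose byte order does not match the platform. Header fields such as the sample rate would need the same treatment.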

XAudio2 Demo

Like XACT3, XAudio2 has an interface that you create in order to use the API. This interface is called IXAudio2, and it is created by calling the SDK function XAudio2Create(). On the Xbox 360 this is an actual API function, while on Windows, according to the DirectX documentation, it is a convenient inline function defined in XAudio2.h. XAudio2Create() has the function prototype shown below. It takes as parameters the address of the IXAudio2 pointer that will receive the created object, creation flags (0 by default, or XAUDIO2_DEBUG_ENGINE for debug mode), and an audio processor value that specifies which CPU XAudio2 should use, which defaults to XAUDIO2_DEFAULT_PROCESSOR.

HRESULT XAudio2Create(
  IXAudio2 **ppXAudio2,
  UINT32 Flags = 0,
  XAUDIO2_PROCESSOR XAudio2Processor = XAUDIO2_DEFAULT_PROCESSOR
);

On the Xbox 360, XAudio2 is implemented as a statically linked library, while on Windows it is a COM object implemented by a dynamic link library.

XAudio2 uses objects known as voices to manipulate and control audio. There are three types of voices in the XAudio2 API: source voices, submix voices, and mastering voices. A source voice represents a stream of audio data and sends that data to the other types of voices. A submix voice processes audio data from a source voice to perform various effects (e.g., sample rate conversion) and can also be used as an input voice to another submix voice or to a mastering voice. A mastering voice sends the data it receives from source and submix voices to the audio hardware. It is the only voice that produces audible output, so you must create one in XAudio2 to hear anything.
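
The routing described above can be pictured with a small toy model. This is not the XAudio2 API: MixInto() and ClampToOutputRange() are hypothetical helpers, and the sketch assumes floating-point samples in the range -1 to 1. It only shows the idea that source data is summed into a submix stage and then into a final mastering stage that feeds the output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: add `input`, scaled by `gain`, into `output`.
// This stands in for one voice sending its audio to another voice.
inline void MixInto(std::vector<float> &output,
                    const std::vector<float> &input, float gain)
{
   assert(output.size() == input.size());

   for(std::size_t i = 0; i < output.size(); ++i)
      output[i] += input[i] * gain;
}

// Hypothetical final stage: keep samples inside [-1, 1] before they
// would be handed to the audio hardware.
inline void ClampToOutputRange(std::vector<float> &samples)
{
   for(float &sample : samples)
      sample = std::max(-1.0f, std::min(1.0f, sample));
}
```

In this model two source buffers would be mixed into one submix buffer, and the submix buffer into the master buffer, which is then clamped. The real engine does this work for you once the voices are created and connected.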

As far as the basics of XAudio2 are concerned, this is essentially what you need to play audio with the API. In the XAudio2 demo's main source file, the main() function calls CoInitializeEx() because XAudio2 is a COM object in Windows. It then creates the XAudio2 engine and the mastering voice that will play the actual sound. In the demo the loading and playing of the actual file are done in a function called PlayPCM(), which will be discussed later in this section.

The creation of the mastering voice is done with a call to CreateMasteringVoice(), which takes as parameters an address to an IXAudio2MasteringVoice object that will store the created voice object, the audio channels, the audio sample rate, flags for the voice (which must be set to 0), the output device index the voice will use, and an optional audio effects chain using the structure XAUDIO2_EFFECT_CHAIN. The audio channels are set to XAUDIO2_DEFAULT_CHANNELS and default to 5.1 surround on Xbox 360. In Windows, XAudio2 attempts to determine the speaker configuration.

The main() function in the XAudio2 demo is shown in Listing 1. To recap, the function initializes COM, creates the audio engine, creates the mastering voice, loads and plays the sound with a call to PlayPCM() that will be implemented next, and exits the application after releasing the audio engine and uninitializing COM.

Listing 1. The XAudio2 Demo’s main() Source File
int main(int args, char* argc[])
{
   cout << "XAudio2 Demo: Playing clip.wav" << endl << endl;
   cout << "Demo will end when the sound is done." << endl << endl;

   if(FAILED(CoInitializeEx(NULL, COINIT_MULTITHREADED)))
      return 0;

   IXAudio2* xAudio2Engine = NULL;

   UINT32 flags = 0;
#ifdef _DEBUG
    flags |= XAUDIO2_DEBUG_ENGINE;
#endif

   if(FAILED(XAudio2Create(&xAudio2Engine, flags)))
   {
      cout << "XAudio2 engine was not created!" << endl;
      CoUninitialize();
      return 0;
   }

   IXAudio2MasteringVoice *masterVoice = NULL;

   if(FAILED(xAudio2Engine->CreateMasteringVoice(&masterVoice,
      XAUDIO2_DEFAULT_CHANNELS, XAUDIO2_DEFAULT_SAMPLERATE,
      0, 0, NULL)))
   {
      cout << "Master voice was not created!" << endl;

      if(xAudio2Engine != NULL)
         xAudio2Engine->Release();

      CoUninitialize();
      return 0;
   }

   if(PlayPCM(xAudio2Engine, "clip.wav") == false)
   {
      cout << "clip.wav failed to load!" << endl;

      if(xAudio2Engine != NULL)
         xAudio2Engine->Release();

      CoUninitialize();
      return 0;
   }

   if(xAudio2Engine != NULL)
      xAudio2Engine->Release();

   CoUninitialize();

   return 1;
 }
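
Listing 1 repeats the same cleanup calls on every exit path. One possible way to reduce that repetition, shown here only as a sketch, is a small scope guard that runs a cleanup action when it goes out of scope. The ScopeGuard class is a hypothetical helper written for this example; it is not part of XAudio2, COM, or the demo's source.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper: runs the stored cleanup action when the guard
// is destroyed, on every exit path from the enclosing scope.
class ScopeGuard
{
public:
   explicit ScopeGuard(std::function<void()> cleanup)
      : m_cleanup(std::move(cleanup))
   {
   }

   ~ScopeGuard()
   {
      if(m_cleanup)
         m_cleanup();
   }

   // Copying a guard would run the cleanup twice, so it is disabled.
   ScopeGuard(const ScopeGuard&) = delete;
   ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
   std::function<void()> m_cleanup;
};
```

In main() a guard that calls CoUninitialize() could be declared right after CoInitializeEx() succeeds, and a second guard that calls Release() on the engine right after XAudio2Create() succeeds. Guards are destroyed in reverse order of declaration, so the engine would be released before COM is uninitialized, matching the order used in Listing 1.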

An audio file is loaded and played with a call to PlayPCM(). This function is a modified version of the PlayPCM() function offered in the Microsoft DirectX SDK sample XAudio2BasicSound. To load and play sounds we will use this function as well as the files SDKwavefile.h and SDKwavefile.cpp. The SDKwavefile files are part of the DirectX Utility (DXUT) library and can be found in any of the DXUT samples in the DirectX SDK. Since these files are part of DirectX, we will use them instead of writing some very long and complicated code for loading audio files. Since the files use DXUT, they have been slightly altered so that the use of the SDKwavefile files does not require any of the other DXUT headers or source files.

The PlayPCM() function uses the CWaveFile class defined in SDKwavefile.h to open the audio file. The file is read by calling the Read() function, which takes as parameters a buffer to read into, the number of bytes to read, and an out pointer that receives the number of bytes actually read.
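
To give a feel for the work a wave file loader has to do, here is a minimal sketch that walks the chunks of a RIFF/WAVE file already held in memory and pulls out the basic format fields. It is not how CWaveFile is implemented. WavInfo and ParseWavHeader() are hypothetical names, and the sketch assumes a little-endian file with a standard 16-byte (or larger) "fmt " chunk that appears before the "data" chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical result structure for this sketch.
struct WavInfo
{
   std::uint16_t channels;
   std::uint32_t sampleRate;
   std::uint16_t bitsPerSample;
   std::uint32_t dataBytes;   // size of the sample data in bytes
   std::size_t   dataOffset;  // where the sample data starts
};

static std::uint16_t ReadLE16(const unsigned char *p)
{
   return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

static std::uint32_t ReadLE32(const unsigned char *p)
{
   return static_cast<std::uint32_t>(p[0]) |
          (static_cast<std::uint32_t>(p[1]) << 8) |
          (static_cast<std::uint32_t>(p[2]) << 16) |
          (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: returns true if both a "fmt " and a "data"
// chunk were found inside the buffer.
bool ParseWavHeader(const unsigned char *bytes, std::size_t size,
                    WavInfo &out)
{
   assert(bytes != nullptr);

   if(size < 12 || std::memcmp(bytes, "RIFF", 4) != 0 ||
      std::memcmp(bytes + 8, "WAVE", 4) != 0)
      return false;

   bool haveFormat = false;
   std::size_t pos = 12;

   while(pos + 8 <= size)
   {
      const unsigned char *chunk = bytes + pos;
      const std::uint32_t chunkSize = ReadLE32(chunk + 4);
      const std::size_t body = pos + 8;

      if(std::memcmp(chunk, "fmt ", 4) == 0 && chunkSize >= 16 &&
         size - body >= 16)
      {
         out.channels = ReadLE16(chunk + 10);
         out.sampleRate = ReadLE32(chunk + 12);
         out.bitsPerSample = ReadLE16(chunk + 22);
         haveFormat = true;
      }
      else if(std::memcmp(chunk, "data", 4) == 0)
      {
         if(chunkSize > size - body)
            return false;

         out.dataOffset = body;
         out.dataBytes = chunkSize;
         return haveFormat;
      }

      // Chunks are padded to an even number of bytes.
      const std::size_t advance =
         8 + static_cast<std::size_t>(chunkSize) + (chunkSize & 1);

      if(advance > size - pos)
         break;

      pos += advance;
   }

   return false;
}
```

A real loader also has to handle compressed formats and extended format chunks, which is part of why the demo reuses the SDK's files.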

Once the file is loaded, the source voice is created. Keep in mind that the source voice represents a stream of audio data. To create the source voice, which has an interface of IXAudio2SourceVoice, we call the CreateSourceVoice() function of the XAudio2 engine object. This function takes the address of the source voice pointer that will receive the created voice, the format of the audio (using the WAVEFORMATEX structure provided by Windows), behavior flags, the maximum frequency ratio, an optional callback interface, an optional send list of destination voices for the audio data, and an optional audio effect chain. The behavior flags can be 0 or any combination of the following values:

  • XAUDIO2_VOICE_NOPITCH for no pitch control

  • XAUDIO2_VOICE_NOSRC for no sample rate conversion

  • XAUDIO2_VOICE_USEFILTER to enable filter effects on the sound

  • XAUDIO2_VOICE_MUSIC to state that the voice is used to play background music

Once the source voice is created, an audio buffer using the XAudio2 structure XAUDIO2_BUFFER is created. This buffer takes the sound data and submits it to the source voice, which can only happen after a valid source voice has been created by CreateSourceVoice(). The audio data is assigned to the buffer's pAudioData member, the audio flags to the Flags member, and the size of the audio in bytes to the AudioBytes member. The XAUDIO2_END_OF_STREAM flag tells XAudio2 that no more data will follow after this buffer has played.
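
Because the buffer size is given in bytes, it can be useful to relate that size to playing time. The sketch below does the arithmetic for uncompressed PCM. PcmFormat, BytesPerSecond(), and DurationMs() are hypothetical names used only for this example; PcmFormat is a simplified stand-in for the fields of WAVEFORMATEX that matter here.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for the WAVEFORMATEX fields that
// this calculation needs. Assumes uncompressed PCM.
struct PcmFormat
{
   std::uint16_t channels;
   std::uint32_t samplesPerSec;
   std::uint16_t bitsPerSample;
};

// Bytes of audio consumed per second of playback.
inline std::uint32_t BytesPerSecond(const PcmFormat &format)
{
   return format.samplesPerSec * format.channels *
          (format.bitsPerSample / 8u);
}

// Approximate playing time, in milliseconds, of a buffer holding
// audioBytes bytes. Returns 0 if the format is not usable.
inline std::uint32_t DurationMs(std::uint32_t audioBytes,
                                const PcmFormat &format)
{
   const std::uint32_t bytesPerSecond = BytesPerSecond(format);

   if(bytesPerSecond == 0)
      return 0;

   return static_cast<std::uint32_t>(
      static_cast<std::uint64_t>(audioBytes) * 1000u / bytesPerSecond);
}
```

For 16-bit stereo audio at 44,100 Hz, one second of sound takes 176,400 bytes, so a buffer with that many bytes in AudioBytes plays for about 1,000 milliseconds.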

To submit the data to the source voice, you call SubmitSourceBuffer() on the source voice object, which takes as a parameter a pointer to the XAUDIO2_BUFFER object. If all is successful, you can start processing the sound by calling Start() on the source voice. The Start() function takes as parameters behavior flags that must be set to 0 and an operation set. The operation set can be XAUDIO2_COMMIT_NOW to apply the operation immediately, or an operation set identifier that defers the call until the changes for that set are committed on the engine with CommitChanges().

When a source voice is processing, it is being played. You can test the state of the sound by calling the GetState() function on the source voice object. This fills in an XAUDIO2_VOICE_STATE structure that you can inspect. To test whether the sound is still playing, check whether the BuffersQueued member is greater than 0.

Once you are done with a source voice, you free it by calling DestroyVoice(). The entire PlayPCM() function is shown in Listing 2 with all the code we’ve just discussed in the previous few paragraphs. This function essentially loads a sound, plays it, and then frees it from memory. As a bonus exercise you should separate the loading and playing code into their own functions and allow the sound to be played multiple times before it is freed.

Listing 2. The PlayPCM() Function
bool PlayPCM(IXAudio2* xAudio2Engine, char *filename)
{
   CWaveFile wav;

   if(FAILED(wav.Open(filename, NULL, WAVEFILE_READ)))
      return false;
   WAVEFORMATEX *format = wav.GetFormat();
   unsigned long wavSize = wav.GetSize();
   unsigned char *wavData = new unsigned char[wavSize];

   if(FAILED(wav.Read(wavData, wavSize, &wavSize)))
   {
      if(wavData)
         delete[] wavData;

      return false;
   }


   IXAudio2SourceVoice *srcVoice;

   if(FAILED(xAudio2Engine->CreateSourceVoice(&srcVoice, format,