Classes that wrap complex functionality, such as estimating the frequency domain content of a signal, are an excellent opportunity to showcase the rapid prototyping capabilities of our front end software. For a detailed tutorial, see our Fundamentals of Speech Recognition on-line tutorial (currently under development). For a more extended treatment of signal processing in speech recognition, see signal modeling in speech recognition and our on-line workshop notes on signal processing. This month's tutorial describes one class, Spectrum, which resides in the algorithm library and is used to perform many different types of frequency response calculations.

In speech recognition, we compute spectra from many different representations of the signal. These range from sampled data, which can be used to compute the Fourier transform, to linear prediction coefficients, which can be used to compute the spectrum using a maximum entropy model. At the same time, some applications (e.g., recognition) need only the magnitude spectrum, while others (e.g., speech coding) need the complex spectrum. It is useful to have a class (e.g., Spectrum) that provides a simple and uniform interface to all of these input representations and output types. This is particularly important when developing software such as transform builder, which operates on feature streams in a data-driven way.

A complete description of the Spectrum class can be found in our on-line manual pages. The Spectrum class supports three basic types of input: signals, linear prediction coefficients, and correlation coefficients. Each of these representations can be transformed to a spectral representation of the input. Let us demonstrate this procedure using a simple example: computing the spectrum from sampled data using a Fourier transform. The code is shown below:

    // isip include files
    //
    #include <Spectrum.h>
    
    // main program starts here
    //
    int main(int argc, const char **argv) {
      
      // declare a Spectrum object and an output vector
      //
      Spectrum spectrum;
      VectorFloat output;
    
      // generate a 30 ms sine wave of frequency 1000 Hz
      // (240 samples at a sample frequency of 8000 Hz)
      //
      long sample_freq = 8000;
      long num_samples  = 240;
    
      // create the input sampled data
      //
      VectorFloat input(num_samples);
      double step = Integral::TWO_PI * 1000.0 / (double)sample_freq;
      input.ramp(0, step); 
      input.sin();
    
      // set algorithm and implementation
      //
      spectrum.setAlgorithm(Spectrum::FOURIER);       
      spectrum.setImplementation(Spectrum::MAGNITUDE);
    
      // set the algorithm, implementation and order of the underlying
      // FourierTransform object
      //
      spectrum.setFtAlgorithm(FourierTransform::FFT);
      spectrum.setFtImplementation(FourierTransform::SPLIT_RADIX);  
      spectrum.setFtOrder((long)256);
    
      // compute the spectrum of input data
      //
      spectrum.compute(output, input);
    
      // output the input, the spectrum object and the output
      // magnitude-spectrum to the console
      //
      input.debug(L"input");
      spectrum.debug(L"spectrum");
      output.debug(L"output");
      
      // exit gracefully
      //
      Integral::exit();
    }
In the above example, we set the algorithm and the implementation of the Spectrum object to FOURIER and MAGNITUDE, respectively, because the input is a sampled signal and we wish to compute the magnitude of the spectrum. We also set the algorithm, the implementation, and the order of the underlying Fourier transform to FFT, SPLIT_RADIX, and 256, respectively. For more information on the available algorithm and implementation types for the Fourier transform, refer to the manual pages of the FourierTransform class. Processing issues such as zero-padding and truncation are handled transparently. In our example, the number of input samples (240) is less than the order of the Fourier transform (256), so the input is zero-padded at the end to a length of 256 samples.
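
To make the zero-padding step concrete, the following standalone C++ sketch computes a magnitude spectrum with a direct DFT instead of a split-radix FFT. It does not use the ISIP classes: the function names magnitudeSpectrum and makeSine are hypothetical, introduced only for illustration, and the scaling of the output produced by the Spectrum class may differ.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Magnitude spectrum of a real signal via a direct O(N^2) DFT.
// The input is zero-padded at the end (or truncated) to `order` samples.
// Standalone illustration only; this is not the ISIP implementation.
std::vector<double> magnitudeSpectrum(const std::vector<double>& signal,
                                      std::size_t order) {
  assert(order > 0);

  // copy the input into a buffer of exactly `order` samples;
  // samples beyond the input length stay at zero
  std::vector<double> padded(order, 0.0);
  for (std::size_t n = 0; n < order && n < signal.size(); ++n) {
    padded[n] = signal[n];
  }

  const double two_pi = 2.0 * std::acos(-1.0);
  std::vector<double> magnitude(order);
  for (std::size_t k = 0; k < order; ++k) {
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t n = 0; n < order; ++n) {
      // reduce k*n modulo order so the angle stays within one period
      const double angle = -two_pi * static_cast<double>((k * n) % order) /
                           static_cast<double>(order);
      sum += padded[n] * std::polar(1.0, angle);
    }
    magnitude[k] = std::abs(sum);
  }
  return magnitude;
}

// Build a sampled sine wave: sin(2*pi*freq*n / sample_freq).
std::vector<double> makeSine(std::size_t num_samples, double freq,
                             double sample_freq) {
  const double step = 2.0 * std::acos(-1.0) * freq / sample_freq;
  std::vector<double> x(num_samples);
  for (std::size_t n = 0; n < num_samples; ++n) {
    x[n] = std::sin(step * static_cast<double>(n));
  }
  return x;
}
```

Calling magnitudeSpectrum(makeSine(240, 1000.0, 8000.0), 256) pads the 240 samples with 16 zeros. The largest value in the lower half of the result falls at bin 32, which corresponds to 32 × 8000 / 256 = 1000 Hz.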

Now, let us consider an example that demonstrates the computation of the magnitude spectrum from linear prediction coefficients. The code is shown below:

    // isip include files
    //
    #include <Spectrum.h>
    
    // main program starts here
    //
    int main(int argc, const char **argv) {
      
      // declare a Spectrum object, an output vector, and an input vector
      // consisting of linear prediction coefficients
      //
      Spectrum spectrum;
      VectorFloat output;
      VectorFloat input(L"1.0000000, -0.9666889, 0.00009363125, 0.00009251215, 0.03525940");
    
      // set algorithm and implementation
      //
      spectrum.set(Spectrum::MAXIMUM_ENTROPY, Spectrum::MAGNITUDE);
    
      // set the algorithm, implementation and order of the underlying
      // FourierTransform object
      //
      spectrum.setFtAlgorithm(FourierTransform::FFT);
      spectrum.setFtImplementation(FourierTransform::SPLIT_RADIX);  
      spectrum.setFtOrder((long)128);
    
      // compute the magnitude spectrum of input linear prediction
      // coefficients. set the input data type to PREDICTION
      //
      spectrum.compute(output, input, AlgorithmData::PREDICTION);
    
      // output the input, the spectrum object and the output
      // magnitude-spectrum to the console
      //
      input.debug(L"input");
      spectrum.debug(L"spectrum");
      output.debug(L"output");
      
      // exit gracefully
      //
      Integral::exit();
    }
In this example, we set the algorithm and the implementation of the Spectrum object to MAXIMUM_ENTROPY and MAGNITUDE, respectively, because the input consists of linear prediction coefficients. Note that the input coefficient type is set to PREDICTION in the compute method. We also set the algorithm, the implementation, and the order of the underlying Fourier transform to FFT, SPLIT_RADIX, and 128, respectively. Once again, the impulse response of the linear prediction filter is zero-padded as necessary to produce a 128-point transform.
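
The idea behind a maximum entropy spectrum can be sketched without the ISIP classes. The standalone C++ example below assumes the textbook all-pole form, in which the magnitude response is 1/|A(e^{jw})| and A is the transform of the zero-padded coefficient vector. The function name allPoleMagnitude is hypothetical, and the Spectrum class may apply a gain term or a different scaling that this sketch does not model.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Magnitude response 1/|A(e^{jw})| of an all-pole model, sampled at
// `order` equally spaced frequencies. `coeffs` holds the prediction
// polynomial A(z), starting with the leading 1.0.
// Standalone illustration only; gain and scaling conventions are assumed.
std::vector<double> allPoleMagnitude(const std::vector<double>& coeffs,
                                     std::size_t order) {
  assert(order > 0);
  assert(coeffs.size() <= order);

  const double two_pi = 2.0 * std::acos(-1.0);
  std::vector<double> magnitude(order);
  for (std::size_t k = 0; k < order; ++k) {
    // DFT of the zero-padded coefficients; the padded zeros add
    // nothing to the sum, so only the coefficients are visited
    std::complex<double> a(0.0, 0.0);
    for (std::size_t n = 0; n < coeffs.size(); ++n) {
      const double angle = -two_pi * static_cast<double>((k * n) % order) /
                           static_cast<double>(order);
      a += coeffs[n] * std::polar(1.0, angle);
    }
    // a stable model has no zeros of A(z) on the unit circle,
    // so this division is well defined
    magnitude[k] = 1.0 / std::abs(a);
  }
  return magnitude;
}
```

For the coefficient vector used above, allPoleMagnitude(coeffs, 128) returns 128 values, with index 0 corresponding to zero frequency and index 64 to half the sample frequency.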

The Spectrum class is used in many of our front end implementations to extract frequency domain information. For a detailed tutorial on this topic, see our Fundamentals of Speech Recognition tutorial.