One way of viewing the process of feature extraction is through simple
block diagrams. The diagram below illustrates the process of extracting
a single feature, energy, from speech data stored on a computer.
The block on the far left, labeled Inp (Input), represents speech
stored in digital form on a computer. The center block, labeled
Engy (Energy), represents a computer program or algorithm
specifically designed to measure energy values in the speech data.
This algorithm is applied to the speech data, and the resulting
measurements are stored in a computer file of feature measurements,
represented by the block on the far right, labeled Out (Output).
Note that the above diagram does not illustrate the use of windows.
As previously discussed, this technique is always
applied in practice. The diagram below includes a block labeled
Wind (Window). This represents a windowing algorithm that is
applied to determine the number of samples used to calculate the energy.
Note that the blocks indicate special algorithms applied to extract
features from the speech data. Special algorithms are also used to
pass the speech to the feature extraction algorithms and to write the
extracted features to a computer file. Typically, the frame duration
is set in the input algorithm.
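
The Inp, Wind, Engy, and Out blocks can be sketched in a few lines of
Python. This is a minimal illustration, not the tutorial's own program:
the function names, the synthetic sample values, the non-overlapping
rectangular frames, and printing in place of a feature file are all
assumptions made for the example. Energy is taken to be the sum of the
squared sample values in each frame.

```python
def frame_signal(samples, frame_length):
    """Wind: split the samples into consecutive, non-overlapping frames.

    Non-overlapping frames are an assumption of this sketch; trailing
    samples that do not fill a whole frame are dropped.
    """
    frames = []
    for start in range(0, len(samples) - frame_length + 1, frame_length):
        frames.append(samples[start:start + frame_length])
    return frames


def frame_energy(frame):
    """Engy: sum of the squared sample values in one frame."""
    return sum(x * x for x in frame)


def extract_energy(samples, sample_rate, frame_duration):
    """Run Inp -> Wind -> Engy and return one energy value per frame."""
    # The frame duration (in seconds) fixes the number of samples per frame.
    frame_length = int(round(sample_rate * frame_duration))
    return [frame_energy(f) for f in frame_signal(samples, frame_length)]


# Inp: a tiny synthetic "speech" signal (hypothetical values).
samples = [1.0, -1.0, 2.0, -2.0, 0.5, 0.5, 0.0, 3.0]
energies = extract_energy(samples, sample_rate=4, frame_duration=0.5)

# Out: print one feature measurement per frame instead of writing a file.
for index, energy in enumerate(energies):
    print(index, energy)
```

With a sample rate of 4 samples per second and a frame duration of 0.5
seconds, each frame holds two samples, so the eight samples yield four
energy values.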
Energy is considered a time-domain
feature since it can be computed using the sum of the squared
values of the sampled speech data. We can also view the frequency
spectrum of a speech signal by converting it using mathematical
techniques. As mentioned, the Fourier Transform
is a commonly used technique for converting signals from the time domain
to the frequency domain.
The block diagram below illustrates the process of computing the frequency
spectrum for a speech signal using a window of samples.
The blocks labeled Inp, Out, and Wind are described above. The block
labeled Spec (Spectrum) represents the Fourier Transform, which
converts the speech signal to a frequency spectrum. This technique is
commonly used to compute spectrograms. See Section 3.1.1 for an
example spectrogram.
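
The Wind and Spec blocks can be sketched in the same way. The example
below is an illustration under stated assumptions rather than the
method used to produce the spectrograms in this tutorial: it uses a
direct (slow) discrete Fourier transform written with the Python
standard library, a rectangular window that simply selects the first 64
samples, and a synthetic 1000 Hz tone sampled at 8000 Hz in place of
real speech. The name dft_magnitude is hypothetical.

```python
import cmath
import math


def dft_magnitude(frame):
    """Spec: magnitude of the discrete Fourier transform of one frame.

    Only bins 0 .. N/2 are returned, since the remaining bins mirror
    them for real-valued input. This direct form is slow; practical
    systems use a fast Fourier transform (FFT).
    """
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = 0j
        for t, x in enumerate(frame):
            total += x * cmath.exp(-2j * math.pi * k * t / n)
        magnitudes.append(abs(total))
    return magnitudes


# Inp: a synthetic 1000 Hz tone (assumed stand-in for speech data).
sample_rate = 8000
tone_hz = 1000.0
samples = [math.sin(2 * math.pi * tone_hz * t / sample_rate)
           for t in range(256)]

# Wind: take one frame of 64 samples (rectangular window, an assumption).
frame_length = 64
frame = samples[:frame_length]

# Spec: convert the frame from the time domain to the frequency domain.
spectrum = dft_magnitude(frame)

# Out: report the frequency of the strongest spectral component.
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
peak_hz = peak_bin * sample_rate / frame_length
print(peak_hz)
```

Repeating the Wind and Spec steps for successive frames, and stacking
the resulting spectra side by side, gives the data that a spectrogram
displays.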