Network Training Using the Production System
This network training tutorial introduces a new feature of the production system that lets users train pronunciation models directly. The same feature can be used to train any level in a user-defined hierarchy of networks, including the language model and lexicon levels. To learn more about network training, see our April 2002 monthly tutorial on network training.
This self-guided tutorial walks users through the procedure for building a system that demonstrates network training. The experiment included in this tutorial is a continuous, phone-based speech recognition task on TIDIGITS. The speech data consists of 941 training utterances and 336 test utterances randomly selected from the TIDIGITS corpus. The experiment uses 39-dimensional features consisting of 12 cepstral coefficients plus log energy (13 static features), along with their deltas and double deltas. Energy normalization and cepstral mean subtraction, both applied on a per-utterance basis, are included in the feature extraction process.
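The arithmetic behind the 39-dimensional feature vector can be sketched as follows. This is only an illustration in plain Python, not the production system's front end: the function names are hypothetical, the delta computation is a simple central difference, and energy normalization is assumed to mean subtracting the utterance's peak log energy. The actual front end may compute both differently.

```python
from typing import List

Frame = List[float]


def cepstral_mean_subtraction(frames: List[Frame]) -> List[Frame]:
    """Subtract the per-utterance mean from each cepstral coefficient."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


def deltas(frames: List[Frame]) -> List[Frame]:
    """Central difference over time; edge frames are repeated (assumption)."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def assemble_features(cepstra: List[Frame], log_energy: List[float]) -> List[Frame]:
    """Build 39-dim vectors: 13 static + 13 delta + 13 double-delta."""
    normalized_cepstra = cepstral_mean_subtraction(cepstra)
    # Assumed form of energy normalization: shift so the utterance peak is 0.
    peak = max(log_energy)
    normalized_energy = [e - peak for e in log_energy]
    # 12 cepstral coefficients + 1 log energy = 13 static features per frame.
    static = [c + [e] for c, e in zip(normalized_cepstra, normalized_energy)]
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Toy utterance: 5 frames of 12 made-up cepstral coefficients.
cepstra = [[float(t + k) for k in range(12)] for t in range(5)]
log_energy = [-3.0, -1.0, -0.5, -2.0, -4.0]
features = assemble_features(cepstra, log_energy)
print(len(features), len(features[0]))  # 5 frames, 39 dimensions each
```

Both normalizations operate on one utterance at a time, so no statistics are shared across the training or test sets.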
The Baum-Welch based training uses multiple pronunciations per word rather than a single pronunciation. A single-state silence model with a self-loop is dynamically inserted between words at runtime to account for an arbitrary amount of silence between words. All the files required for this experiment are bundled with this package.
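To make the network structure concrete, the sketch below expands a word sequence into a phone-level graph with parallel pronunciation paths and a single self-looping silence state between words. It is a toy illustration, not the production system's network code: `build_network`, the `LEXICON` entries, and the `<s>`/`</s>` boundary labels are all hypothetical, and each phone is shown as one placeholder state rather than a full acoustic model.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: each word maps to one or more pronunciations.
LEXICON: Dict[str, List[List[str]]] = {
    "ZERO": [["z", "iy", "r", "ow"], ["z", "ih", "r", "ow"]],
    "ONE": [["w", "ah", "n"]],
}


def build_network(
    words: List[str], lexicon: Dict[str, List[List[str]]]
) -> Tuple[List[str], Dict[int, List[int]]]:
    """Return (labels, arcs): labels[i] is the symbol of state i,
    arcs[i] lists the states reachable from state i."""
    labels: List[str] = []
    arcs: Dict[int, List[int]] = {}

    def new_state(label: str) -> int:
        labels.append(label)
        arcs[len(labels) - 1] = []
        return len(labels) - 1

    frontier = [new_state("<s>")]
    for i, word in enumerate(words):
        if i > 0:
            # One silence state between words; the self-loop lets it
            # absorb any number of silence frames.
            sil = new_state("sil")
            arcs[sil].append(sil)
            for s in frontier:
                arcs[s].append(sil)
            frontier = [sil]
        word_ends = []
        for pron in lexicon[word]:
            # Each pronunciation is a parallel path through the word.
            prev = None
            for phone in pron:
                state = new_state(phone)
                if prev is None:
                    for s in frontier:
                        arcs[s].append(state)
                else:
                    arcs[prev].append(state)
                prev = state
            word_ends.append(prev)
        frontier = word_ends
    end = new_state("</s>")
    for s in frontier:
        arcs[s].append(end)
    return labels, arcs


labels, arcs = build_network(["ZERO", "ONE"], LEXICON)
sil = labels.index("sil")
print(labels[sil], "self-loop:", sil in arcs[sil])
```

Because the alternative pronunciations are parallel paths in the same graph, Baum-Welch training can weight them by how well each path explains the data instead of committing to one pronunciation in advance.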
To download this tutorial, click on Network Training (v0.0 - 04/27/02). Detailed instructions for building this system from scratch are provided in the release's AAREADME.text file.