Alphadigits Tutorial

Note: This entire tutorial is also packaged as a gzipped tar file (58 MB), which can be downloaded and viewed locally.
This tutorial takes you through all of the steps required to construct a self-contained speech recognizer, from feature extraction through evaluation, for the relatively simple task of recognizing continuous alphadigit strings. The final models are cross-word triphones with multiple mixture components per state. Most of the tools used are part of the ISIP speech recognition system distribution. The exceptions are the standard NIST tools for evaluation scoring and speech file manipulation, which can be obtained from the NIST software distribution site.
The tutorial has pointers to the man pages of all the utilities and scripts used in building a complete recognition system. The man pages give usage examples, but these are similar to, not the same as, the commands used here; when working through the tutorial, follow the tutorial's instructions. Several examples are provided as well. For copyright reasons, we can show only a small part of the data actually used to construct the complete system; a complete set of models is provided. For more information on the speech corpora used in this work, visit the Linguistic Data Consortium and Oregon Graduate Institute corpora pages.
The following flow graph, with insets, can be used as a guide to this tutorial. Each box leads to detailed information on that topic.
Data Preparation
- Preliminary data preparation
- Feature extraction
Model Estimation
Recognition of test data
- Recognizing the test data and evaluation
Converting HTK models to ISIP format