- Note: This entire tutorial is also packaged as a
gzipped tar file (58 MB), which can be downloaded
and viewed locally at the user's site.
-
This tutorial takes you through all of the steps required
to construct a self-contained speech recognizer, from feature
extraction through evaluation, for the relatively simple task of
recognizing continuous alphadigit strings. The final models
will be cross-word triphones with multiple mixture components
per state. Most of the tools used are part of the
ISIP speech recognition system's
distribution. The exceptions are the standard NIST tools for
evaluation scoring and speech file manipulation, which can
be obtained from the
NIST software distribution site.
-
The tutorial has pointers to the man pages of all the utilities
and scripts used in building a complete recognition
system. The man pages give usage examples, but when working
through the tutorial, follow the tutorial's own instructions;
the man page examples are similar but not identical.
Several examples are provided as well. For
copyright reasons, we can show only a small part of the data
actually used to construct the complete system. A complete set
of models is provided, however. For more information on the speech
corpora used in this work, visit the
Linguistic Data Consortium
and
Oregon Graduate Institute
corpora pages.
-
The following flow graph, with insets, can be used as a guide to
this tutorial. Click on any of the square boxes to read
detailed information on that topic.
-
Data Preparation
-
Model Estimation
-
Recognition of test data
-
Converting HTK models to ISIP format