4.2.6 Network Decoding: Parameter Tuning
Parameter files are an important element of our speech recognition system.
They save the user the trouble of entering countless parameters at
the command line. The parameter file, along with the configuration file,
makes it easy to tweak parameters, change recognition modes, etc. The
parameter file holds the more critical recognition parameters, such as
the language model file and the output controls. This section explains
the parameters used for a decoding experiment and shows how they can be
tweaked to make the recognizer behave differently.
Download the example parameter file params_decode.sof. This is the
parameter file used for the experiment in Section 4.2.5 to decode an
utterance using cross-word triphones. The parameter file can have any
name, but it must be specified on the isip_recognize command line, which
is covered in Section 4.2.1.
Now, let's examine the params_decode.sof parameter file one line at a time.
@ Sof v1.0 @
@ HiddenMarkovModel 0 @
The first two lines of the parameter file allow the recognizer to interpret
the following lines as parameters. These lines must be included in order
for the recognizer to accept the parameter file.
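As a quick sanity check before running an experiment, the header
requirement above can be tested with a few lines of Python. This is only
a sketch: has_required_header is a hypothetical helper, not part of the
ISIP tools, and it assumes the two header lines must match the text shown
above exactly (the recognizer's own Sof parser may be more lenient).

```python
# Header lines copied from params_decode.sof as shown above.
REQUIRED_HEADER = ["@ Sof v1.0 @", "@ HiddenMarkovModel 0 @"]


def has_required_header(text):
    """Return True if the first two non-blank lines are the Sof header.

    Hypothetical helper; the exact-match rule is an assumption.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[:2] == REQUIRED_HEADER
```

Passing the full text of a parameter file to has_required_header gives a
simple yes/no answer before the file is handed to the recognizer.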
algorithm = "DECODE";
The algorithm parameter tells the recognizer what its primary function
will be for an experiment. Throughout Section 4 of the tutorial, we've
used DECODE for our algorithm. Other possibilities exist, most of which
are involved in the training process, which will be discussed in later
sections.
implementation = "VITERBI";
This parameter tells the recognizer what method to use for the given
algorithm. The VITERBI implementation is used for decoding. Other
implementations will be discussed in later sections.
context_mode = "CROSS_SYMBOL";
Sometimes it is necessary to provide the recognizer with information
about the type of context being used.
The context_mode parameter tells the recognizer how to treat the phones.
In this experiment, we used cross-word triphones, so we set the context_mode
parameter to CROSS_SYMBOL. For word-internal triphones, we would set the
parameter to SYMBOL_INTERNAL. The default value for this parameter is
SYMBOL_ONLY, which is used for experiments with monophones. Make sure that
this parameter agrees with the language model file you are using.
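The three context_mode values described above can be summarized in a
small lookup table. The sketch below is illustrative only: the unit-type
keys and the context_mode_line helper are our own names, not part of the
recognizer, while the three parameter values come from the text above.

```python
# Hypothetical keys on the left; context_mode values from this section
# on the right.
CONTEXT_MODES = {
    "monophone": "SYMBOL_ONLY",
    "word_internal_triphone": "SYMBOL_INTERNAL",
    "cross_word_triphone": "CROSS_SYMBOL",
}


def context_mode_line(unit_type):
    """Return the parameter-file line for the given kind of phone model."""
    try:
        mode = CONTEXT_MODES[unit_type]
    except KeyError:
        raise ValueError(f"unknown unit type: {unit_type!r}")
    return f'context_mode = "{mode}";'
```

For the experiment in this section, context_mode_line("cross_word_triphone")
produces the same line that appears in params_decode.sof.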
output_mode = "DATABASE";
The recognizer can produce several different types of output. The
output_mode parameter can be used to set the desired type. In this
case, we want the output to be placed in a transcription database that
will contain the hypotheses and time alignments for each of the
test utterances. We can also send the hypotheses to a plain text file
that lists the file identifiers and their corresponding hypotheses by
setting the output_mode parameter to FILE. It's also possible to
place each of the hypotheses in separate files corresponding to
the file identifiers. In this case, we would set the parameter to
LIST.
The output_type parameter sets the format of the output file, which can
be either TEXT or BINARY. TEXT files can be inspected manually since the
contents are readable. We use TEXT files for most of the experiments in
this tutorial for that purpose. TEXT files take longer to load, however,
since more parsing is required. The contents of a BINARY file cannot be
inspected manually, but can be processed and loaded faster than TEXT
files.
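Tweaking a parameter such as output_mode is just a matter of editing the
quoted value in the parameter file. If you want to script such changes,
something like the following sketch would do. The set_parameter helper is
hypothetical and assumes every parameter is written on one line in the
form name = "value"; as in the example file.

```python
import re


def set_parameter(text, name, value):
    """Return parameter-file text with name's quoted value replaced.

    Assumes each parameter sits on a single line as: name = "value";
    """
    pattern = re.compile(
        r'^([ \t]*' + re.escape(name) + r'[ \t]*=[ \t]*)"[^"]*"([ \t]*;)',
        re.MULTILINE,
    )
    # A function is used for the replacement so that backslashes in
    # value are inserted literally.
    new_text, count = pattern.subn(
        lambda m: f'{m.group(1)}"{value}"{m.group(2)}', text
    )
    if count == 0:
        raise KeyError(f"parameter not found: {name}")
    return new_text


example = 'output_mode = "DATABASE";\noutput_file = "results.out";\n'
print(set_parameter(example, "output_mode", "FILE"))
```

Only the output_mode line is rewritten; every other line is passed
through untouched, so the same helper can be reused for any other
parameter in the file.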
configuration = "$ISIP_TUTORIAL/sections/s04/s04_02_p05/config.sof";
The configuration file contains several other parameters that are important
to the recognizer. The contents of this file are discussed in the
next section.
output_file = "$ISIP_TUTORIAL/sections/s04/s04_02_p05/results.out";
The output_file parameter tells the recognizer where to send the results.
The contents of this file depend on the output_mode and output_type
parameters.
frontend = "$ISIP_TUTORIAL/recipes/frontend.sof";
This file verifies that the input utterances can be read by the
recognizer and that they conform to the standard frontend.
audio_database = "$ISIP_TUTORIAL/research/isip/databases/db/tidigits_audio_db_test.sof";
The audio database contains a reference to all of the test utterances and
associates each of the utterances with a file identifier.
language_model= "$ISIP_TUTORIAL/models/xword_phone_models/compare/lm_xword_ihd_8mix_train.sof";
statistical_model_pool = "$ISIP_TUTORIAL/models/xword_phone_models/compare/smp_xword_8mix_train.sof";
These two parameters define the files containing the language model and the
statistical model pool. It is important that these files agree with the
parameters listed in both this file and the configuration file. For
example, the cross-word triphone models used here require context_mode
to be set to CROSS_SYMBOL.
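Since most of the parameters above point at other files, it can be handy
to check that those files exist before starting a long experiment. The
sketch below is one way to do this under a few assumptions: the helper
names are hypothetical, each parameter is assumed to be on one line in
the form name = "value"; and $ISIP_TUTORIAL is assumed to be an
environment variable.

```python
import os
import re

# Matches lines of the form: name = "value";
PARAM_RE = re.compile(
    r'^[ \t]*(\w+)[ \t]*=[ \t]*"([^"]*)"[ \t]*;', re.MULTILINE
)

# Parameters from this section whose values are file names.
FILE_PARAMETERS = [
    "configuration",
    "frontend",
    "audio_database",
    "language_model",
    "statistical_model_pool",
]


def read_parameters(text):
    """Collect name/value pairs from parameter-file text into a dict."""
    return dict(PARAM_RE.findall(text))


def referenced_files(params):
    """Expand environment variables in the file-valued parameters."""
    return {
        name: os.path.expandvars(params[name])
        for name in FILE_PARAMETERS
        if name in params
    }


def missing_files(paths):
    """Return the names of parameters whose files do not exist."""
    return sorted(n for n, p in paths.items() if not os.path.isfile(p))


sample = (
    'context_mode = "CROSS_SYMBOL";\n'
    'frontend = "$ISIP_TUTORIAL/recipes/frontend.sof";\n'
)
for name, path in referenced_files(read_parameters(sample)).items():
    print(name, "->", path)
```

Calling missing_files on the result of referenced_files lists any
parameters that point at files which are not on disk.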