
4.2.6 Network Decoding: Parameter Tuning

Parameter files are an important element of our speech recognition system. They save the user the trouble of entering numerous parameters on the command line. The parameter file, along with the configuration file, makes it easy to tweak parameters, change recognition modes, and so on. It holds the more critical recognition settings, such as which language model file to use and how output is produced. This section explains the parameters used for a decoding experiment and shows how they can be adjusted to make the recognizer behave differently.

Download the example parameter file params_decode.sof. This is the parameter file used for the experiment in Section 4.2.5 to decode an utterance using cross-word triphones. The parameter file can have any name, but must be specified in the isip_recognize command line. The command line for isip_recognize is covered in Section 4.2.1. Now, let's examine the params_decode.sof parameter file one line at a time.

@ Sof v1.0 @
@ HiddenMarkovModel 0 @

The first two lines of the parameter file allow the recognizer to interpret the following lines as parameters. These lines must be included in order for the recognizer to accept the parameter file.
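As a quick sanity check before running an experiment, the header requirement can be sketched in Python. This is only an illustration: has_expected_header is a hypothetical helper, not part of the recognizer, and it assumes the header must match the two lines shown above exactly, whereas the real Sof reader may tolerate more variation.

```python
# Hypothetical helper: compares against the two header lines quoted in this section.
# Assumption: an exact match is required; the real Sof reader may be more lenient.
EXPECTED_HEADER = ["@ Sof v1.0 @", "@ HiddenMarkovModel 0 @"]


def has_expected_header(text):
    """Return True if the first two non-blank lines match the expected header."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[:2] == EXPECTED_HEADER


sample = '@ Sof v1.0 @\n@ HiddenMarkovModel 0 @\n\nalgorithm = "DECODE";\n'
print(has_expected_header(sample))
```

A file that starts directly with a parameter line, such as algorithm = "DECODE";, would fail this check.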

algorithm = "DECODE";

The algorithm specified in this line tells the recognizer what its primary function will be for an experiment. Throughout Section 4 of the tutorial, we've used DECODE for our algorithm. Other possibilities exist, most of which are involved in the training process, which will be discussed in later sections.

implementation = "VITERBI";

This parameter tells the recognizer what method to use for the given algorithm. The VITERBI implementation is used for decoding. Other implementations will be discussed in later sections.

context_mode = "CROSS_SYMBOL";

Sometimes it may be necessary to provide the recognizer with information about the type of context being used. The context_mode parameter tells the recognizer how to treat the phones. In this experiment, we used cross-word triphones, so we set the context_mode parameter to CROSS_SYMBOL. For word-internal triphones, we would set the parameter to SYMBOL_INTERNAL. The default value for this parameter is SYMBOL_ONLY, which is used for experiments with monophones. Make sure that this parameter agrees with the language model file you are using.
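The choice of context_mode described above can be summarized as a small lookup. The sketch below is illustrative only: the dictionary keys (monophone, word_internal_triphone, cross_word_triphone) and the function context_mode_line are names made up for this example, while the three parameter values are the ones given in the text.

```python
# Values come from the text above; the keys are labels invented for this sketch.
CONTEXT_MODE_FOR_UNITS = {
    "monophone": "SYMBOL_ONLY",                    # the default
    "word_internal_triphone": "SYMBOL_INTERNAL",
    "cross_word_triphone": "CROSS_SYMBOL",         # used in this experiment
}


def context_mode_line(unit_type):
    """Build the parameter-file line for the given kind of acoustic unit."""
    mode = CONTEXT_MODE_FOR_UNITS[unit_type]
    return f'context_mode = "{mode}";'


print(context_mode_line("cross_word_triphone"))
```

Keeping the mapping in one place makes it harder to pair, say, cross-word models with the monophone setting by accident.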

output_mode = "DATABASE";

The recognizer can produce several different types of output. The output_mode parameter sets the desired type. In this case, we want the output to be placed in a transcription database that will contain the hypotheses and time alignments for each of the test utterances. We can also send the hypotheses to a plain text file that lists the file identifiers and their corresponding hypotheses by setting the output_mode parameter to FILE. It's also possible to place each of the hypotheses in separate files corresponding to the file identifiers. In this case, we would set the parameter to LIST.

output_type = "TEXT";

The output file can be either TEXT or BINARY. TEXT files can be inspected manually since the contents are readable. We use TEXT files for most of the experiments in this tutorial for that purpose. TEXT files take longer to load, however, since more parsing is required. The contents of a BINARY file cannot be inspected manually, but can be processed and loaded faster than text files.
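A typo in output_mode or output_type is easy to make, so it can help to check the values before starting a long decoding run. The following sketch assumes that the values named in this section (DATABASE, FILE, LIST and TEXT, BINARY) are the only ones of interest; the recognizer itself may accept others. The function check_output_settings is hypothetical and not part of the toolkit.

```python
# Assumption: only the values named in this section are listed here;
# the recognizer itself may accept additional ones.
KNOWN_VALUES = {
    "output_mode": {"DATABASE", "FILE", "LIST"},
    "output_type": {"TEXT", "BINARY"},
}


def check_output_settings(params):
    """Return a list of messages for output settings with unrecognized values."""
    problems = []
    for name, allowed in KNOWN_VALUES.items():
        value = params.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name} = {value!r} is not one of {sorted(allowed)}")
    return problems


print(check_output_settings({"output_mode": "DATABASE", "output_type": "TEXT"}))
```

The settings used in this experiment produce an empty list, meaning nothing was flagged.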

configuration = "$ISIP_TUTORIAL/sections/s04/s04_02_p05/config.sof";

The configuration file contains several other parameters that are important to the recognizer. The contents of this file are discussed in the next section.

output_file = "$ISIP_TUTORIAL/sections/s04/s04_02_p05/results.out";

The output_file parameter tells the recognizer where to send the results. The contents of this file depend on the output_mode and output_type parameters.

frontend = "$ISIP_TUTORIAL/recipes/frontend.sof";

The frontend file is used to verify that the input utterances can be read by the recognizer and that they conform to the standard frontend.

audio_database = "$ISIP_TUTORIAL/research/isip/databases/db/tidigits_audio_db_test.sof";

The audio database contains references to all of the test utterances and associates each utterance with a file identifier.

language_model = "$ISIP_TUTORIAL/models/xword_phone_models/compare/lm_xword_ihd_8mix_train.sof";
statistical_model_pool = "$ISIP_TUTORIAL/models/xword_phone_models/compare/smp_xword_8mix_train.sof";

These two parameters define the files containing the language model and the statistical model pool. It is important that these files agree with the parameters listed in both this file and the configuration file. For example, the cross-word models used here require context_mode to be set to CROSS_SYMBOL.
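Because most of the parameters above are file paths built on $ISIP_TUTORIAL, a common failure is an unset environment variable or a missing file. The sketch below reads the name = "value"; lines and reports which input files cannot be found. It is a minimal illustration, not the recognizer's own Sof parser: parse_params and missing_files are hypothetical helpers, and the regular expression only assumes the simple one-parameter-per-line layout shown in this section.

```python
import os
import re

# Matches lines of the form: name = "value";  (spacing around '=' is optional)
PARAM_LINE = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"\s*;\s*$')

# Parameters from this section that name input files which should already exist.
INPUT_FILE_KEYS = [
    "configuration",
    "frontend",
    "audio_database",
    "language_model",
    "statistical_model_pool",
]


def parse_params(text):
    """Collect name/value pairs; header and blank lines are simply skipped."""
    params = {}
    for line in text.splitlines():
        match = PARAM_LINE.match(line)
        if match:
            params[match.group(1)] = match.group(2)
    return params


def missing_files(params, keys=INPUT_FILE_KEYS):
    """Return (name, expanded_path) pairs for input files that do not exist."""
    missing = []
    for key in keys:
        if key in params:
            # expandvars leaves $ISIP_TUTORIAL untouched if it is not set,
            # so an unset variable also shows up as a missing file.
            path = os.path.expandvars(params[key])
            if not os.path.isfile(path):
                missing.append((key, path))
    return missing


example = '''@ Sof v1.0 @
@ HiddenMarkovModel 0 @

algorithm = "DECODE";
frontend = "$ISIP_TUTORIAL/recipes/frontend.sof";
'''
params = parse_params(example)
for name, path in missing_files(params):
    print(f"{name}: cannot find {path}")
```

The output_file parameter is deliberately left out of the check, since that file is written by the recognizer rather than read.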
