ASR_VA_TUTORIAL_V1.0: SYSTEM OVERVIEW
-------------------------------------

This file contains a brief synopsis of each step required to train and
evaluate a single-mixture context-independent (monophone) system.

This system was trained on 8 speakers, defined in:

  train_953_v1.0.list

and evaluated on:

  devtest_255_v1.0.list

using a sentence-pattern grammar. It achieved a word error rate of
3.4%, documented in:

  hypothesis.report

which is generated at the end of this recipe.

Below is a description of each step required to develop this system
starting from unseeded models.

===============================================================================

A. FILE SYSTEM OVERVIEW

The first step is to unpack the appropriate tar files into a
workspace. Let's assume these have been installed into a directory
called asr_va_tutorial_v1.0. Set your current working directory to
this place:

  cd /home/users/.../asr_va_tutorial_v1.0

All pathnames described below are relative to this location. At this
location, once you build the software, you will find these files and
directories:

  AAREADME.text        general release information
  GNUmakefile          make file
  GNUmakefile.in       source make file (ignore)
  ISIP_WSJ_ENV.sh      environment variables
  ISIP_WSJ_ENV.sh.in   source env variables (ignore)
  class                C++ object code
  config.cache         configure files (ignore)
  config.guess         configure files (ignore)
  config.log           configure files (ignore)
  config.status        configure files (ignore)
  config.sub           configure files (ignore)
  configure            configure files (ignore)
  data                 recognition-related configuration files
  install-sh           configure files (ignore)
  scripts              general purpose scripts (see the appendix)
  util                 source code for recognition driver scripts

In the current shell, you must source ISIP_WSJ_ENV.sh using the bash
shell:

  source ISIP_WSJ_ENV.sh

You can now proceed to training. For the remainder of this tutorial,
we will also assume you are running on one CPU with the name
"isip105". At several points in this tutorial you will need to supply
this machine name as part of the command line arguments.

===============================================================================

B. TRAINING OVERVIEW

We need to create a workspace to run experiments. For this tutorial,
let's assume this space exists in a directory called exp:

  mkdir exp; mkdir exp/exp_001; cd exp/exp_001;

All the paths and directories for training will be created
automatically relative to the "exp_001" directory. Training can be
run using this command:

  wsj_run -train_mfc_list ./train_953_v1.0_mfc.list \
          -cpus_train isip105 | tee train.log

This will run training to its completion and create the necessary
output models. Note that the filename lists must correctly identify
the location of the data ON YOUR MACHINE, and the pathname to the
file list ("./" in this case) will need to vary depending on where
the list is located on your machine.

The result of the above command is a set of acoustic models. These
are located at:

  train/baum_welch/monophone/mixture/final_models

Below, we will describe all steps leading to this result. The log
file, train.log, will contain a step-by-step status report of
training as it progresses.
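Before launching wsj_run, it can save time to confirm that every path
in the feature list resolves on your machine. Below is a minimal
Python sketch of such a check (the script name is hypothetical;
wsj_run performs an equivalent check during initialization, described
in step 2 of B.1 below):

  # check_list.py: verify that every feature file in a list exists.
  import os
  import sys

  def check_list(list_file):
      missing = [path for path in (line.strip() for line in open(list_file))
                 if path and not os.path.exists(path)]
      for path in missing:
          print("missing:", path)
      return len(missing)

  if __name__ == "__main__":
      # e.g., python check_list.py ./train_953_v1.0_mfc.list
      sys.exit(1 if check_list(sys.argv[1]) else 0)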
B.1 INITIALIZATION

 1) Checking for input data directory and files: checks whether all
    needed data exists
    (/home/users/.../asr_va_tutorial_v1.0/data).

 2) All training mfc files exist: checks whether all the input
    feature files specified on the command line exist.

 3) Creating local data directories:

      - data_generation
      - train
      - isip105

 4) Creating transcriptions:

      - Create monophone transcriptions without "sp" between words
        corresponding to the input feature file list:
          data_generation/transcriptions/mono_trans_no_sp.text
      - Create monophone transcriptions with "sp" between words
        corresponding to the input feature file list:
          data_generation/transcriptions/mono_trans_with_sp.text
      - Create word transcriptions with "sp" between words
        corresponding to the input feature file list:
          data_generation/transcriptions/word_transcription.text

 5) Creating training lists:

      data_generation/lists/train_mfc.list
      data_generation/lists/aligned_output.list

 6) Creating training lists for each CPU by dividing the list N ways
    and moving the pieces to the corresponding CPU directories:

      isip105/data_generation/lists/train_mfc.list
      isip105/data_generation/lists/aligned_output.list

 7) Creating transcriptions for each CPU by dividing them N ways and
    moving the pieces to the corresponding CPU directories:

      - Create monophone transcriptions without "sp" between words:
          isip105/data_generation/transcriptions/mono_trans_no_sp.text
      - Create monophone transcriptions with "sp" between words:
          isip105/data_generation/transcriptions/mono_trans_with_sp.text

 8) Checking the endianness of your system: checks the architecture
    of the machine and prints it on stdout. This is important for
    debugging big-endian versus little-endian feature formats.

B.2 FLAT-START

 1) Building initial base monophone models:

    a) Create the monophone models:

       COMMAND: /isip/tools/proto/bin/scripts/create_models -states \
                train/baum_welch/monophone/base/r00/hmm0/model_lengths.text \
                -output train/baum_welch/monophone/base/r00/hmm0/ \
                models.text

       For additional information type "create_models -help".

    b) Create the phone mapping file:

       COMMAND: /isip/tools/proto/bin/scripts/create_triphone_map -mono \
                train/baum_welch/monophone/base/r00/hmm0/monophones.text \
                -clist train/baum_welch/monophone/base/r00/hmm0/ \
                monophones.text -context ci -models train/baum_welch/ \
                monophone/base/r00/hmm0/models.text -output train/ \
                baum_welch/monophone/base/r00/hmm0/phones.text

       For additional information type "create_triphone_map -help".

    c) Initialize the monophone models:

       COMMAND: rsh isip105 /isip/tools/proto/bin/i386-pc-solaris2.7/ \
                init_hmm \
                -input train/data_generation/lists/train_mfc.list \
                -models train/baum_welch/monophone/base/r00/hmm0/ \
                model_lengths.text -trans train/baum_welch/monophone/base/ \
                r00/hmm0/transitions.text -state train/baum_welch/monophone/ \
                base/r00/hmm0/states.text -mode binary -vfloor_file train/ \
                baum_welch/monophone/base/r00/hmm0/vfloor.text \
                -var_floor 0.0002 -num_features 39

       For additional information type "init_hmm -help".

    d) Convert the states from text to binary format:

       COMMAND: rsh isip105 /isip/tools/proto/bin/i386-pc-solaris2.7/ \
                convert_mmf -input_mode ascii -output_mode binary train/ \
                baum_welch/monophone/base/r00/hmm0/states.text train/ \
                baum_welch/monophone/base/r00/hmm0/states.bin

       For additional information type "convert_mmf -help".
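The "flat start" performed by init_hmm above seeds every state with
the global statistics of the training features and floors the
variances (the -var_floor 0.0002 argument) so that no dimension can
collapse during re-estimation. A minimal Python sketch of the idea,
assuming 39-dimensional features loaded into a numpy array (the
function and data below are illustrative, not the tool's actual
implementation):

  import numpy as np

  def flat_start(frames, var_floor=0.0002):
      # frames: (num_frames, 39) array of feature vectors
      mean = frames.mean(axis=0)
      var = np.maximum(frames.var(axis=0), var_floor)  # apply the floor
      return mean, var

  # random data standing in for real features
  mean, var = flat_start(np.random.randn(1000, 39))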
B.3 LONG SILENCE MODEL TRAINING (SIL TRAINING)

 1) Four passes of single-mixture training without the sp model:

    a) Pass 1:

       - Generating accumulators on each of the CPUs:

         COMMAND: rsh isip105 isip105_0/train/baum_welch/ \
                  monophone/base/r00/hmm0/run_bw_train.sh &

         Look into the shell script "run_bw_train.sh" to observe the
         inputs and the outputs.

       - Combining the accumulators generated in the previous step:

         COMMAND: rsh isip105 /isip/tools/proto/bin/i386-pc-solaris2.7/ \
                  bw_train -p train/baum_welch/monophone/base/r00/hmm0/ \
                  params.text

         Look into "params.text" to observe the inputs and the
         outputs. For additional information type "bw_train -help".

       ...
       ...

    d) Pass 4: (same steps as above)

B.4 SHORT SILENCE TRAINING (SP TRAINING)

 1) Initializations for sp model training:

    a) Tying the sp model to the central state of the silence model
       and adding skip states to the silence model. This is done
       through the subroutine "add_sp_and_sil_trans".

       Inputs:
         train/baum_welch/monophone/base/r00/hmm0/models.text
         train/baum_welch/monophone/base/r00/hmm4/transitions.text

       Outputs:
         train/baum_welch/monophone/base/r01/hmm0/models.text
         train/baum_welch/monophone/base/r01/hmm0/transitions.text

    b) Create the phone mapping file for the new models:

       COMMAND: /isip/tools/proto/bin/scripts/create_triphone_map -mono \
                train/baum_welch/monophone/base/r00/hmm0/monophones.text \
                -clist train/baum_welch/monophone/base/r00/hmm0/ \
                monophones.text -context ci -models train/baum_welch/ \
                monophone/base/r01/hmm0/models.text -output train/ \
                baum_welch/monophone/base/r01/hmm0/phones.text

       For additional information type "create_triphone_map -help".

 2) Four passes of single-mixture training with the sp model:

    a) Pass 1:

       - Generating accumulators on each of the CPUs:

         COMMAND: rsh isip105 isip105_0/train/baum_welch/monophone/ \
                  base/r01/hmm0/run_bw_train.sh &

         Look into the shell script "run_bw_train.sh" to observe the
         inputs and the outputs.

       - Combining the accumulators generated in the previous step:

         COMMAND: rsh isip105 /isip/tools/proto/bin/i386-pc-solaris2.7/ \
                  bw_train -p train/baum_welch/monophone/base/r01/hmm0/ \
                  params.text

         Look into "params.text" to observe the inputs and the
         outputs. For additional information type "bw_train -help".

       ...
       ...

    d) Pass 4: (same steps as above)

B.5 FORCED ALIGNMENT

 1) Create the phonetic transcriptions from the word transcriptions,
    the lexicon, and the trained monophone models:

       COMMAND: rsh isip105 /isip/tools/proto/bin/i386-pc-solaris2.7/ \
                trace_projector -p train/baum_welch/monophone/base/ \
                alignments/params.text

       Look into "params.text" to observe the inputs and the outputs.
       For additional information type "trace_projector -help".

 2) Divide these transcriptions N ways according to the number of
    CPUs and move them to the corresponding CPU directories. Also
    move the corresponding feature file lists (see the sketch after
    this section):

      isip105_0/train/baum_welch/monophone/base/transcriptions/
        aligned_trans.text
      isip105_0/train/baum_welch/monophone/base/transcriptions/
        aligned_mfcc.text
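The "divide N ways" steps above (and in B.1 and C.1) all follow the
same pattern: deal the lines of a list out round-robin, one piece per
CPU directory. A minimal Python sketch of that pattern (the driver
scripts do this in perl; the file and directory names here are
illustrative):

  import os

  def split_n_ways(list_file, cpu_dirs):
      lines = [line for line in open(list_file) if line.strip()]
      for i, cpu in enumerate(cpu_dirs):
          piece = lines[i::len(cpu_dirs)]          # round-robin split
          out_path = os.path.join(cpu, list_file)
          os.makedirs(os.path.dirname(out_path), exist_ok=True)
          with open(out_path, "w") as f:
              f.writelines(piece)

  # with one CPU, the "split" is simply a copy into isip105_0/
  split_n_ways("data_generation/lists/train_mfc.list", ["isip105_0"])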
B.6 MONOPHONE TRAINING

 1) Five passes of single-mixture monophone training:

    a) Pass 1:

       - Generating accumulators on each of the CPUs:

         COMMAND: rsh isip105 isip105_0/train/baum_welch/monophone/ \
                  base/r01/hmm4/run_bw_train.sh &

         Look into the shell script "run_bw_train.sh" to observe the
         inputs and the outputs.

       - Combining the accumulators generated in the previous step:

         COMMAND: rsh isip105 /isip/tools/proto/bin/i386-pc-solaris2.7/ \
                  bw_train -p train/baum_welch/monophone/base/r01/hmm4/ \
                  params.text

         Look into "params.text" to observe the inputs and the
         outputs. For additional information type "bw_train -help".

       ...
       ...

    e) Pass 5: (same steps as above)

 2) Copy the final models to
    train/baum_welch/monophone/mixture/final_models:

      train/baum_welch/monophone/mixture/final_models/lexicon.text
      train/baum_welch/monophone/mixture/final_models/models.text
      train/baum_welch/monophone/mixture/final_models/monophones.text
      train/baum_welch/monophone/mixture/final_models/phones.text
      train/baum_welch/monophone/mixture/final_models/states.bin
      train/baum_welch/monophone/mixture/final_models/transitions.text
      train/baum_welch/monophone/mixture/final_models/vfloor.text

===============================================================================

C. EVALUATION OVERVIEW

Evaluation can be run from the same workspace as training. Let's
assume we are still working from exp_001. All the paths and
directories for decoding will be created automatically relative to
the "exp_001" directory. Decoding can be run using this command:

  wsj_run -test_mfc_list ./devtest_255_v1.0_mfc.list \
          -cpus_test isip105 -models_path ./ | tee decode.log

This will run decoding to its completion and create the necessary
output hypotheses. Note that the filename lists must correctly
identify the location of the data ON YOUR MACHINE, and the pathname
to the file list might vary depending on your installation.

The result of the above command is a set of hypotheses and the report
file. These are located at:

  decode/baum_welch/monophone/mixture/grammar_decoding/output/
  decode/baum_welch/monophone/mixture/grammar_decoding/hypothesis.report

Below, we will describe all steps leading to this result. The log
file, decode.log, will contain a step-by-step status report of
decoding as it progresses.

C.1 INITIALIZATION

 1) Checking for input data directory and files: checks whether all
    needed data exists
    (/home/users/.../asr_va_tutorial_v1.0/data).

 2) All testing mfc files exist: checks whether all the input feature
    files specified on the command line exist.

 3) Creating local data directories:

      - data_generation
      - decode
      - isip105

 4) Creating testing lists:

      data_generation/lists/test_mfc.list

 5) Creating testing lists for each CPU by dividing the list N ways
    and moving the pieces to the corresponding CPU directories:

      isip105/data_generation/lists/test_mfc.list

 6) Creating transcriptions for each CPU by dividing them N ways and
    moving the pieces to the corresponding CPU directories:

      - Create monophone transcriptions without "sp" between words:
          isip105/data_generation/transcriptions/mono_trans_no_sp.text
      - Create monophone transcriptions with "sp" between words:
          isip105/data_generation/transcriptions/mono_trans_with_sp.text

 7) Checking the endianness of your system: checks the architecture
    of the machine and prints it on stdout. This is important for
    debugging big-endian versus little-endian feature formats.
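The endianness check in step 7 matters because the binary feature and
model files are byte-order dependent. The tutorial's own check is the
C++ program util/check_endian/check_endian.cc; an equivalent check
can be sketched in a few lines of Python:

  import struct
  import sys

  print("native byte order:", sys.byteorder)
  # the integer 1, packed natively: 00000001 on big-endian machines,
  # 01000000 on little-endian machines
  print("int 1, native packing:", struct.pack("=i", 1).hex())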
C.2 GRAMMAR INITIALIZATION

 1) Building the lattice from the grammar file:

       COMMAND: rsh isip105 /isip/tools/releases/proto/isip_proto_v5_12_t00/ \
                bin/i386-pc-solaris2.7/grammar_compiler \
                -input asr_va_tutorial_v1.0/data/decode/grammar.text \
                -output decode/baum_welch/monophone/ \
                mixture/grammar_decoding/grammar.lat

       For additional information type "grammar_compiler -help".

 2) Building the input lattice list:

      decode/baum_welch/monophone/mixture/grammar_decoding/lists/
        input_lattice.list

 3) Dividing the lattice list N ways according to the number of CPUs
    and moving the pieces to the corresponding directories:

      isip105_0/decode/baum_welch/monophone/mixture/
        grammar_decoding/lists/input_lattice.list

 4) Building the output hypotheses list:

      decode/baum_welch/monophone/mixture/grammar_decoding/lists/
        output.list

 5) Dividing the output list N ways according to the number of CPUs
    and moving the pieces to the corresponding directories:

      isip105_0/decode/baum_welch/monophone/mixture/
        grammar_decoding/lists/output.list

C.3 NETWORK DECODING

Network decoding is accomplished by running the recognizer in a mode
known as "lattice rescoring":

       COMMAND: rsh isip105 isip105_0/decode/baum_welch/monophone/ \
                mixture/grammar_decoding/run_trace_projector.sh &

Look into the shell script "run_trace_projector.sh" to observe the
inputs and the outputs.

C.4 SCORING

       COMMAND: /isip/tools/releases/proto/isip_proto_v5_12_t00/bin/scripts/ \
                isip_eval isip_model decode/baum_welch/monophone/mixture/ \
                grammar_decoding/lists/output.list \
                asr_va_tutorial_v1.0/data/decode/reference.score \
                decode/baum_welch/monophone/mixture/ \
                grammar_decoding/hypothesis

The output report is at:

  decode/baum_welch/monophone/mixture/grammar_decoding/hypothesis.report

===============================================================================

D. APPENDIX

Set your current working directory to this place:

  cd /home/users/.../asr_va_tutorial_v1.0

This appendix provides an overview of the source code and the various
files needed for training the monophone models and for network
grammar decoding. All these files correspond to the Creare Phase 1
data.

D.1 COMMAND LINE PARSING: scripts/perl/command_line/command_line.pm

All utilities use the same command line interface, written in perl.
The perl code is located in the module command_line.pm.

D.2 SUBROUTINES: scripts/perl/wsj_subroutines/wsj_subs.pm.in

All the utilities use the subroutines from this file, written in
perl. The perl code is located in the module wsj_subs.pm.in.

D.3 UTILITIES: util/wsj_scripts/wsj_*.pm.in; util/check_endian/check_endian.cc

All the driver utilities are written in perl. The C++ code to check
the endianness of the native architecture is located in the module
check_endian.cc.

D.4 CONFIGURATION: data

An overview of the files needed to train the monophone models from
scratch and to decode is provided in this section.

A) Training setup: data/train

   i) Monophone training setup: data/train/monophone

      a) Single mixture base: data/train/monophone/base

         - Monophones listing:
             data/train/monophone/base/monophones.text
         - Monophones topology:
             data/train/monophone/base/model_lengths.text
         - Special models:
             data/train/monophone/base/special_models.text
         - Monophone transcriptions without "sp" between words
           corresponding to the entire database:
             data/train/monophone/base/all_mono_trans_no_sp.text
         - Monophone transcriptions with "sp" between words
           corresponding to the entire database:
             data/train/monophone/base/all_mono_trans_with_sp.text
         - Word transcriptions corresponding to the entire database:
             data/train/monophone/base/all_word_transcription.text
         - Lexicon:
             data/train/monophone/base/lexicon.text

      b) Multiple mixture: data/train/monophone/mixture

         - Special models:
             data/train/monophone/mixture/special_models.text

B) Decoding setup: data/decode

   - Lexicon:
       data/decode/decode_lexicon.text
   - Grammar:
       data/decode/grammar.text
   - Reference transcriptions corresponding to the decoding database:
       data/decode/reference.text

===============================================================================
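A closing note on the reported numbers: the word error rate quoted in
the overview (3.4%) is the standard edit-distance measure, i.e.
(substitutions + deletions + insertions) divided by the number of
reference words. A minimal Python sketch of the computation
(illustrative only; isip_eval is the actual scorer):

  def wer(ref, hyp):
      r, h = ref.split(), hyp.split()
      d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
      for i in range(len(r) + 1):
          d[i][0] = i                              # deletions
      for j in range(len(h) + 1):
          d[0][j] = j                              # insertions
      for i in range(1, len(r) + 1):
          for j in range(1, len(h) + 1):
              sub = 0 if r[i - 1] == h[j - 1] else 1
              d[i][j] = min(d[i - 1][j] + 1,       # deletion
                            d[i][j - 1] + 1,       # insertion
                            d[i - 1][j - 1] + sub) # substitution
      return d[len(r)][len(h)] / float(len(r))

  print(wer("call the main office", "call a main office"))  # 0.25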