DARPA Communicator Recognition Server v2.0:
This release adds several enhancements: n-best
hypothesis list support, the ability to receive audio data in
"GAL_BINARY" format, and the ability to compile on both Sparc and
Pentium machines. To run this server, you must install
v5.11 or greater of our prototype system. Detailed installation and verification instructions are available on-line.
Multiple-CPU Eval Package (v1.0):
This package allows users to easily run complete experiments
using multiple CPUs. In addition to the parameters available
in the single-CPU scripts, the user can specify
which computers are to be used for training and testing.
Please be sure that you have already installed the ISIP prototype
system (v5.11) before running this application.
To install this package, follow the instructions below.
Detailed instructions are included in the release's AAREADME.text file.
- tar xzvf asr_tutorial_v1.0.tar.gz
- cd asr_wsj_tutorial_v1.0
- source <install directory for v5.11>ISIP_ENV.sh
- ./configure --prefix=.
- make install
- source ISIP_WSJ_ENV.sh
- wsj_run -help
A typical command line for decoding will look like this:
wsj_run -num_features 39 -feature_format isip_proto \
    -test_state_beam_pruning 250 -test_model_beam_pruning 200 \
    -test_raw_list ./swb_raw.list -models_path data/final_models \
    -cpus_test isip210 isip211 isip212 isip212
Alternatively, you can decode a specified list of raw
files with our embedded Switchboard models (12-mixture word-internal
triphone models and a bigram language model):
wsj_run -test_raw_list swb_raw.list -cpus_test isip210 isip211 isip212 isip212
The option "-cpus_test" tells the utility
which computers to use for decoding. If a computer has multiple
processors, repeat its name N times to use N processors.
To score the hypotheses successfully, the reference file data/decode/reference.score must be modified.
Whether or not scoring succeeds, a hypothesis list is written to decode/baum_welch/wint_tri/bigram_decoding/hypothesis.score.
This hypothesis list corresponds to the raw file list.
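The host-repetition convention for "-cpus_test" can be sketched in a few lines of Python. This is only an illustration: the helper names and the processor counts are hypothetical, and the round-robin assignment of files to processor slots is an assumption, since the text does not say how wsj_run actually schedules its work.

```python
def build_cpus_arg(processors_per_host):
    """Repeat each host name once per processor, as -cpus_test expects."""
    slots = []
    for host, count in processors_per_host.items():
        slots.extend([host] * count)
    return slots


def split_round_robin(raw_files, slots):
    """Hypothetical scheduling: deal the raw files out to the slots in turn."""
    buckets = [[] for _ in slots]
    for index, path in enumerate(raw_files):
        buckets[index % len(slots)].append(path)
    return list(zip(slots, buckets))


# Assumed processor counts; isip212 is treated as a two-processor machine.
hosts = {"isip210": 1, "isip211": 1, "isip212": 2}
slots = build_cpus_arg(hosts)
print("-cpus_test " + " ".join(slots))

# Placeholder file names standing in for the entries of a raw file list.
raw_files = [f"utt{i}.raw" for i in range(6)]
for host, files in split_round_robin(raw_files, slots):
    print(host, files)
```

The first line printed is "-cpus_test isip210 isip211 isip212 isip212", matching the host list used in the command lines above.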
DARPA Communicator Recognition Server:
This server provides speech recognition functionality.
A simple demo is included that allows you to decode
audio data and display the resulting hypothesis in a text window.
The core recognition technology features a
real-time Resource Management system.
To run this server, you must install v5.9 or greater
of our prototype system. Installation and verification instructions
are available on-line.
Real-Time Resource Management System:
This system is derived from our baseline system but uses
a 15 ms frame duration, and it achieves a WER of 5.0%.
It runs at 1.1 xRT on a 600 MHz Pentium processor. It is described
in more detail in a document
summarizing our progress in the first semester of this project.
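To make the frame-duration and xRT figures concrete, the following Python sketch works through the arithmetic for one utterance. It assumes that the frame duration is also the frame advance (no overlap) and that xRT means decoding time divided by audio duration; the 10-second utterance and the function names are hypothetical.

```python
def decode_seconds(audio_seconds, xrt):
    """Wall-clock decoding time implied by a real-time factor (xRT)."""
    return audio_seconds * xrt


def frame_count(audio_ms, frame_ms):
    """Whole frames in an utterance, assuming non-overlapping frame steps."""
    return audio_ms // frame_ms


utterance_ms = 10_000  # hypothetical 10-second utterance

# (system, frame duration in ms, xRT) as quoted for the two RM systems
systems = [("real-time", 15, 1.1), ("baseline", 10, 8.3)]
for name, frame_ms, xrt in systems:
    frames = frame_count(utterance_ms, frame_ms)
    seconds = decode_seconds(utterance_ms / 1000, xrt)
    print(f"{name}: {frames} frames, about {seconds:.1f} s to decode")
```

Under these assumptions the real-time system processes 666 frames in about 11.0 s, while the baseline system processes 1000 frames in about 83.0 s.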
State-Of-The-Art Baseline Resource Management System:
The DARPA Resource Management (RM) task has a vocabulary on the order
of 1000 words and a perplexity less than 60. The acoustic front end
for this system uses our standard MFCC front end with 10 ms frames.
The final models are Gaussian cross-word triphone models that
use six mixtures per state. This system achieves
a WER of 3.4% running at 8.3 xRT on a 600 MHz Pentium processor.
More details on our baseline experiments can be found in a document
describing our baseline system.
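For readers unfamiliar with the WER figures quoted above, here is a minimal Python sketch of the standard word error rate calculation: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. It is not the scoring tool used for these experiments, and the function name and sample sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning ref[:i] with an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Made-up sentences: one deleted word out of three reference words.
wer = word_error_rate("show all ships", "show ships")
print(f"WER = {100 * wer:.1f}%")
```

On this scale, a WER of 3.4% corresponds to roughly 34 word errors per 1000 reference words.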