41.8% word error rate (WER) at 60x real time (60xRT) on the JHU WS'97 dev-test set: competitive with comparable commercial systems
Front-End: Mel Cepstral (12) + Energy + Delta + Acceleration, cepstral mean normalization; 60+ hrs training
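As a rough illustration of the front-end layout, the sketch below stacks 13 static coefficients (12 mel cepstra plus energy) with their delta and acceleration terms, giving 39 dimensions per frame. The regression window of 2 frames, the edge replication, and applying mean normalization to all 13 static terms (energy included) are assumptions for the example; the source does not specify them. The function names are hypothetical.

```python
from typing import List

Frame = List[float]


def cepstral_mean_normalize(frames: List[Frame]) -> List[Frame]:
    """Subtract the per-dimension mean computed over the whole utterance."""
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


def deltas(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Regression-based time derivative; edge frames are replicated.

    The window size is an assumption, not taken from the source.
    """
    n_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(n_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                nxt = frames[min(t + n, n_frames - 1)][d]
                prv = frames[max(t - n, 0)][d]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def build_features(static: List[Frame]) -> List[Frame]:
    """Return [static, delta, acceleration] per frame (3x the static size)."""
    normed = cepstral_mean_normalize(static)
    d1 = deltas(normed)   # delta
    d2 = deltas(d1)       # acceleration (delta of delta)
    return [s + a + b for s, a, b in zip(normed, d1, d2)]
```

With 13 static values per frame, `build_features` returns 39 values per frame.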
Models: 3-state left-to-right HMMs with dedicated silence and inter-word silence models, 40 phones, cross-word context-dependent triphones
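To make the difference between word-internal and cross-word context concrete, here is a minimal sketch that expands phone sequences into triphone labels. The `left-center+right` label notation, the toy pronunciations, and the function names are assumptions for illustration, and silence handling is left out.

```python
from typing import List, Optional


def format_label(left: Optional[str], center: str, right: Optional[str]) -> str:
    """Build a context-dependent label, dropping any missing context."""
    label = center
    if left is not None:
        label = f"{left}-{label}"
    if right is not None:
        label = f"{label}+{right}"
    return label


def word_internal_triphones(pron: List[str]) -> List[str]:
    """Context stops at the word boundary."""
    labels = []
    for i, phone in enumerate(pron):
        left = pron[i - 1] if i > 0 else None
        right = pron[i + 1] if i + 1 < len(pron) else None
        labels.append(format_label(left, phone, right))
    return labels


def cross_word_triphones(prons: List[List[str]]) -> List[str]:
    """Context extends across word boundaries within the utterance."""
    flat = [phone for pron in prons for phone in pron]
    return word_internal_triphones(flat)


# Hypothetical two-word utterance.
utterance = [["k", "ae", "t"], ["s", "ae", "t"]]
internal = [word_internal_triphones(p) for p in utterance]
crossed = cross_word_triphones(utterance)
```

In the cross-word case the final phone of the first word takes the first phone of the next word as its right context, which is why cross-word systems need many more distinct models and rely on state-tying to keep them trainable.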
Training schedule: Baum-Welch training
- Flat-start
- Monophone Training
- Triphone creation
- State-tying
- Mixture Generation
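The mixture generation step is often done by repeatedly splitting Gaussians and re-estimating. The sketch below shows one common splitting scheme, assuming diagonal covariances: the heaviest component is duplicated, its weight is halved, and the two means are moved apart by a fraction of a standard deviation. The 0.2 perturbation factor and all names are assumptions, not details from the source, and the Baum-Welch re-estimation that would normally run between splits is omitted.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class Gaussian:
    weight: float
    mean: List[float]
    var: List[float]  # diagonal covariance (assumed)


def split_heaviest(mixture: List[Gaussian], perturb: float = 0.2) -> List[Gaussian]:
    """Split the highest-weight component into two perturbed copies."""
    idx = max(range(len(mixture)), key=lambda i: mixture[i].weight)
    g = mixture[idx]
    offset = [perturb * sqrt(v) for v in g.var]
    up = Gaussian(g.weight / 2, [m + o for m, o in zip(g.mean, offset)], list(g.var))
    down = Gaussian(g.weight / 2, [m - o for m, o in zip(g.mean, offset)], list(g.var))
    return mixture[:idx] + [up, down] + mixture[idx + 1:]


def grow_mixture(mixture: List[Gaussian], target: int) -> List[Gaussian]:
    """Split until the mixture has `target` components.

    A real schedule would re-estimate parameters after each increase.
    """
    while len(mixture) < target:
        mixture = split_heaviest(mixture)
    return mixture
```

Splitting preserves the total mixture weight, so the result remains a valid distribution before re-estimation.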
Decoding: time-synchronous decoding
- lattice generation: trigram language model, word-internal triphones
- lattice rescoring: trigram language model, cross-word triphones