  • 41.8% WER at 60xRT (60 times real time) on the JHU WS'97 dev-test set, competitive with comparable commercial systems
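
WER counts substitutions, deletions and insertions against the number of reference words. 60xRT means decoding took 60 times the duration of the audio. The following is a minimal Python sketch of both measures. The function names are illustrative, and the text normalization that scoring tools such as NIST sclite apply before alignment is omitted.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed from a word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion
                dist[i][j - 1] + 1,        # insertion
                dist[i - 1][j - 1] + sub,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """xRT: processing time divided by audio duration (60.0 means 60x slower than real time)."""
    return processing_seconds / audio_seconds
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` finds one substitution and one deletion, giving 2/6. Decoding 10 s of audio in 600 s gives `real_time_factor(600.0, 10.0) == 60.0`.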

  • Front-End: 12 mel cepstral coefficients + energy, plus delta and acceleration coefficients; cepstral mean normalization

  • Training data: 60+ hours
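
Twelve cepstra plus energy give 13 static coefficients. Appending their deltas and accelerations triples this to a 39-dimensional vector. The Python sketch below assumes the common regression formula for deltas with a ±2-frame window and per-utterance mean subtraction. The source does not state the window size or which coefficients are mean-normalized, so both are assumptions here.

```python
from typing import List

Frame = List[float]


def cepstral_mean_normalize(frames: List[Frame]) -> List[Frame]:
    """Subtract the per-utterance mean of each coefficient.
    Applied to all static coefficients here for simplicity (an assumption)."""
    dim = len(frames[0])
    means = [sum(f[k] for f in frames) / len(frames) for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]


def regression_deltas(frames: List[Frame], window: int = 2) -> List[Frame]:
    """d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2); edge frames are replicated."""
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        d = []
        for k in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            d.append(acc / denom)
        deltas.append(d)
    return deltas


def assemble_features(statics: List[Frame]) -> List[Frame]:
    """13-dim static frames (12 cepstra + energy) -> 39-dim observation vectors."""
    normalized = cepstral_mean_normalize(statics)
    delta = regression_deltas(normalized)
    accel = regression_deltas(delta)  # acceleration = deltas of the deltas
    return [s + d + a for s, d, a in zip(normalized, delta, accel)]
```

On a linear ramp of static values, interior frames get a delta of 1.0 and an acceleration of 0.0, as expected for a constant slope.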

  • Models: 3-state left-to-right HMMs with dedicated silence and inter-word silence models, 40 phones, cross-word context-dependent triphones
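
Cross-word triphones take their left and right context from neighbouring words as well as from within a word. Each triphone is then modelled by a 3-state left-to-right HMM. The Python sketch below covers both pieces. It borrows the HTK-style `l-c+r` naming, and the phone symbols, lexicon, `sil` edge context and 0.5 initial self-loop probability are all hypothetical.

```python
from typing import Dict, List


def cross_word_triphones(words: List[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Expand a word sequence into context-dependent triphone names (l-c+r).

    Contexts cross word boundaries; "sil" serves as context at utterance edges.
    """
    phones = ["sil"]
    for word in words:
        phones.extend(lexicon[word])
    phones.append("sil")
    return [
        f"{phones[i - 1]}-{phones[i]}+{phones[i + 1]}"
        for i in range(1, len(phones) - 1)
    ]


def left_to_right_transitions(num_states: int = 3, stay: float = 0.5) -> List[List[float]]:
    """Each emitting state either loops on itself or advances to the next (no skips).

    Row i holds P(next | state i); the extra last column is the model exit.
    """
    matrix = []
    for i in range(num_states):
        row = [0.0] * (num_states + 1)
        row[i] = stay
        row[i + 1] = 1.0 - stay
        matrix.append(row)
    return matrix
```

With the hypothetical lexicon `{"hi": ["hh", "ay"], "there": ["dh", "eh", "r"]}`, the phrase "hi there" expands to `sil-hh+ay hh-ay+dh ay-dh+eh dh-eh+r eh-r+sil`. Here `hh-ay+dh` is a context that a word-internal model set would not contain.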

  • Training schedule: Baum-Welch training
    • Flat start
    • Monophone training
    • Triphone creation
    • State tying
    • Mixture generation
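
Two steps of this schedule are easy to show in isolation: flat start and mixture generation. The Python sketch below gives every state the global mean and variance of the training data, then doubles a state's mixture count by splitting each Gaussian. It assumes diagonal covariances and a common splitting heuristic in which means shift by ±0.2 standard deviations and weights are halved. The 0.2 offset is an assumption, not a value from the source. The Baum-Welch re-estimation passes that run between steps, triphone cloning and state tying are all omitted.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Gaussian:
    weight: float
    mean: List[float]
    var: List[float]  # diagonal covariance


def flat_start(frames: List[List[float]], num_states: int) -> List[List[Gaussian]]:
    """Give every state a single Gaussian with the global mean/variance of the data."""
    dim = len(frames[0])
    n = len(frames)
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    var = [sum((f[k] - mean[k]) ** 2 for f in frames) / n for k in range(dim)]
    return [[Gaussian(1.0, list(mean), list(var))] for _ in range(num_states)]


def split_mixtures(state: List[Gaussian], offset: float = 0.2) -> List[Gaussian]:
    """Double a state's mixture count: each component becomes two copies whose
    means are shifted by +/- offset standard deviations, each with half the weight."""
    result = []
    for g in state:
        std = [math.sqrt(v) for v in g.var]
        for sign in (1.0, -1.0):
            shifted = [m + sign * offset * s for m, s in zip(g.mean, std)]
            result.append(Gaussian(g.weight / 2, shifted, list(g.var)))
    return result
```

Splitting gives re-estimation a nearby starting point for each new component, so a few Baum-Welch passes can pull the copies apart toward distinct modes of the data.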

  • Decoding: time-synchronous, in two passes
    • Pass 1: word-internal lattice generation with a trigram LM
    • Pass 2: cross-word lattice rescoring with a trigram LM
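
In time-synchronous decoding, every active hypothesis is extended one frame at a time in lockstep, so scores at a given frame are directly comparable and can be beam-pruned. The Python sketch below runs a Viterbi beam search over a toy HMM, assuming emission log-likelihoods are precomputed. It keeps whole paths instead of back-pointers and neither generates lattices nor applies a language model. It illustrates the search strategy only and is not this system's decoder. The toy model's numbers are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def beam_viterbi(
    log_emissions: List[List[float]],               # [frame][state] log-likelihoods
    log_trans: Dict[int, List[Tuple[int, float]]],  # state -> [(next_state, log_prob)]
    start_state: int,
    beam: float,
) -> Tuple[float, List[int]]:
    """Advance all hypotheses frame by frame; prune any scoring more than
    `beam` below the best hypothesis at the same frame."""
    # active: state -> (best score so far, state path)
    active: Dict[int, Tuple[float, List[int]]] = {
        start_state: (log_emissions[0][start_state], [start_state])
    }
    for frame in log_emissions[1:]:
        candidates: Dict[int, Tuple[float, List[int]]] = {}
        for state, (score, path) in active.items():
            for nxt, lp in log_trans.get(state, []):
                new_score = score + lp + frame[nxt]
                # Viterbi recombination: keep only the best path into each state
                if nxt not in candidates or new_score > candidates[nxt][0]:
                    candidates[nxt] = (new_score, path + [nxt])
        best = max(s for s, _ in candidates.values())
        active = {st: v for st, v in candidates.items() if v[0] >= best - beam}
    best_state = max(active, key=lambda st: active[st][0])
    return active[best_state]


# Toy two-state left-to-right model (hypothetical numbers)
TOY_TRANS = {0: [(0, math.log(0.5)), (1, math.log(0.5))], 1: [(1, 0.0)]}
TOY_EMISSIONS = [[0.0, -5.0], [-0.1, -2.0], [-3.0, -0.1]]
```

On the toy model, `beam_viterbi(TOY_EMISSIONS, TOY_TRANS, 0, beam=10.0)` returns the path `[0, 0, 1]`. A tight beam such as 1.0 prunes state 1 after the second frame but still recovers the same path here. In general, narrower beams trade accuracy for speed, which is one knob behind a figure like 60xRT.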