README: Segmented Lattices for Two Switchboard Utterances

1. Contents
================
Two sub-directories, each containing the original and the segmented lattices
for one utterance:
a) sw_21545_A_x_14008_14505
b) sw_20017_A_x_26031_27334

A symbols file for the speech recognition vocabulary:
c) wordmap.master
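As a rough illustration of what the symbols file holds, the sketch below reads it into a label-to-word table. It assumes the usual AT&T FSM symbols-file layout of one word followed by its integer label on each line; the function name read_symbols and the sample entries in the test are hypothetical and are not taken from wordmap.master.

```python
def read_symbols(text):
    """Map integer labels to words.

    Assumption: each non-empty line holds a word followed by its integer label,
    separated by whitespace (AT&T FSM symbols-file layout).
    """
    id_to_word = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            id_to_word[int(fields[1])] = fields[0]
    return id_to_word
```

For example, passing the contents of wordmap.master to read_symbols gives a table for decoding the integer labels that appear when a lattice is printed without a symbols file.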


2. How to Look at the Lattices
===============================

Example: in the directory sw_21545_A_x_14008_14505, the lattice produced by
the decoder, sw_21545_A_x_14008_14505.original.fsm, has been cut into a
number of sub-lattices. Each sub-lattice segment has been tagged as either a
low-confidence or a high-confidence region.
Sub-lattices: sw_21545_A_x_14008_14505.{low,high}.fsm

A low-confidence region is a lattice in its own right: each path through it
begins with the !sent_start symbol and ends with the !sent_end symbol.
A high-confidence region is a single, unique sequence of words.
Note that !NULL denotes the empty word (an epsilon link) in an FSM.

The original lattices and the lattice segments are provided in two formats:
a) AT&T Finite-State toolkit format: binary files (.fsm)

These binary files can be printed as text or drawn as figures.
To print them:
    fsmprint -i wordmap.master fsm_file
To draw them as EPS figures:
    fsmdraw -i wordmap.master fsm_file | dot -Tps > fsm_file.eps
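To render every lattice in one utterance directory at once, the same fsmdraw | dot pipeline can be wrapped in a small Python loop. This is a minimal sketch, not part of the distributed tools: draw_plan and draw_all are hypothetical helpers, the output name simply appends .eps to each .fsm file name as in the command above, and fsmdraw and dot are assumed to be on the PATH.

```python
import subprocess
from pathlib import Path


def draw_plan(fsm_files, symbols="wordmap.master"):
    """Build (fsmdraw argv, dot argv, eps file name) for each .fsm file."""
    plan = []
    for fsm in fsm_files:
        fsmdraw_argv = ["fsmdraw", "-i", symbols, str(fsm)]
        dot_argv = ["dot", "-Tps"]
        plan.append((fsmdraw_argv, dot_argv, str(fsm) + ".eps"))
    return plan


def draw_all(directory, symbols="wordmap.master"):
    """Run 'fsmdraw -i symbols f | dot -Tps > f.eps' for every .fsm file in directory."""
    files = sorted(Path(directory).glob("*.fsm"))
    for fsmdraw_argv, dot_argv, eps in draw_plan(files, symbols):
        with open(eps, "wb") as out:
            draw = subprocess.Popen(fsmdraw_argv, stdout=subprocess.PIPE)
            dot = subprocess.Popen(dot_argv, stdin=draw.stdout, stdout=out)
            draw.stdout.close()  # lets fsmdraw receive SIGPIPE if dot exits early
            dot.wait()
            draw.wait()
        if draw.returncode != 0 or dot.returncode != 0:
            raise RuntimeError(f"drawing failed for {fsmdraw_argv[-1]}")
```

For example, draw_all("sw_21545_A_x_14008_14505") run from the top-level directory (where wordmap.master lives) writes one .eps file next to each .fsm file. Passing argument lists instead of a shell string avoids quoting problems with file names.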

Note: the FSM tools and dot (part of Graphviz) can be downloaded from:
http://www.research.att.com/sw/tools/fsm
http://www.research.att.com/sw/tools/graphviz
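The region structure described earlier can be inspected from the text that fsmprint writes. The sketch below assumes the AT&T text layout for an acceptor (one arc per line as "source destination label [weight]", one final state per line as "state [weight]", with the source of the first arc taken as the start state) and an acyclic lattice. It enumerates the word sequence of every path, drops !NULL, checks that each low-confidence path runs from !sent_start to !sent_end, and checks that a high-confidence region yields exactly one word sequence. All function names and the sample lattice are hypothetical, and enumerating paths this way is only practical for small sub-lattices.

```python
from collections import defaultdict

EPSILON = "!NULL"  # the empty word (epsilon link), as described in this README


def parse_fsmprint(text):
    """Parse fsmprint output of an acceptor into (start, arcs, finals).

    Assumed layout: 'src dst label [weight]' for an arc and
    'state [weight]' for a final state; weights are ignored.
    """
    arcs = defaultdict(list)
    finals = set()
    start = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) <= 2:        # final-state line
            finals.add(fields[0])
        else:                       # arc line
            src, dst, label = fields[:3]
            if start is None:       # assume the first arc leaves the start state
                start = src
            arcs[src].append((dst, label))
    return start, arcs, finals


def word_sequences(start, arcs, finals):
    """Return the word sequence of every path; assumes the lattice is acyclic."""
    paths = []

    def walk(state, words):
        if state in finals:
            paths.append(tuple(words))
        for dst, label in arcs[state]:
            walk(dst, words if label == EPSILON else words + [label])

    walk(start, [])
    return paths


def is_valid_low_region(paths):
    """Every path must start with !sent_start and end with !sent_end."""
    return all(p and p[0] == "!sent_start" and p[-1] == "!sent_end" for p in paths)


def is_valid_high_region(paths):
    """A high-confidence region should yield exactly one word sequence."""
    return len(set(paths)) == 1


# Hypothetical low-confidence region, as fsmprint -i might print it.
SAMPLE = """0 1 !sent_start
1 2 yeah
1 2 yes
2 3 !NULL
3 4 !sent_end
4
"""

if __name__ == "__main__":
    paths = word_sequences(*parse_fsmprint(SAMPLE))
    print(paths)
    print(is_valid_low_region(paths))
```

In practice the input text would come from running fsmprint -i wordmap.master on one of the .low or .high sub-lattice files and saving its output.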


b) HTK Standard Lattice Format (SLF): plain-text files (.slf)
The format is documented in the HTK Book for HTK V3.0.

HTK V3.0 can be downloaded from http://htk.eng.cam.ac.uk/
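For the .slf files, a minimal reader might look like the following. It handles only a subset of SLF: header lines, node lines (I=) and link lines (J=) using the short field names (S=, E=, W=), whitespace-separated name=value pairs and no quoted values; lines starting with # are treated as comments. When a link carries no W= field, the sketch takes the word from the link's end node, following HTK's convention for node-labelled lattices. parse_slf, link_words and the sample lattice are hypothetical; the HTK Book remains the reference for the full format.

```python
def parse_slf(text):
    """Split a (subset of) HTK SLF lattice into header, nodes and links.

    Assumes whitespace-separated name=value pairs with short field names
    and no quoted values; '#' lines are treated as comments.
    """
    header, nodes, links = {}, {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = dict(pair.split("=", 1) for pair in line.split())
        if "I" in fields:           # node line
            nodes[int(fields["I"])] = fields
        elif "J" in fields:         # link line
            links.append(fields)
        else:                       # header line, e.g. VERSION=, N= L=
            header.update(fields)
    return header, nodes, links


def link_words(nodes, links):
    """Return (start node, end node, word) for every link.

    Uses the link's W= field if present, otherwise the W= of its end node.
    """
    result = []
    for link in links:
        start, end = int(link["S"]), int(link["E"])
        word = link.get("W", nodes.get(end, {}).get("W"))
        result.append((start, end, word))
    return result


# Hypothetical two-link lattice with words on the links.
SAMPLE_SLF = """VERSION=1.0
UTTERANCE=example
N=3 L=2
I=0 t=0.00
I=1 t=0.25
I=2 t=0.60
J=0 S=0 E=1 W=!sent_start
J=1 S=1 E=2 W=yeah
"""
```

Keeping every field as a string leaves scores and times uninterpreted, so callers can convert only the fields they need (for example float(node["t"])).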
