The signal is converted to a sequence of
   feature vectors based on spectral and
   temporal measurements.
    Acoustic models represent sub-word
   units, such as phonemes, as a finite-
   state machine in which states model
   spectral structure and transitions
   model temporal structure.
    The language model predicts the next
      set of words, and controls which models
   are hypothesized.
    Search is crucial to the system, since
      many combinations of words must be
     investigated to find the most probable
      word sequence.