4.3.3 Scoring: Generating Scoring Reports
Before describing the conversion of data to the NIST format, let's briefly discuss the format of the hypotheses produced by our recognizer. The most fundamental and useful output format from a speech recognition system is a tree-like data structure, known as an annotation graph, that contains a time-aligned transcription of the utterance. This graph encapsulates every level of information in the hierarchical system that was used to model the utterance. An example of such an output is shown to the right.

The particular annotation graph format used in our tools is based on a toolkit developed by the Linguistic Data Consortium (LDC). Within the IFCs, you will find a class called AnnotationGraph that represents our implementation of this data structure. Annotation graphs (AGs) can store either hypotheses or reference transcriptions. In addition to the labels of the words spoken or hypothesized, an annotation graph can store multiple layers of knowledge about each word, such as parts of speech or acoustic labels. Any of this information can be time-aligned, but time alignment is not required. For more details about annotation graphs, see our monthly tutorial archive.

To continue with this tutorial, you will need access to a results file from a previous recognition experiment. For this example, let's use results.out, the results file from the experiment in Section 4.2.5. There are three steps required to score these results: