2.4.2 Auxiliary Resources: Audio and Transcription Databases

One of the more time-consuming aspects of speech recognition research is preparing and coordinating speech audio data and speech transcriptions. Experiments are often aborted because the list of audio files does not match the list of transcriptions. Unless the two are tied together in some way, such problems are difficult to avoid. Our system therefore provides a unique method for storing and accessing speech data and transcriptions through two related database representations, AudioDatabase and TranscriptionDatabase. Both databases are created and manipulated with a single tool, isip_make_db.

AudioDatabase

Storage of and access to speech data files is managed through an internally defined database format, AudioDatabase. This database manages a set of records. A record typically contains 1) a unique identifier, which we refer to as the id, and 2) the location of the speech file on disk. To obtain a record from the audio database, you must reference its id. Consider a collection of three files: ae_12a.sof, ae_1a.sof, and ae_2789385a.sof. We need to arrange these in a single file, called a list file, together with their corresponding ids. An example of such a file is audio_list.text. Go to the directory:
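To make the record structure concrete, the following minimal Python sketch models an id-keyed store in which each record pairs an id with a file location and records are retrieved only by id. The names `AudioRecord`, `ToyAudioDatabase`, and `id_from_filename` are hypothetical and are not part of isip_make_db or the ISIP classes; deriving the id from the basename with the extension stripped is also an assumption.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AudioRecord:
    """One record: a unique id plus the location of the speech file on disk."""
    record_id: str
    location: str


class ToyAudioDatabase:
    """Hypothetical in-memory stand-in for an audio database (not the ISIP API)."""

    def __init__(self, records):
        self._records = {}
        for rec in records:
            # Ids must be unique, otherwise lookups would be ambiguous.
            if rec.record_id in self._records:
                raise ValueError(f"duplicate id: {rec.record_id}")
            self._records[rec.record_id] = rec

    def get(self, record_id):
        # Records are retrieved only by id; an unknown id is an error.
        try:
            return self._records[record_id]
        except KeyError:
            raise KeyError(f"no audio record with id {record_id!r}") from None

    def ids(self):
        # Ids in insertion order.
        return list(self._records)


def id_from_filename(filename):
    # Assumption: id = basename without extension, e.g. ae_12a.sof -> ae_12a.
    return PurePosixPath(filename).stem


files = ["ae_12a.sof", "ae_1a.sof", "ae_2789385a.sof"]
db = ToyAudioDatabase(AudioRecord(id_from_filename(f), f) for f in files)
print(db.get("ae_1a").location)
```

Because every access goes through the id, a request for an id that was never listed fails loudly instead of silently returning the wrong file.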
The second option, "-audio", provides the name of the listing file. Each line of this listing file typically contains a filename followed by a key. You can create such files fairly easily using Unix commands such as "ls" and a programmable editor such as "emacs". The key is optional; if it is omitted, a unique key is generated automatically. An example of a listing file is audio_list.text, which contains the three filenames mentioned above and their corresponding ids (derived from each file's basename in this example). The third option, "-name", sets the name of the data. The fourth option, "-type", selects whether a text or a binary Sof file is generated. Here we use "text" so that we can view the output file by simply listing it. The last entry on the command line, the first non-option argument, is the name of the output file that will contain the audio database. See audio_db.sof for the output of the example above.

The database file contains four Sof objects: an AudioDatabase object and three Filename objects, one for each of the files included in this example. The AudioDatabase object encapsulates the database name (e.g., TIDigits), a list of ids, and a mapping from ids to Filename object numbers. The ids link filenames to the transcriptions described below.

Since the audio files are often stored somewhere other than the current working directory, it is useful to build these databases from filenames that work from any directory. The obvious way to do this is to use a fully qualified filename. For example, "ae_12a.sof" could be represented as "/isi./data/corpora/tidigits/ae_12a.sof". Another convenient approach is to use an environment variable. For example, the file "ae_12a.sof" can be listed as "$TUTORIAL/ae_12a.sof" in audio_list.text. If the environment variable "$TUTORIAL" is set to "/isi./data/corpora/tidigits", this file will be accessible from any location.
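The listing-file conventions described above (a filename followed by an optional key, with environment variables in filenames) can be sketched as a small Python parser. This is an illustration under stated assumptions, not how isip_make_db is implemented: `parse_listing` is a hypothetical name, fields are assumed to be whitespace-separated, and the "auto_N" format for generated keys is invented here. Environment variables are resolved with the standard `os.path.expandvars`, which leaves unset variables unchanged.

```python
import os


def parse_listing(lines, expand=True):
    """Parse 'filename [key]' lines into an ordered {key: filename} dict.

    Hypothetical sketch: whitespace-separated fields are assumed, and the
    generated key format ("auto_N") is invented, not isip_make_db's.
    """
    entries = {}
    counter = 0
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split()
        if len(fields) > 2:
            raise ValueError(f"line {lineno}: expected 'filename [key]'")
        filename = fields[0]
        if expand:
            # Resolve references such as $TUTORIAL; unset variables stay as-is.
            filename = os.path.expandvars(filename)
        if len(fields) == 2:
            key = fields[1]
        else:
            # Key omitted: generate one that is not already in use.
            while f"auto_{counter}" in entries:
                counter += 1
            key = f"auto_{counter}"
            counter += 1
        if key in entries:
            raise ValueError(f"line {lineno}: duplicate key {key!r}")
        entries[key] = filename
    return entries


# Usage: "/data/tidigits" is a hypothetical corpus location.
os.environ["TUTORIAL"] = "/data/tidigits"
listing = [
    "$TUTORIAL/ae_12a.sof ae_12a",
    "$TUTORIAL/ae_1a.sof ae_1a",
    "$TUTORIAL/ae_2789385a.sof",  # no key: one is generated
]
for key, path in parse_listing(listing).items():
    print(key, path)
```

Moving the corpus then only requires changing the value of TUTORIAL; the listing lines themselves stay the same.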
The advantage of an environment variable is that the database can be moved to a new location and only the environment variable needs to be updated.

Transcription Database

Transcriptions for the speech files in an audio database are managed by a TranscriptionDatabase. This database uses annotation graphs to represent the transcriptions, which typically consist of strings of words (though they can be much more complicated than that). The transcriptions are organized using the same key used in the audio database. To obtain the transcription of a particular speech file in an audio database, you must reference the key for that file. Continuing with the example above, a transcription list file can be created in many ways using standard Unix commands and editors. For corpora such as TIDigits this is particularly simple, because the transcriptions are encoded in the filenames. An example of a transcription list file is provided in trans_list.text. This file contains fields of the form:
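Because TIDigits encodes the spoken digits in each filename, a transcription can be derived directly from the file. The Python sketch below assumes a naming pattern of `<speaker>_<digit string><take letter>.sof`, where "z" stands for "zero", "o" for "oh", and the trailing "a" or "b" marks the take. That pattern is an assumption inferred from the example names, and `transcription_from_filename` is a hypothetical helper. The "id words" lines it prints only illustrate the idea; they are not the actual field format of trans_list.text.

```python
import re
from pathlib import PurePosixPath

# Assumed TIDigits-style naming: <speaker>_<digit string><take letter>.sof
# with 'z' = "zero" and 'o' = "oh". This convention is an assumption.
DIGIT_WORDS = {
    "1": "one", "2": "two", "3": "three", "4": "four", "5": "five",
    "6": "six", "7": "seven", "8": "eight", "9": "nine",
    "z": "zero", "o": "oh",
}
NAME_RE = re.compile(r"^(?P<speaker>[a-z]+)_(?P<digits>[1-9zo]+)(?P<take>[ab])$")


def transcription_from_filename(filename):
    """Return (id, word string) decoded from a TIDigits-style filename."""
    stem = PurePosixPath(filename).stem  # e.g. "ae_12a"
    m = NAME_RE.match(stem)
    if m is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    # Each character of the digit string is one spoken word.
    words = [DIGIT_WORDS[c] for c in m.group("digits")]
    return stem, " ".join(words)


for f in ["ae_12a.sof", "ae_1a.sof", "ae_2789385a.sof"]:
    file_id, words = transcription_from_filename(f)
    print(file_id, words)
```

The id returned here is the same basename-derived id used for the audio records, which is what lets the two databases share keys.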
The command to create a transcription database file from this data is:
The resulting transcription database can be viewed in trans_db.sof. This file contains a TranscriptionDatabase object and three AnnotationGraph objects. The AnnotationGraph objects contain the actual transcriptions along with timing information; the TranscriptionDatabase object contains the ids used to reference the individual AnnotationGraphs. Its format is the same as that of the AudioDatabase object in audio_db.sof described above. Note that both of these databases could have been built using a single command:
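As a rough picture of what an AnnotationGraph holds, the Python sketch below builds the simplest case, a linear word-level graph: nodes carry optional time offsets and arcs between consecutive nodes carry word labels. The classes `Node`, `Arc`, and `ToyAnnotationGraph` are hypothetical and do not reproduce the ISIP AnnotationGraph class; a plain dict keyed by id stands in for the TranscriptionDatabase.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    index: int
    time: Optional[float] = None  # offset in seconds; None when unknown


@dataclass
class Arc:
    start: int  # index of the source node
    end: int    # index of the destination node
    label: str  # e.g. a word


@dataclass
class ToyAnnotationGraph:
    nodes: List[Node] = field(default_factory=list)
    arcs: List[Arc] = field(default_factory=list)

    @classmethod
    def from_words(cls, words, times=None):
        """Build a linear graph: node i --words[i]--> node i+1."""
        if times is not None and len(times) != len(words) + 1:
            raise ValueError("need one time per node, i.e. len(words) + 1")
        g = cls()
        for i in range(len(words) + 1):
            g.nodes.append(Node(i, None if times is None else times[i]))
        for i, w in enumerate(words):
            g.arcs.append(Arc(i, i + 1, w))
        return g

    def words(self):
        # Valid for linear graphs, whose arcs are stored in order.
        return [a.label for a in self.arcs]


# A dict keyed by the shared ids stands in for the transcription database.
trans_db = {
    "ae_12a": ToyAnnotationGraph.from_words(["one", "two"]),
    "ae_1a": ToyAnnotationGraph.from_words(["one"]),
}
print(trans_db["ae_12a"].words())
```

Storing times on nodes rather than arcs lets adjacent words share a boundary, and a richer transcription (alternatives, overlapping labels) only needs more arcs, not a new structure.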
-level word -name TIDigits -type text audio_db.sof trans_db.sof

The beauty of our database approach to handling file lists is that important subsets of a database are now referenced simply by lists of ids. In this way, we avoid mismatches between audio files and transcriptions. The audio and transcription databases are created once for the entire database, and users need only operate on the appropriate lists of ids. Common problems such as a missing transcription or an incorrect ordering of files, which cause mismatches between simple listing files, are alleviated because only one file, a list of ids, needs to be maintained. For a more detailed explanation of isip_make_db, see our online documentation.
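The benefit of referencing subsets by id can be shown in a few lines of Python. In this sketch, plain dicts keyed by the same ids stand in for the audio and transcription databases, and `paired_subset` is a hypothetical helper, not an ISIP function. Pairing happens through the id, so the order of the id list cannot misalign audio and text, and a missing entry becomes an explicit error rather than a silent shift.

```python
def paired_subset(ids, audio_db, trans_db):
    """Return (id, audio location, transcription) triples for the given ids.

    audio_db and trans_db are plain dicts keyed by the same ids; they are
    hypothetical stand-ins for the two databases, not the ISIP API.
    """
    missing_audio = [i for i in ids if i not in audio_db]
    missing_trans = [i for i in ids if i not in trans_db]
    if missing_audio or missing_trans:
        raise KeyError(
            f"missing audio: {missing_audio}; missing transcriptions: {missing_trans}"
        )
    # Each pair is joined through its id, so list order cannot misalign them.
    return [(i, audio_db[i], trans_db[i]) for i in ids]


audio_db = {
    "ae_12a": "$TUTORIAL/ae_12a.sof",
    "ae_1a": "$TUTORIAL/ae_1a.sof",
    "ae_2789385a": "$TUTORIAL/ae_2789385a.sof",
}
trans_db = {
    "ae_12a": "one two",
    "ae_1a": "one",
    "ae_2789385a": "two seven eight nine three eight five",
}

# A subset is just a list of ids, in any order.
for row in paired_subset(["ae_2789385a", "ae_12a"], audio_db, trans_db):
    print(row)
```

With two parallel listing files, by contrast, one dropped line shifts every later pair without any error being raised.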