===========================================================================
PRONUNCIATIONS DICTIONARY FOR PROPER NOUNS
******************************************

This directory contains the surname-pronunciations database described
below. The required format of the phonetic transcriptions is

    name_spelling phonetic_transcription

===========================================================================
file: pronunciation_dictionaries.tar.gz

This file contains the full data set, with 18494 surnames and 24040
pronunciations. Each training set consists of 15000 names, and the
corresponding test set is the remaining 3494 names. There are three such
training/test splits, divided so that the three test sets are mutually
disjoint.

===========================================================================
file: names_128.text

This file contains 128 words, each four letters long, that we often use
for preliminary experiments with neural networks and decision trees.

===========================================================================
file: names_4test.text

This file contains 408 words used for preliminary experiments with neural
networks and decision trees in testing mode.

===========================================================================
file: names_4train.text

This file contains 1611 words used for preliminary experiments with neural
networks and decision trees in training mode.

===========================================================================
file: proper_nouns.tar.gz

A list of proper nouns, containing geographic names as well as surnames of
people.

===========================================================================
file: align.tar.gz

Simple Viterbi source code that inserts blank phones "_" at the
appropriate places in the pronunciations to align them with the name
spelling.
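
To illustrate the kind of alignment described for align.tar.gz, here is a
minimal Python sketch. It is not the code shipped in the archive. It
assumes the pronunciation never has more phones than the spelling has
letters, and it uses a made-up scoring function (toy_score) in place of
trained letter-to-phone scores. The function names, the phone symbols
"sh" and "aa", and the example name are all hypothetical.

```python
def toy_score(letter, phone):
    """Hypothetical score: a stand-in for trained letter-to-phone scores."""
    if phone == "_":
        return 0.0  # a silent letter is neutral
    # Reward a phone whose first character matches the letter, else penalize.
    return 1.0 if phone[0].lower() == letter.lower() else -1.0


def align(spelling, phones, score):
    """Pad `phones` with "_" so that there is one symbol per letter."""
    n, m = len(spelling), len(phones)
    if m > n:
        raise ValueError("more phones than letters; this sketch only inserts blanks")
    neg_inf = float("-inf")
    # best[i][j]: best total score for the first i letters and first j phones.
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    # back[i][j]: True if letter i-1 received a blank on the best path.
    back = [[False] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        letter = spelling[i - 1]
        for j in range(min(i, m) + 1):
            # Option 1: this letter is silent and receives a blank phone.
            blank = best[i - 1][j] + score(letter, "_")
            if blank > best[i][j]:
                best[i][j], back[i][j] = blank, True
            # Option 2: this letter consumes the next real phone.
            if j > 0:
                real = best[i - 1][j - 1] + score(letter, phones[j - 1])
                if real > best[i][j]:
                    best[i][j], back[i][j] = real, False
    # Trace the best path back from the final cell.
    aligned = []
    i, j = n, m
    while i > 0:
        if back[i][j]:
            aligned.append("_")
        else:
            aligned.append(phones[j - 1])
            j -= 1
        i -= 1
    aligned.reverse()
    return aligned


print(align("shaw", ["sh", "aa"], toy_score))  # ['sh', '_', 'aa', '_']
```

With the toy scores, the two phones attach to "s" and "a", and the
remaining letters receive blanks. A real aligner would replace toy_score
with scores estimated from data.
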
===========================================================================
HELP / BUG REPORTS
******************

Please email help@isip.msstate.edu for help or further information
regarding this database.

===========================================================================