Activities and findings:
------------------------

Research and Education Activities:

- Analysis of spectral cues to prosodic structure (actually, this should have been in last year's report but wasn't): Used prosodically labeled data as a feature in HMM decision tree clustering to investigate whether prosody might be useful in acoustic modeling and whether spectral cues might be useful for prosody recognition. Preliminary results suggest that spectral cues are useful for discriminating between fluent and disfluent pauses.

- Analysis of the interaction between prosodic and syntactic structure: Investigated different aspects of parse structure to determine which features are the best predictors of prosodic structure, on the assumption that these would be the best targets for using prosody to improve parsing. Of the features investigated, the depth of the left constituent is the most important.

- Recognizing prosodic structure: We investigated recognition of prosodic structure given acoustic cues and/or syntactic cues, using known word transcriptions and simple decision tree classifiers. For a 4-class recognition problem, we achieved 79% correct with prosodic cues alone, and 89% correct when parse features are added. Since very high results are obtained with the combined features, and there is much more syntactically annotated data than prosodically labeled data, we are currently investigating training prosody recognition modules using only partially labeled data.

- Recognizing sentence boundaries and disfluency interruption points: We have begun experiments in recognizing sentence boundaries, incomplete sentences and disfluency interruption points using prosodic and word class (POS) cues. So far, we have good results for detecting sentences, but interruptions and incomplete sentences are much less frequent and hence not well modeled.
  One problem is that some word boundaries could be marked as interruption points, both on theoretical grounds and because of their acoustic correlates (e.g. before a filled pause), but were not marked under the current labeling convention. The next step is to assess performance with a revised disfluency labeling system.

Findings: (included with each bullet above)

Training and Development:

- Two graduate research students at UW have been trained in pattern recognition and in speech and language technology, specifically in automatic recognition of prosodic structure and in analysis of prosody/parse structure relations. One other student worked with the basic infrastructure developed in this effort on a directed study project.

Contributions:
--------------

Contributions within Discipline:

- Developed a framework for using partially labeled data to train prosodic models and reduce hand labeling costs.

Contributions to Education and Human Resources:

- Three graduate students at UW have been trained in speech and language technology.

Contributions to Resources for Science and Technology:

- Developed a prosodically labeled corpus of conversational speech.
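
Illustrative sketch (not the project's framework): The report does not describe how the partially labeled training framework works, so the following is only a generic self-training loop, one common way to exploit unlabeled data. It assumes a nearest-centroid classifier in place of the decision trees used in the project, and the feature names (a normalized pause duration and a left-constituent depth), the label names, the margin threshold, and all data values are hypothetical toy choices made for illustration.

```python
from math import dist


def centroids(points, labels):
    """Return the mean feature vector of each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def predict(model, p):
    """Return (label, margin): the nearest class and its gap to the runner-up."""
    ranked = sorted((dist(p, c), y) for y, c in model.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def self_train(labeled, unlabeled, min_margin=1.0, max_rounds=5):
    """Grow the training set with confidently pseudo-labeled points.

    Returns the final model and the points that were never confident
    enough to be pseudo-labeled.
    """
    points = [p for p, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    model = centroids(points, labels)
    for _ in range(max_rounds):
        remaining, added = [], False
        for p in pool:
            y, margin = predict(model, p)
            if margin >= min_margin:
                # Confident prediction: treat it as a training label.
                points.append(p)
                labels.append(y)
                added = True
            else:
                remaining.append(p)
        pool = remaining
        if not added:
            break
        model = centroids(points, labels)
    return model, pool


# Toy data (hypothetical): (normalized pause duration, left-constituent depth).
labeled = [((0.1, 1.0), "no_boundary"), ((0.9, 4.0), "boundary")]
unlabeled = [(0.2, 1.5), (0.8, 3.5), (0.5, 2.5)]

model, leftover = self_train(labeled, unlabeled)
print(predict(model, (0.15, 1.2))[0])
print(len(leftover))
```

In this toy run the two unlabeled points that lie close to a labeled example are pseudo-labeled and pull the class centroids toward them, while the ambiguous midpoint is left unlabeled. The margin threshold controls the trade-off between using more unlabeled data and admitting wrong pseudo-labels.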