Front-End Presentation Outline

* fe_main_00: Abstract - Title

* Introduction (no foil)

Speech recognition is just a pattern matching problem.

** fe_intro_00:

Draw a picture of two waveforms for the same word, caption "is this the
same waveform?" ... "BAD QUESTION"

The front end translates acoustic data into observation vectors in the
probability space.

** fe_intro_01:

Draw another picture of two observation vectors -> distance equation ->
probability -> a hard number. ... "BETTER QUESTION"

** fe_intro_02: Project objectives

hit on:
  public domain code
  implementation of all state-of-the-art algorithms
  interface to the ISIP decoder (maybe not)
  expandable structure, tutorial-style code, good documentation
  demo as a teaching tool

* fe_algo_00: System Overview

nifty block diagram

** fe_algo_01: Filter Banks

average spectral magnitude within the filter channel

draw the spectrum -> filter bank histogram-looking thing. Have the filter
banks spaced on the mel scale in the plot, and say that this spacing more
closely models human auditory perception (logarithmic above 800 Hz)

** fe_algo_02: Cepstral

minimum phase representation, model more robust to noise. Liftering
(maybe), show equation. Say these are the state of the art in most modern
systems. Try to find two utterances with the same content but varying
noise and show cepstrum vs. fba for both.

** fe_algo_03: LP vs. FFT

LP spectrum. Show spectrum with different LP orders vs. FFT spectrum. LP
is faster than FFT, but that is less of an issue now. The problem with the
LP model is that it approximates all frequencies equally, which is
inconsistent with human perception.

** fe_algo_04: PLP

picture of the human vocal tract & ear. (Better resolution, of course).
New method that attempts to solve LP's biggest problem by warping the
frequency axis non-linearly before LP modeling, to more closely match
human perception

** fe_algo_05: Delta features

first and second time derivatives of the features, regression method
used, increases accuracy by 5% on SWB

* fe_eval_00: Evaluation Design

frame-level classification experiments on a subset of the SWB and
Alpha-Digit corpora. Need some nice picture of frame comparison (somehow)

** fe_eval_01: Support vector machines

Maybe a nice classification problem plot and how SVMs can solve the
problem when other methods (PCA, LDA, etc.) can't. I can borrow Suresh's
PCA vs. LDA slide :) Aravind may have a slide I can steal for this. If
not, I can collaborate with him on a foil he might want for ICSLP.

This slide may be outside the scope of this presentation. I just like the
idea of talking about SVMs because they are new.

** fe_eval_02: Corpus Description (Alphadigits)

** fe_eval_03: Results 0

We will likely have more than one results foil

** fe_eval_04: Results 1

I'd like to do some sort of ROVER analysis to see if the different
algorithms extract different information. This could make a nice
auxiliary slide.

* fe_eval_05: Analysis

* fe_conc_00: Conclusions

* fe_ref_00: References

* Auxiliary information

These slides are not part of the presentation, but will be available in
case of the most probable questions.

** fe_aux_00: differences between us and HTK cepstral

** fe_aux_01: Why we didn't just evaluate by running recognition
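
Note for the delta features foil (fe_algo_05): the regression method could
be illustrated with something like the sketch below. This is only a minimal
sketch, not the project's code. The outline says no more than that a
regression method is used, so the function names `delta` and `add_deltas`,
the window of 2 frames on each side, and the edge handling (repeating the
first and last frame) are all assumptions. Each derivative is the slope of
a least-squares line fit through the neighboring frames:
d[t] = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2), for n = 1..window.

```python
def delta(frames, window=2):
    """Regression-based time derivative of a sequence of feature vectors.

    frames: list of equal-length lists of floats, one list per frame.
    window: frames used on each side of the current frame (assumed value).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    num_frames = len(frames)
    if num_frames == 0:
        return []
    dim = len(frames[0])
    # Normalizer of the regression slope: 2 * (1^2 + 2^2 + ... + window^2).
    denom = 2.0 * sum(n * n for n in range(1, window + 1))

    def frame_at(t):
        # Assumption: repeat the first/last frame beyond the utterance edges.
        return frames[min(max(t, 0), num_frames - 1)]

    out = []
    for t in range(num_frames):
        vec = []
        for k in range(dim):
            num = sum(
                n * (frame_at(t + n)[k] - frame_at(t - n)[k])
                for n in range(1, window + 1)
            )
            vec.append(num / denom)
        out.append(vec)
    return out


def add_deltas(frames, window=2):
    """Append first and second derivatives to every frame."""
    d1 = delta(frames, window)   # first derivative of the features
    d2 = delta(d1, window)       # second derivative: delta of the deltas
    return [f + a + b for f, a, b in zip(frames, d1, d2)]
```

On a feature that rises by one unit per frame, the interior first
derivative is 1 and the second derivative is 0, which could make a simple
sanity-check plot for the foil. Frames near the edges come out smaller
because of the assumed padding.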