SYLLABUS

Contact Information:

  Lecture:           TBD
  Lecturer:          Joseph Picone, Professor
  Office:            EA 703A
  Office Hours:      MWF, 11:00 AM - 12:00 PM
  Phone:             215-204-4841
  Email:             picone@temple.edu
  Skype:             joseph.picone
  Social Media:      temple.engineering.ece8525@groups.facebook.com
  Website:           http://www.isip.piconepress.edu/courses/temple/ece_8525
  Required Textbook: None
  Reference Textbooks:

  J. Picone, Signal Processing in Speech Recognition,
  publisher and ISBN: TBD.
  URL: http://www.isip.piconepress.com/publications/books/2013/sp_asr

  X. Huang, A. Acero, and H.W. Hon, Spoken Language Processing: A Guide to
  Theory, Algorithm, and System Development, Prentice Hall,
  ISBN: 0-13-022616-5, 2001.

  F. Jelinek, Statistical Methods for Speech Recognition, MIT Press,
  ISBN: 0-262-10066-5, 1998.

  J. Deller, et al., Discrete-Time Processing of Speech Signals,
  MacMillan Publishing Co., ISBN: 0-7803-5386-2, 2000.

  S. Pinker, The Language Instinct: How the Mind Creates Language,
  HarperPerennial Library, ISBN: 0-0609-5833-2, 2000.

  D. Jurafsky and J.H. Martin, Speech and Language Processing: An Introduction
  to Natural Language Processing, Computational Linguistics, and Speech
  Recognition, Prentice-Hall, ISBN: 0-13-095069-6, 2000.

  L.R. Rabiner and B.W. Juang, Fundamentals of Speech Recognition,
  Prentice-Hall, ISBN: 0-13-015157-2, 1993.

  Prerequisites:     ENGR 5022 (minimum grade: B-)
                     ENGR 5033 (minimum grade: B-)


Grading Policies:

  Item            Weight
  Exam No. 1      20%
  Exam No. 2      20%
  Exam No. 3      20%
  Final Exam      20%
  Project         20%
  TOTAL:          100%


This course introduces students to the theory and implementation of modern speech recognition systems. We begin with a review of pattern recognition and machine learning, including topics such as Gaussian mixture models and Bayesian models. We then discuss the three main components of a speech recognition system: feature extraction, acoustic modeling, and language modeling. We conclude the course with an overview of state-of-the-art systems. Students will learn how to simulate and evaluate complex machine learning algorithms such as hidden Markov models and neural networks. Data-driven methodologies will be emphasized.
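For a sense of the kind of computation involved, the sketch below shows the forward algorithm for a small discrete hidden Markov model. It is purely illustrative; the toy model, its probabilities, and all names are assumptions, not course materials.

# Minimal sketch of the HMM forward algorithm (illustrative only; not course code).
# Computes the likelihood of an observation sequence given a discrete HMM with
# transition matrix A, emission matrix B, and initial distribution pi.
import numpy as np

def forward_likelihood(A, B, pi, observations):
    """Return P(observations | model) using the forward recursion."""
    alpha = pi * B[:, observations[0]]            # initialize with the first observation
    for obs in observations[1:]:
        alpha = (alpha @ A) * B[:, obs]           # propagate, then weight by emission probability
        # (a practical implementation would also rescale alpha to avoid underflow)
    return alpha.sum()

# Example: a toy 2-state model with 3 possible observation symbols.
A = np.array([[0.7, 0.3], [0.4, 0.6]])            # state transition probabilities
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])  # emission probabilities
pi = np.array([0.6, 0.4])                         # initial state probabilities
print(forward_likelihood(A, B, pi, [0, 1, 2]))

In the course, this recursion is developed formally and extended to continuous mixture distributions and practical large-vocabulary systems.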

The course requirements include three in-class exams and a final exam. In addition, students will be expected to complete a course project that involves learning how to use an existing, state-of-the-art speech recognition system on a comprehensive large-vocabulary speech recognition task. In this project, students will establish baseline performance using a standard system. They will then be expected to explore some aspect of this system with the goal of improving overall performance. The specifics of this assignment will be negotiated in writing with the course instructor and fully defined by the fifth week of the course.
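Performance on such tasks is conventionally reported as word error rate (WER): the number of word substitutions, deletions, and insertions divided by the number of reference words. The sketch below illustrates that computation with a standard edit-distance recursion; it is illustrative only, is not part of the project deliverables, and the function name is an assumption.

# Minimal sketch of word error rate (WER) computation (illustrative only).
# WER = (substitutions + deletions + insertions) / number of reference words,
# computed here with a Levenshtein (edit distance) recursion over words.

def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                                # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                                # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat mat"))  # 2 errors / 6 words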

Lecture Schedule:

The lecture component will cover the following topics:

  Class   Topic(s)
  1       (a) Course Overview and Introduction
          (b) Speech Physiology
          (c) Speech Production Models
  2       (a) Hearing Physiology
          (b) Phonetics and Phonology
          (c) Syntax and Semantics
  3       (a) Sampling and Resampling
          (b) Transduction
          (c) Temporal Analysis
  4       (a) Frequency Domain Analysis
          (b) Cepstral and Linear Prediction Analysis
          (c) Spectral Normalization
  5       (a) Differentiation
          (b) Noise Reduction and iVectors
          (c) Exam No. 1
  6       (a) Dynamic Programming
          (b) Fundamentals of Markov Models
          (c) Parameter Estimation
  7       (a) HMM Training
          (b) Continuous Mixture Distributions
          (c) Practical Issues
  8       (a) Decision Trees
          (b) Limitations of HMMs
          (c) Deep Learning
  9       (a) Formal Language Theory
          (b) Context-Free Grammars and N-Grams
          (c) Exam No. 2
  10      (a) Smoothing
          (b) Efficient Lexical Trees
          (c) Adaptation
  11      (a) Discriminative Training
          (b) Hybrid Systems
          (c) Evaluation Metrics
  12      (a) Bayesian Networks
          (b) Nonparametric Bayesian Approaches
          (c) Deep Belief Networks
  13      (a) Overview of State-of-the-Art Systems
          (b) Contemporary Challenge Tasks
          (c) Exam No. 3
  14      (a) Applications: Language Identification
          (b) Applications: Speech-to-Speech Translation
          (c) Applications: Multimodal Systems
  15      (a) Final Exam


Please note that the schedule above is fixed; it has been arranged to satisfy a number of scheduling constraints. Please adjust your own schedules, including job interviews and site visits, accordingly.