Robert J. Moorhead
Image Processing
ERC

Introducing the
INSTITUTE FOR SIGNAL AND INFORMATION PROCESSING
located at Mississippi State University

Department of Electrical and Computer Engineering
Box 9571, Mississippi State, Mississippi 39762
Tel: 601-325-3149
Fax: 601-325-3149
email: picone@isip.msstate.edu

MISSION STATEMENT

For over 100 years, Mississippi State University has had a mission of being a center of excellence in the State of Mississippi for:
· Learning - to enhance the intellectual development of its students
· Research - to extend the present limits of knowledge
· Service - to apply its research to improve the lives of people

The Institute for Signal and Information Processing (ISIP) offers a multidisciplinary program focused on the development of next-generation information processing techniques. Research at ISIP is centered on intelligent information processing, perhaps the most important technology of the next century. ISIP draws upon a wide range of research experience in areas such as signal processing, communications, natural language, database query, intelligent systems, and discrete controls. Its present vision is to develop systems capable of intelligent interactions with users by integrating multiple interface technologies, including speech, natural language, database query, and imaging.
Domain: isip.msstate.edu

isip00 (fileserver, router, and domain server):
· Sun SPARC 5
· 70 MHz MicroSPARC II
· 32 Mbytes RAM, 1 Gbyte local disk
· 2 ethernets (for routing)
· 60 Gbytes magnetic disk (Seagate Elite)

Exabyte 10h Tape Library:
· 8 mm tapes
· 70 Gbyte capacity
· 140 Gbytes compressed

Outside World (hub #0):
· Allied Telesyn MR 820T
· 10BaseT 8 port hub (10 Mbits/sec)
· Cat-5 Unshielded Twisted Pair
· 155 Mbits/sec ATM (campus)

isip01 (compute server):
· Sun SPARC 20-512
· Two 50 MHz SuperSPARC Processors
· 192 Mbytes RAM, 1 Gbyte local disk

isip02 (demo machine):
· Sun SPARC 5
· 70 MHz MicroSPARC II
· 32 Mbytes RAM, 1 Gbyte local disk
· T1 Telecom Interface

isip03 and isip05 (compute servers):
· dual Pentium Pro
· 200 MHz Processor
· 256 Mbytes RAM, 1 Gbyte local disk

isip04 and isip06 (laptops):
· Samsung Sens 810, Toshiba Tecra 500 CDT
· 133 MHz Pentium Processor
· 40 Mbytes RAM, 2 Gbyte local disk

ncd20c00 (clients):
· NCD Xterms
· 16-bit audio

datlink 0 and datlink 1 (audio):
· Townshend DAT-Link+
· 16-bit digital audio
· AES/EBU and SP-DIF

Sharp JX-325 Color Scanner:
· one-pass 24-bit color scan
· 300 dpi native mode

BASIC TECHNOLOGY: A PATTERN RECOGNITION PARADIGM BASED ON HIDDEN MARKOV MODELS

PARALLEL IMPLEMENTATIONS OF FAST FOURIER TRANSFORMS
· Detailed performance analysis in a common framework
· Object-oriented software implemented in C++

Algorithm                    FFT ORDER
               16      64     256    1024    4096   16384
RAD2           20      60     280    1960   10900   97100
RAD4           20      60     250    1800    9720   58220
SRFFT          20      40     160    1060    6140   38100
FHT            20      40     140     640    3800   38100
QFT            20      40     160     880    6560   44020
DITF           20      60     360    2500   12320  104080
(Table entries are computation times in usec)

A T1-BASED DATA COLLECTION SYSTEM FOR SUN/UNIX WORKSTATIONS

The JEIDA Japanese Common Speech Data Corpus

Automatic Generation of N-Best Proper Noun Pronunciations

What Differentiates ISIP Research?
· Public Domain Software
· Extensive Web Archive
· Object-Oriented Signal Processing Software
· State-of-the-Art Performance Tasks
· Close Industrial Ties
· Next-Generation Statistical Models
    Based on Chaotic Systems
    Applicable to acoustic and language modeling
    Addresses a fundamental barrier in speech understanding

Signal Processing Research At MISSISSIPPI STATE UNIVERSITY Is Multidisciplinary

Anthony Skjellum
High Performance Computing
Computer Science / ERC

Joe Picone
Signal Processing
Inst. for Signal and Info. Proc.

Stephen E. Saddow
Semiconductor Technology
EMRL

SYLLABLE-BASED SPEECH RECOGNITION FOR CONVERSATIONAL TELEPHONE SPEECH

ISIP's Focal Project
· An Integrated Services Transactions Processor That Supports Advanced Telecommunications Interfaces such as an Asynchronous Transfer Mode (ATM) Digital Communications Link

Example: Telephone-Based Natural Language Query of Entertainment Archives

Customer: "Give me all movies, uh, make that only the recent movies, directed by Martin Scorsese and starring Robert DeNiro, and oh, by the way, make that movies about gangsters only."

Computer: We have three titles available (the titles of the movies are shown on the television screen with real-time video of promo clips from each movie below the title). Please select a movie.

Customer: "That one with the three guys looks good, I'll take that one. I want it to start at 8:00 PM tomorrow."

Computer: (The promo clip for the selected movie starts playing on the television.) The movie titled GoodFellas starring Robert DeNiro and directed by Martin Scorsese will be delivered for viewing on your television on Thursday, September 25 starting at 8:00 PM. Thank you for using ISIP's Entertainment Server. Good-bye.

Local Central Office ATM (160 Mbps):
· Voice
· Video
· Data (X Windows)

Unix Multiprocessor (Sparcstation 2000):
· 8 Processors
· 512 Mbytes of memory
· videotape jukebox

Diagram labels: Search Algorithms; Pattern Matching; Signal Model; Recognized Symbols; Language Model

Institute for Signal and Information Processing (ISIP)
Director: Dr. Joseph Picone

Algorithms:
· Aravind Ganapathiraju (Ph.D. - 1)
· Jule Baca (Ph.D. - 4)
· Neeraj Deshmukh (Ph.D. - 3)
· Julie Ngan (M.S. - 1)

Software:
· Jonathan Hamaker (M.S. - 1)
· Audrey Le (M.S. - 1)
· Janna Shaffer (U.G. - 4)

Information Technology:
· Richard Duncan (U.G. - 3)
· Nirmala Kalidindi (M.S. - 2)
· Suresh Balakrishnam (M.S. - 1)
· New Hires (U.G. - 3)

Joseph Picone
Associate Professor
Department of Electrical and Computer Engineering
Mississippi State University
Box 9571
Mississippi State, MS 39762
Phone: (601) 325-3149
Fax: (601) 325-2298
Email: picone@isip.msstate.edu

Education
· Ph.D. in Electrical Engineering, Illinois Institute of Technology, December 1983
· M.S. in Electrical Engineering, Illinois Institute of Technology, May 1980
· B.S. in Electrical Engineering, Illinois Institute of Technology, May 1979

Areas of Research
Speech Understanding, Digital Signal Processing, and Pattern Recognition.

Experience Summary
Dr. Picone's primary interests are in the area of new statistical approaches to speech understanding. He has founded a speech research laboratory at Mississippi State University that conducts research into a number of related areas. (For more information, please see http://www.isip.msstate.edu.) Research support has included projects with Texas Instruments, the Linguistic Data Consortium, ARPA's Spoken Language Systems program, and DoD. Dr. Picone recently served as Data and Systems Coordinator for the 1997 Summer Workshop on Large Vocabulary Speech Recognition hosted by the Center for Language and Speech Processing at Johns Hopkins University. During this workshop, he also served as a senior member of a team dedicated to syllable-based speech processing. Under his guidance, the workshop was extremely successful: all four participating teams posted statistically significant improvements on the state of the art. Dr. Picone is currently a Senior Member of the IEEE and a Professional Engineer registered in the State of Texas.
He is also an Associate Editor for the IEEE Signal Processing Magazine and the IEEE Transactions on Speech and Audio Processing, and has served as a reviewer for numerous organizations including NSF. He was previously employed at Texas Instruments as a Senior Member of Technical Staff and at AT&T Bell Laboratories. He is also a former Adjunct Professor at the University of Texas at Dallas and the Illinois Institute of Technology. He has previously conducted research in medium and low data rate speech compression. Dr. Picone has published more than 85 papers in the area of speech processing and has been awarded 8 patents.

Recent Significant Publications

Journal Articles:
1. N. Deshmukh and J. Picone, "Automatic Generation of N-Best Pronunciations of Proper Nouns," submitted to the IEEE Transactions on Speech and Audio Processing, November 1996.
2. J. Picone, T. Staples, K. Kondo, and N. Arai, "Kanji to Hiragana Conversion Based on a Length Constrained N-Gram Analysis," accepted for publication in the IEEE Transactions on Speech and Audio Processing, Fall 1996.
3. J. Picone, W.J. Ebel, and N. Deshmukh, "Automated Speech Understanding: The Next Generation," in Digital Signal Processing Technology, vol. CR57, pp. 101-114, 1995.

Conferences:
4. N. Deshmukh, J. Ngan, J. Hamaker, and J. Picone, "An Advanced System to Generate Multiple Pronunciations of Proper Nouns," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, Germany, vol. 2, pp. 1467-1470, April 1997.
5. J.J. Godfrey, A. Ganapathiraju, and J. Picone, "Microsegment Modeling for Speech Recognition," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, Germany, vol. 3, pp. 1755-1758, April 1997.
6. A. Ganapathiraju and J. Picone, "Echo Cancellation For Evaluating Speaker Identification Technology," Proceedings of IEEE Southeastcon, Blacksburg, Virginia, U.S.A., pp. 100-102, April 1997.
7. N. Deshmukh, R. Duncan, and J. Picone, "Human Listening Benchmarks on ARPA's CSR Performance Tasks," Proceedings of the Fourth International Conference on Spoken Language Processing, Philadelphia, Pennsylvania, U.S.A., pp. SuP1P1.10, October 1996.
8. N. Deshmukh, M. Weber, and J. Picone, "Automated Generation of N-Best Pronunciations of Proper Nouns," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Atlanta, Georgia, vol. 1, pp. 283-286, May 1996.
9. N. Deshmukh and J. Picone, "Human Performance on ARPA's CSR'95 Hub," presented at the ARPA Spoken Language Systems Technology Workshop, Harriman, New York, January 1996.
10. W.J. Ebel and J. Picone, "Human Speech Recognition Performance on the 1994 CSR Spoke 10 Corpus," Proceedings of the Spoken Language Systems Technology Workshop, Austin, Texas, pp. 53-59, January 1995.
11. Y. Muthusamy, E. Holliman, B. Wheatley, J. Picone, and J. Godfrey, "Voice Across Hispanic America: A Telephone Speech Corpus of American Spanish," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Detroit, Michigan, pp. 85-88, May 1995.

Number of speakers                 150 speakers
                                   75 male speakers
                                   75 female speakers
Number of items per speaker        monosyllables
                                   178 isolated words
                                   35 4-digit sequences
                                   323 items
Number of repetitions per item     4 repetitions of each item
Range of speaker age               20 yrs. to 60 yrs.
Amount of data                     120 hours
Number of Digital Audio Tapes      76 (120-minute tapes)
Total number of utterances         193,800 utterances
Number of channels/mic. type       2 (dynamic and condenser mics.)
Anticipated size of final corpus   6.5 Gbytes (13 CD-ROMs uncompressed)
(16-bit 16 kHz samples @ 1.0 secs per utterance)

NEURAL NETWORK SOLUTION

JAVA APPLETS
http://isip.msstate.edu/software/java_system_response

Other ISIP Java Applets include:
· Convolution
· Frequency Response
· Nyquist Criterion
· Analog and Digital Filter Design
· Compilers and Assembly Code
· Hidden Markov Model Toolkit
· Speech Recognition Primer

ECHO CANCELLATION FOR SPEECH RECOGNITION

Diagram labels: Semi-Parser; Language Model; Tagged Text; Natural Language Processing; Request Generator; Knowledge Extractor; Filled Templates; Netscape Requests; Netscape Knowledge Extraction; Flat Parsed Structures; Speech Recognition; Language Model; Text; Natural Language Understanding
Example query: "Show me all the reports from the White House on Healthcare."

Victor A. Rudis
Forestry Imaging
USFS

Communications Laboratory
Elect. and Comp. Eng.

Bud Rizer
Assistive Technologies
T.K. Martin Center
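
The speech corpus statistics listed earlier can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes raw, headerless 16-bit samples on a single channel, which the source does not state.

```python
# Cross-check of the corpus figures quoted in the statistics table.
# All numeric inputs come from that table; function names are illustrative only.

def total_utterances(speakers: int, items_per_speaker: int, repetitions: int) -> int:
    """Every speaker reads every item the stated number of times."""
    return speakers * items_per_speaker * repetitions


def raw_pcm_bytes(utterances: int, seconds_per_utterance: float,
                  sample_rate_hz: int, bits_per_sample: int,
                  channels: int = 1) -> int:
    """Size of uncompressed, headerless PCM audio (an assumed storage format)."""
    bytes_per_sample = bits_per_sample // 8
    return int(utterances * seconds_per_utterance * sample_rate_hz
               * bytes_per_sample * channels)


utterances = total_utterances(speakers=150, items_per_speaker=323, repetitions=4)
size_bytes = raw_pcm_bytes(utterances, seconds_per_utterance=1.0,
                           sample_rate_hz=16_000, bits_per_sample=16)

print(f"utterances: {utterances}")
print(f"single-channel size: {size_bytes / 1e9:.2f} Gbytes")
```

The utterance count reproduces the 193,800 given in the table. The single-channel size comes out at roughly 6.2 Gbytes, somewhat below the quoted 6.5 Gbytes; the source does not say how its estimate was derived (for example, whether file headers or both microphone channels are included), so the difference is left unexplained here.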