
Speech-based interfaces such as the in-vehicle dialog system give the user the opportunity to issue spoken instructions. With such a system, drivers can control navigation while their hands, eyes, and attention remain devoted to the road.

The dialog system being developed within CAVS is currently a prototype, and implementation challenges remain, such as improving efficiency in noisy operating environments and increasing real-time responsiveness. Once these difficulties are overcome, the dialog system infrastructure will also be applicable to workforce training in automotive manufacturing, a domain with many commonalities with in-vehicle navigation. In fact, workforce training and other similar domains are part of a larger initiative at CAVS. Because of its simplicity and well-defined nature, the in-vehicle dialog system was chosen as the starting application from which more advanced dialog systems will be built.
The prototype dialog system accepts unconstrained speech queries for information needed to navigate the Mississippi State University campus and the surrounding city of Starkville. The system handles navigation queries about addresses, directions, lists of places, distances, and information about the MSU campus, and resolves them using speech recognition, natural language understanding, and user-centered dialog control.

Speech recognition is performed with the publicly available ISIP recognition toolkit, which implements a standard HMM-based (Hidden Markov Model), speaker-independent, continuous speech recognition system. The recognizer processes acoustic samples received from the audio server and produces the word sequence most likely to correspond to the user's utterance.
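
As a conceptual illustration of how the most likely word sequence is selected, the following toy Viterbi decoder works through a miniature HMM. It is not the ISIP toolkit's API; the states, observations, and probability tables are invented for the example.

  # Conceptual sketch (not the ISIP toolkit API): Viterbi decoding over a
  # toy HMM, illustrating how a recognizer picks the most likely state
  # (here, word) sequence for a sequence of acoustic observations.
  import numpy as np

  def viterbi(obs, init, trans, emit):
      """obs: observation indices; init/trans/emit: log-probability tables."""
      n_states, T = trans.shape[0], len(obs)
      score = np.full((T, n_states), -np.inf)    # best log-prob ending in state s at time t
      back = np.zeros((T, n_states), dtype=int)  # backpointers for path recovery
      score[0] = init + emit[:, obs[0]]
      for t in range(1, T):
          for s in range(n_states):
              cand = score[t - 1] + trans[:, s]
              back[t, s] = np.argmax(cand)
              score[t, s] = cand[back[t, s]] + emit[s, obs[t]]
      # Trace back the highest-scoring path.
      path = [int(np.argmax(score[-1]))]
      for t in range(T - 1, 0, -1):
          path.append(int(back[t, path[-1]]))
      return path[::-1]

  # Toy example: two "word" states and three acoustic symbols.
  init  = np.log([0.6, 0.4])
  trans = np.log([[0.7, 0.3], [0.4, 0.6]])
  emit  = np.log([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
  print(viterbi([0, 1, 2], init, trans, emit))   # -> [0, 0, 1]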

The natural language understanding component (the natural language unit, NLU) is an important part of the dialog system and is implemented with a publicly available semantic case frame parser. The parser employs a semantic grammar consisting of case frames with named slots. Word sequences within each slot are specified using a context-free grammar (CFG). The recognizer output is used to fill the slots, which creates a semantic parse tree with the slot name as its root. The following simple example illustrates what a frame for requesting driving information looks like.

FRAME: Drive
   [route]
   [distance]
The slot named [route] couples with direction queries for a specific route, and the slot named [distance], as the name implies, couples with queries about the distance between two places. Some context-free grammar rules may be seen in the insertion to the upper right. This type of grammar is quite effective for dialog systems that, in spontaneous dialog, may encounter ungrammatical inputs such as "I would like... I..need to go to the Post Office on campus." In this utterance, the user's first two hesitations are ignored and the actual driving request is captured from the segment that reads "need to go to the Post Office on campus," in which the verb "go" couples with the slot named [go_verb] and the segment "Post Office on campus" couples with the slot named [arrive_loc], as also shown in the insertion to the upper right. This means that users do not have to speak to the system with exact syntactic correctness, a flexibility owed to the context-free grammar.
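
A minimal sketch of this slot-filling behavior is shown below. It is not the actual semantic case frame parser; the regular-expression patterns simply stand in for the CFG rules of the semantic grammar.

  # Minimal sketch (not the actual case frame parser) of how slot filling
  # skips hesitations and extracts only the grammar-matched segments.
  import re

  # Hypothetical slot patterns standing in for CFG rules of the semantic grammar.
  SLOT_PATTERNS = {
      "go_verb":    r"\b(go|get|drive|head)\b",
      "arrive_loc": r"\bto the (?P<loc>.+)$",
  }

  def parse_drive_frame(utterance):
      """Fill the Drive frame's slots from a (possibly disfluent) utterance."""
      slots = {}
      m = re.search(SLOT_PATTERNS["go_verb"], utterance, re.IGNORECASE)
      if m:
          slots["go_verb"] = m.group(0)
      m = re.search(SLOT_PATTERNS["arrive_loc"], utterance, re.IGNORECASE)
      if m:
          slots["arrive_loc"] = m.group("loc").strip()
      return {"frame": "Drive", "slots": slots}

  # Hesitations ("I would like... I..") simply fail to match any slot pattern.
  print(parse_drive_frame("I would like... I..need to go to the Post Office on campus"))
  # -> {'frame': 'Drive', 'slots': {'go_verb': 'go', 'arrive_loc': 'Post Office on campus'}}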

The prototype dialog system utilizes another publicly available software package, the Galaxy Communicator, to enable the exchange of information between components and the evaluation of dialog results across different sites. The hub structure of the Galaxy Communicator is shown in the insertion to the left. The base components (i.e., servers) attached to the hub are derived from publicly available software and communicate with the hub over a network. The servers can also pass information to one another, with the hub functioning as a central router. The Communicator defines a communication protocol that the different servers must follow, which encourages a plug-and-play approach. In the figure, the server named dialog manager manages the flow of dialog between the user and the system. The user initiates the dialog by making a request, which is received by the speech recognition module and parsed by the NLU module. If there is no missing or conflicting information, the database server is queried to retrieve the required information and the results are output to the user. However, if a conflict arises in the dialog, the dialog manager communicates back to the user to resolve it. The utterances below show a few examples of what a user query may look like; a simplified sketch of the hub-routed flow follows the examples.

  Drive_Direction : "How can I get from Lee Boulevard to Kroger?"
  Drive_Address   : "Where is the bakery located?"
  Drive_Distance  : "How far is China Garden from here?"
  Drive_Turn      : "I'm on the corner of Nash and Route 82.
                     What's the next turn to get to campus?"
  Drive_Quality   : "Find me the most scenic route from LJ's to Scott Field."
  Drive_Intersect : "Does Lynn Lane intersect Academy Road?"
  Drive_Special   : "Can I bypass Highway 12 to get to Bryan Field?"
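
To make the message flow concrete, the following is a simplified, hypothetical sketch of the hub-routed pipeline. It is not the Galaxy Communicator API; the Hub class, server functions, and messages are stand-ins invented for illustration.

  # Simplified sketch of the hub-and-servers message flow (hypothetical code,
  # not the Galaxy Communicator API): the hub routes each message to the named
  # server, and the dialog manager decides whether to query the database or
  # ask the user for clarification.

  class Hub:
      """Central router: servers register by name and exchange messages via the hub."""
      def __init__(self):
          self.servers = {}

      def register(self, name, handler):
          self.servers[name] = handler

      def send(self, server, message):
          return self.servers[server](message)

  def recognizer(message):
      # Stand-in for the speech recognition server: audio in, word sequence out.
      return {"words": "how far is china garden from here"}

  def nlu(message):
      # Stand-in for the case frame parser: word sequence in, frame and slots out.
      return {"frame": "Drive", "slots": {"distance": "China Garden"}}

  def database(message):
      # Stand-in for the GIS database server.
      return {"answer": "China Garden is 1.2 miles from your location."}

  def dialog_manager(hub, audio):
      words = hub.send("recognizer", {"audio": audio})
      parse = hub.send("nlu", words)
      if not parse["slots"]:                     # missing or conflicting information
          return "Could you rephrase your request?"
      return hub.send("database", parse)["answer"]

  hub = Hub()
  hub.register("recognizer", recognizer)
  hub.register("nlu", nlu)
  hub.register("database", database)
  print(dialog_manager(hub, audio=b"..."))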
Several pilot experiments were conducted to assess and improve the performance of the dialog system. The figure to the right illustrates the dramatic improvement achieved with the natural language unit (blue) and the dialog manager (maroon). The pilot experiments helped bring the overall system error rate down to less than 5%.

The prototype dialog system also incorporates a GIS database to handle routing queries, and support for real-time GPS data capture is planned. There are also plans to make the current system available to the local community via telephone access.
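
As a rough illustration of how a routing query might be resolved against such a database, the following sketch runs Dijkstra's algorithm over a tiny, made-up road graph. The place names and distances are hypothetical and do not reflect the actual GIS data.

  # Hypothetical sketch (not the actual GIS database) of resolving a routing
  # query such as "How can I get from Lee Boulevard to Kroger?" as a shortest
  # path over a road graph with edge lengths in miles.
  import heapq

  ROADS = {  # adjacency list: place -> [(neighbor, miles), ...]
      "Lee Boulevard": [("Highway 12", 0.8), ("Nash Street", 0.5)],
      "Nash Street":   [("Highway 12", 0.6)],
      "Highway 12":    [("Kroger", 1.1)],
      "Kroger":        [],
  }

  def shortest_route(start, goal):
      """Dijkstra's algorithm: returns (total miles, list of places on the route)."""
      queue = [(0.0, start, [start])]
      visited = set()
      while queue:
          dist, place, path = heapq.heappop(queue)
          if place == goal:
              return dist, path
          if place in visited:
              continue
          visited.add(place)
          for nxt, miles in ROADS.get(place, []):
              if nxt not in visited:
                  heapq.heappush(queue, (dist + miles, nxt, path + [nxt]))
      return float("inf"), []

  print(shortest_route("Lee Boulevard", "Kroger"))
  # -> (1.9, ['Lee Boulevard', 'Highway 12', 'Kroger'])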

For more information about what we do here at ISIP and CAVS, please visit the following links.
  • A recently published article.
  • More articles.
  • CAVS project web site.
  • CAVS utility instructions.
  • ISIP tools.
  • Galaxy Communicator.
  • Colorado University's dialog system.