A speech recognition system consists of a network of language models, lexicon models, and acoustic models. The language models specify the rules that determine which sequences of words are grammatically well-formed and meaningful. The lexicon models define lexical knowledge of the language, i.e., the vocabulary and the pronunciation of each word. The acoustic models describe how the sound units of the language are realized in the speech signal.

Example Network
In the ISIP speech recognition system, such a network can comprise many levels. The diagram shown to the right illustrates a network with three levels: word level, phone level, and state level.

The language model (word level) defines permissible word sequences. The lexicon models (phone level) construct a pronunciation dictionary, and the acoustic models (state level) define the Hidden Markov Model state topologies. We use a grammar to represent each of these models.
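
As a rough illustration of how the three levels relate, the sketch below expands a word sequence into phones and then into HMM states. The word graph, the pronunciations, and the three-states-per-phone topology are hypothetical values chosen for this example; they are not ISIP model data or ISIP code.

```python
# Hypothetical three-level network: words -> phones -> HMM states.
# All entries below are illustrative assumptions, not ISIP model data.

# Word level: which word may follow which (a tiny language model).
WORD_GRAPH = {
    "<s>": {"draw"},
    "draw": {"red", "blue"},
    "red": {"circle"},
    "blue": {"circle"},
    "circle": {"</s>"},
}

# Phone level: a pronunciation dictionary (assumed phone symbols).
LEXICON = {
    "draw": ["d", "r", "ao"],
    "red": ["r", "eh", "d"],
    "blue": ["b", "l", "uw"],
    "circle": ["s", "er", "k", "ah", "l"],
}

# State level: assume every phone is modeled by a 3-state left-to-right HMM.
STATES_PER_PHONE = 3


def is_permitted(words):
    """Return True if the word sequence is a path through WORD_GRAPH."""
    path = ["<s>", *words, "</s>"]
    return all(b in WORD_GRAPH.get(a, set()) for a, b in zip(path, path[1:]))


def words_to_phones(words):
    """Replace each word by its pronunciation from the lexicon."""
    return [phone for word in words for phone in LEXICON[word]]


def phones_to_states(phones):
    """Replace each phone by the names of its HMM states."""
    return [f"{phone}_{i}" for phone in phones for i in range(STATES_PER_PHONE)]


sentence = ["draw", "red", "circle"]
if is_permitted(sentence):
    print(phones_to_states(words_to_phones(sentence)))
```

Each level only refers to the symbols of the level below it, which is why the same grammar representation can describe all three.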

A grammar states the rules for the structure of a language and can be represented in either a graph or a text format. To support the exchange of model data across different speech research systems, a common grammar representation is needed. For this reason, we have selected the Java Speech Grammar Format (JSGF). Developed by Sun Microsystems, JSGF is a platform-independent, vendor-independent textual grammar representation. JSGF is simple and easy to read, which has helped it become a widely used grammar format.

A complete JSGF grammar includes a grammar header and a grammar body. The grammar header includes:

  • self-identifying header
  • grammar declaration
  • import grammar declarations (optional)

The grammar body contains the grammar rules that define the phrases and sentences that can be spoken in the language. Let's go through an example and see how a JSGF grammar is written.

  • The self-identifying header states that the grammar is in JSGF format and indicates the version of JSGF being used. The header begins with a # sign and ends with a semicolon:

      #JSGF V1.0;

  • The grammar declaration gives the grammar a unique name, optionally qualified by the name of the package that contains it. The name must be preceded by the keyword "grammar":

      grammar network.grammar.activity;

  • Rule definitions include one or more rules terminated by semicolons. If preceded by the keyword "public", the rule is public and can be used by a recognizer to determine what may be spoken. Without the public declaration, a rule is implicitly private and can only be referenced within rule definitions in the local grammar. The basic format for a rule is shown below:

    JSGF Diagram
      public <rulename> = rule expansion;

    <rulename> simply specifies a name for a particular rule. A rule expansion can include terminal symbols and rulenames that are references to other rules. The vertical bar | separates alternatives: any one of the symbols it separates may occur. Let's view an example with two rules:

      public <activity> = draw <color> circle;
      <color> = red | blue | green;


    The first rule, <activity>, is defined by the rule expansion 'draw <color> circle'. (Notice that this rule is public.)
    The reference to <color> is resolved by the second rule, <color>, which can be red, blue, or green, as indicated by the vertical bar | symbols. The grammar therefore accepts exactly three sentences: draw red circle, draw blue circle, and draw green circle. The expansion of this grammar is illustrated in the diagram shown to the right.

  • C++- and Java-style comments are allowed in both the grammar header and the grammar body:

      // I am writing JSGF comments
      /* another way to write comments */
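
To make the pieces above concrete, here is a minimal Python sketch that reads the example grammar, strips its comments, checks the header and grammar declaration, records which rules are public, and enumerates the sentences a rule accepts. It is not part of the ISIP tools and is not a full JSGF parser: it assumes rules use only plain tokens, rule references, and | alternatives (no grouping, optional parts, weights, repetition, or imports), and the function names are hypothetical.

```python
import itertools
import re

# The example grammar from this tutorial, assembled into one string.
GRAMMAR_TEXT = """#JSGF V1.0;
grammar network.grammar.activity;
// I am writing JSGF comments
/* another way to write comments */
public <activity> = draw <color> circle;
<color> = red | blue | green;
"""


def strip_comments(text):
    """Remove /* ... */ and // ... comments (naive: ignores quoted tokens)."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", text)


def parse_grammar(text):
    """Return (grammar name, rules, public rule names) for a simple grammar."""
    statements = [s.strip() for s in strip_comments(text).split(";") if s.strip()]
    header, declaration, *rule_texts = statements
    if not header.startswith("#JSGF"):
        raise ValueError("missing self-identifying header")
    if not declaration.startswith("grammar "):
        raise ValueError("missing grammar declaration")
    name = declaration.split(None, 1)[1]

    rules, public_rules = {}, set()
    for rule_text in rule_texts:
        left, expansion = rule_text.split("=", 1)
        left_tokens = left.split()        # e.g. ["public", "<activity>"]
        rule_name = left_tokens[-1]
        if left_tokens[0] == "public":
            public_rules.add(rule_name)
        # Each alternative becomes a list of tokens and rule references.
        rules[rule_name] = [alt.split() for alt in expansion.split("|")]
    return name, rules, public_rules


def expand(rule_name, rules):
    """List every sentence the rule accepts (assumes no recursive rules)."""
    sentences = []
    for alternative in rules[rule_name]:
        choices = [expand(tok, rules) if tok.startswith("<") else [tok]
                   for tok in alternative]
        sentences += [" ".join(combo) for combo in itertools.product(*choices)]
    return sentences


name, rules, public_rules = parse_grammar(GRAMMAR_TEXT)
for sentence in expand("<activity>", rules):
    print(sentence)
```

Run on the example grammar, the loop prints the three sentences draw red circle, draw blue circle, and draw green circle, and public_rules contains only <activity>, since <color> is implicitly private.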


For a detailed reference on how to write JSGF grammars, see the Java Speech Grammar Format Specification. Users of the ISIP system can define JSGF language models and acoustic models in plain text files. The isip_model_creator utility translates the JSGF grammars into a set of model files in ISIP's internal sof format, ready for training and decoding. ISIP is also developing a GUI tool written in Java. The tool allows users to draw models interactively in a graphical format and then convert them to JSGF grammars automatically. For a more extended tutorial that describes how to use JSGF in the ISIP system, see the production system tutorials.