A speech recognition system consists of a network of
language models, lexicon models and acoustic models.
The language models specify the rules that determine what sequences of words
are grammatically well-formed and meaningful. The lexicon models define
lexical knowledge of the language, i.e., vocabulary definition and word
pronunciation. The acoustic models describe how the sub-word units of the
language, such as phones, are realized in the speech signal.
In the ISIP speech recognition system, such a network can comprise
many levels. The diagram shown to the right illustrates a network
with three levels: word level, phone level and state level.
The language model (word level) defines permissible word sequences.
The lexicon models (phone level) construct a pronunciation
dictionary, and the acoustic models (state level) define the Hidden Markov
Model state topologies. We use a grammar to represent each of these models.
The grammar states the rules for the structure of a
language and can be represented in either a graph or text format.
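The three-level structure described above can be pictured as a chain of lookups. The following is a minimal Python sketch, not ISIP code: the word sequence, the pronunciations, and the choice of three states per phone are all illustrative assumptions.

```python
# Illustrative three-level network: word level -> phone level -> state level.
# All entries below are made-up examples, not ISIP model data.

# Word level (language model): permissible word sequences.
WORD_SEQUENCES = [["draw", "red", "circle"]]

# Phone level (lexicon): pronunciation of each word as a phone sequence.
LEXICON = {
    "draw": ["d", "r", "ao"],
    "red": ["r", "eh", "d"],
    "circle": ["s", "er", "k", "ah", "l"],
}

# State level (acoustic model): assume three HMM states per phone.
STATES_PER_PHONE = 3


def expand_to_phones(words):
    """Replace each word with its phone sequence from the lexicon."""
    return [phone for word in words for phone in LEXICON[word]]


def expand_to_states(phones):
    """Replace each phone with its named states, e.g. 'd_0', 'd_1', 'd_2'."""
    return [f"{phone}_{i}" for phone in phones for i in range(STATES_PER_PHONE)]


if __name__ == "__main__":
    for words in WORD_SEQUENCES:
        phones = expand_to_phones(words)
        states = expand_to_states(phones)
        print(words, "->", len(phones), "phones ->", len(states), "states")
```

Each level only needs to know how to expand its own symbols into symbols of the level below, which is why one grammar formalism can describe all three.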
To support the exchange of model data across different speech
research systems, a common grammar representation
is needed. For this reason, we have selected
Java Speech Grammar Format (JSGF).
Developed by Sun Microsystems, JSGF is a platform-independent,
vendor-independent textual grammar representation. Its syntax is simple and
easy to read, which has made it a widely used grammar format.
A complete JSGF grammar includes a grammar header and a grammar body.
The grammar header includes:
- self-identifying header
- grammar declaration
- import grammar declarations (optional)
The grammar body contains the grammar rules which define the phrases and
sentences that can be spoken in the language.
Let's go through an
example and see how a JSGF grammar is written.
- The self-identifying header identifies the grammar as JSGF
and indicates the version of JSGF being used.
The header starts with a # sign and ends with a semicolon:
#JSGF V1.0;
- The grammar declaration gives the grammar a unique name, optionally
qualified by the package that contains it. The name must be
preceded by the keyword "grammar":
grammar network.grammar.activity;
- Rule definitions include one or more rules terminated by semicolons.
If preceded by the keyword "public", the rule is public and can be
used by a recognizer to determine what may be spoken.
Without the public declaration, a rule is implicitly private and
can only be referenced within rule definitions in the local grammar.
The basic format for a rule is shown below:
public <rulename> = rule expansion;
<rulename> simply specifies a name for a particular rule.
A rule expansion can include terminal symbols (the words to be spoken) and
rulenames that are references to other rules. A vertical bar | separates
alternatives: exactly one of the expansions it separates may be spoken.
Let's view an example with two rules:
public <activity> = draw <color> circle;
<color> = red | blue | green;
The first rule, <activity>, is public and is defined by the rule
expansion 'draw <color> circle'.
The reference to <color> is resolved by the second rule, <color>,
which can be red, blue, or green, as indicated by the
vertical bar | symbols. The expansion of this grammar
is illustrated in the diagram shown to the right.
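To make the expansion concrete, the short Python sketch below enumerates every sentence the two-rule grammar accepts. The dictionary encoding of the rules and the function name expand are assumptions made for illustration, and the sketch assumes the grammar is not recursive.

```python
from itertools import product

# Each rule maps to a list of alternatives; each alternative is a token list.
# Tokens written as <name> are rule references, anything else is a terminal.
RULES = {
    "<activity>": [["draw", "<color>", "circle"]],
    "<color>": [["red"], ["blue"], ["green"]],
}


def expand(token):
    """Return every word sequence (a list of words) that a token can produce."""
    if token not in RULES:  # terminal symbol: produces itself
        return [[token]]
    sentences = []
    for alternative in RULES[token]:
        # Expand every token of the alternative, then combine the choices.
        parts = [expand(t) for t in alternative]
        for combo in product(*parts):
            sentences.append([word for part in combo for word in part])
    return sentences


if __name__ == "__main__":
    for words in expand("<activity>"):
        print(" ".join(words))
```

Running the script prints the three sentences "draw red circle", "draw blue circle", and "draw green circle", one per line.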
- C++- and Java-style comments are allowed in both the grammar header
and grammar body:
// I am writing JSGF comments
/* another way to write comments */
For a detailed reference on how to write JSGF grammars, see
the Java Speech Grammar Format Specification.
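The pieces of a grammar file described above (self-identifying header, grammar declaration, rules, and comments) can be pulled apart with a few regular expressions. The Python sketch below is a simplified illustration, not the parser used by any ISIP tool: the names strip_comments and parse_grammar are hypothetical, import declarations are not handled, and it assumes no semicolons or comment markers appear inside tokens. The sample text uses #JSGF V1.0; as its header, the form for version 1.0 of the format.

```python
import re

GRAMMAR_TEXT = """\
#JSGF V1.0;
// I am writing JSGF comments
grammar network.grammar.activity;
/* another way to write comments */
public <activity> = draw <color> circle;
<color> = red | blue | green;
"""


def strip_comments(text):
    """Remove /* ... */ and // ... comments."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", text)


def parse_grammar(text):
    """Split a simple JSGF grammar into version, grammar name, and rules."""
    # Every header line, declaration, and rule ends with a semicolon.
    statements = [s.strip() for s in strip_comments(text).split(";") if s.strip()]
    header = re.match(r"#JSGF\s+(V\S+)", statements[0])
    declaration = re.fullmatch(r"grammar\s+([\w.]+)", statements[1])
    if header is None or declaration is None:
        raise ValueError("missing JSGF header or grammar declaration")
    rules = {}
    for statement in statements[2:]:
        match = re.fullmatch(
            r"(public\s+)?<([^>]+)>\s*=\s*(.+)", statement, flags=re.DOTALL
        )
        if match is None:
            raise ValueError(f"not a rule definition: {statement!r}")
        is_public, name, expansion = match.groups()
        rules[name] = {
            "public": is_public is not None,
            # Split on | into alternatives, then on whitespace into tokens.
            "alternatives": [alt.split() for alt in expansion.split("|")],
        }
    return header.group(1), declaration.group(1), rules


if __name__ == "__main__":
    version, name, rules = parse_grammar(GRAMMAR_TEXT)
    print(version, name)
    for rule_name, rule in rules.items():
        print(rule_name, rule)
```

Splitting on semicolons works here only because the sample grammar is flat; a full parser would also need to handle grouping, optional items, weights, and tags as defined in the specification.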
Users of the ISIP system can define JSGF language models and acoustic
models in plain text files. The isip_model_creator utility translates
the JSGF grammars into a set of model files in the ISIP internal sof
format, ready for training and decoding.
ISIP has also been developing a GUI tool written in Java. The tool allows
users to draw models interactively in a graphical format and then convert
them to JSGF grammars automatically.
For a more extended tutorial that describes how to use JSGF in the ISIP system,
see production system tutorials.