
Speaker verification is the use of a machine to verify a person's claimed identity from his or her voice. Speech is a moderate-cost, unobtrusive biometric that, unlike a password, cannot be forgotten, so it is in strong demand for identifying customers in applications such as access control for information or services and telephone banking. As a result, speaker verification technology has attracted considerable research and engineering effort in both academia and industry.

On behalf of the Institute for Signal and Information Processing at Mississippi State University, I am pleased to announce the first release of the ISIP speaker verification system, which was developed using our speech recognition system known as the production system. The ISIP speaker verification system takes MFCC acoustic features as input and outputs acceptance/rejection hypotheses. The following figure represents the architecture of this process.



In the ISIP speaker verification system, the MFCC features of an utterance are processed and likelihood scores are calculated on a per-frame basis against a set of trained models. The likelihood scores are then combined via a hidden Markov model (HMM) to yield an overall utterance score, which the system uses as the criterion for accepting or rejecting the claimed identity.

The ISIP speaker verification system uses Gaussian mixture models (GMMs) as the underlying classifiers. The overall utterance score is the probability, P(X|SPEAKER_MODEL), that the observed utterance, X={x1,x2,...,xN}, was generated by the speaker model, SPEAKER_MODEL. It is estimated by the mean log likelihood over the utterance:

$$\log P(X \mid \text{SPEAKER\_MODEL}) = \frac{1}{N} \sum_{i=1}^{N} \log P(x_i \mid \text{SPEAKER\_MODEL})$$

where P(xi|SPEAKER_MODEL) is the likelihood score of the ith frame and N is the number of frames in the utterance. To make a decision, the overall utterance score is compared to a threshold known as the absolute threshold for speaker verification.
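The mean log likelihood described above can be illustrated with a short Python sketch. This is not the ISIP implementation. It assumes a diagonal-covariance GMM, and the function names (`frame_log_likelihood`, `utterance_score`) and the toy model parameters are hypothetical, chosen only for illustration.

```python
import math


def frame_log_likelihood(frame, weights, means, variances):
    """Return log P(x | model) for one feature vector.

    Assumption: the model is a GMM with diagonal covariances, given as
    per-component weights, mean vectors, and variance vectors.
    """
    component_logs = []
    for weight, mean, variance in zip(weights, means, variances):
        log_density = 0.0
        for x, m, v in zip(frame, mean, variance):
            log_density += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        component_logs.append(math.log(weight) + log_density)
    # Log-sum-exp over the components, shifted by the peak for stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def utterance_score(frames, weights, means, variances):
    """Return the mean per-frame log likelihood over the utterance."""
    total = sum(frame_log_likelihood(f, weights, means, variances) for f in frames)
    return total / len(frames)


if __name__ == "__main__":
    # Toy two-component model over 2-dimensional features (illustrative values).
    weights = [0.6, 0.4]
    means = [[0.0, 0.0], [3.0, 3.0]]
    variances = [[1.0, 1.0], [2.0, 2.0]]
    frames = [[0.1, -0.2], [2.9, 3.1], [0.5, 0.4]]
    print(utterance_score(frames, weights, means, variances))
```

In this sketch the resulting score would be compared against the absolute threshold; the threshold value itself has to be tuned on held-out data and is not specified here.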

To improve performance, a better criterion is used. An imposter model, IMPOSTER_MODEL, is incorporated to account for the probability distribution of the imposters. The overall utterance score log ratio is defined as the difference between the log likelihoods of the speaker model and the imposter model:

$$\log \frac{P(X \mid \text{SPEAKER\_MODEL})}{P(X \mid \text{IMPOSTER\_MODEL})} = \log P(X \mid \text{SPEAKER\_MODEL}) - \log P(X \mid \text{IMPOSTER\_MODEL})$$

To make a decision, the overall utterance score log ratio is compared to a threshold known as the relative threshold for speaker verification. The above formulas form the basis of the ISIP speaker verification system.
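The log-ratio decision can be sketched in a few lines of Python. This is a minimal illustration rather than the ISIP code: it takes per-frame log likelihoods that have already been computed under each model, the function names are hypothetical, and both the default threshold of 0.0 and the use of "greater than or equal" for acceptance are assumptions.

```python
def log_likelihood_ratio(speaker_frame_logs, imposter_frame_logs):
    """Return the difference between the mean per-frame log likelihoods
    under the speaker model and under the imposter model."""
    if not speaker_frame_logs or len(speaker_frame_logs) != len(imposter_frame_logs):
        raise ValueError("both models must score the same non-empty set of frames")
    n = len(speaker_frame_logs)
    speaker_score = sum(speaker_frame_logs) / n
    imposter_score = sum(imposter_frame_logs) / n
    return speaker_score - imposter_score


def accept_claim(speaker_frame_logs, imposter_frame_logs, relative_threshold=0.0):
    """Accept the claimed identity when the log ratio reaches the threshold.

    Assumption: relative_threshold=0.0 is only a placeholder; a real system
    would tune it on development data.
    """
    ratio = log_likelihood_ratio(speaker_frame_logs, imposter_frame_logs)
    return ratio >= relative_threshold


if __name__ == "__main__":
    # Illustrative per-frame log likelihoods for a three-frame utterance.
    speaker_logs = [-2.1, -1.8, -2.4]
    imposter_logs = [-3.0, -2.9, -3.3]
    print(accept_claim(speaker_logs, imposter_logs))
```

Raising the relative threshold makes the sketch reject more claims, which trades false acceptances for false rejections.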

The ISIP speaker verification system is now part of our production system. To download and install the production system, go to

http://www.cavs.msstate.edu/projects/speech/software

and

http://www.cavs.msstate.edu/projects/speech/software/tutorials/production/fundamentals/current/section_01/

and follow the instructions step by step. To learn how to run the ISIP speaker verification system after successfully installing the production system, use the following command:

isip_verify -help

This displays a detailed synopsis of the isip_verify utility, the core component of the speaker verification system. The following examples show several ways to use this utility:

  • This example shows how to verify utterances specified in the input list "id_list.sof", while output options are specified in the parameter file "param.sof":
    isip_verify -p param.sof -list id_list.sof
  • This example shows how to verify utterances specified in the input list, while output files are specified in the output list:
    isip_verify -p param.sof -list id_list.sof -output_mode LIST \
    -output_list out_list.sof
  • This example shows how to verify utterances specified in the input list, while output file names are derived by transforming the input file names:
    isip_verify -p param.sof -list id_list.sof -output_mode TRANSFORM \
    -directory ./output -preserve 2
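
For batch work, the three invocations above could be assembled from a script. The Python sketch below only builds the argument list and does not run anything. It uses only the options that appear in the examples above; the helper name `build_verify_command` and its parameters are hypothetical, and the exact meaning of each option should be checked with `isip_verify -help`.

```python
def build_verify_command(param_file, id_list, output_list=None,
                         directory=None, preserve=None):
    """Build an isip_verify argument list mirroring the examples above.

    Only options shown in those examples are used; this helper is an
    illustration and is not part of the ISIP software.
    """
    args = ["isip_verify", "-p", param_file, "-list", id_list]
    if output_list is not None:
        # Output files are named explicitly in an output list.
        args += ["-output_mode", "LIST", "-output_list", output_list]
    elif directory is not None:
        # Output file names are derived from the input file names.
        args += ["-output_mode", "TRANSFORM", "-directory", directory]
        if preserve is not None:
            args += ["-preserve", str(preserve)]
    return args


if __name__ == "__main__":
    print(" ".join(build_verify_command("param.sof", "id_list.sof")))
    print(" ".join(build_verify_command("param.sof", "id_list.sof",
                                        output_list="out_list.sof")))
    print(" ".join(build_verify_command("param.sof", "id_list.sof",
                                        directory="./output", preserve=2)))
```

The returned list can be passed to Python's `subprocess.run` once the production system is installed and `isip_verify` is on the search path.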