EUSIPCO'2002

Paper data
Two-level phoneme recognition based on successive use of monophone and diphone models

Somervuo Panu, Neural Networks Research Centre, Helsinki University of Technology, P.O.Box 5400, FIN-02015 HUT, Finland

Page numbers in the proceedings:
Volume III, pp. 77-80

Multimedia Data Protection / Speech Analysis and Recognition

Paper abstract
A two-level phoneme recognition method is proposed, based on the successive use of monophone and diphone models. In the first level of recognition, monophone models, which are computationally lighter in terms of the number of models, are used to select a subset of the diphone models. For each input utterance, the diphone models whose left or right contexts are present in the recognized monophone sequence are set active. The chosen diphone models are then evaluated in the second level of recognition. This substantially decreases the computational load compared with the case where all diphone models must be examined for each input utterance. In the Finnish speaker-independent phoneme recognition task, on average half of the diphone models could be eliminated in the second level per word utterance while still achieving the same recognition accuracy as when all models were used. Clustered monophone and diphone models were also tried as the models of the first-level recognizer. This did not, however, bring any further improvement over the results obtained with unclustered monophone and diphone models.
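The first-level selection step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the `Diphone` label format (a centre phoneme plus one context phoneme on a given side) and the rule "activate a diphone if its context phoneme occurs anywhere in the recognized monophone sequence" are one reading of the abstract, and `select_active_diphones` is a hypothetical helper. The acoustic decoding in both levels is omitted.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Diphone:
    """Hypothetical diphone label (assumption): a centre phoneme with one context."""
    center: str
    context: str
    side: str  # "left" or "right": which side of the centre the context lies on


def select_active_diphones(monophone_seq: Iterable[str],
                           diphones: Iterable[Diphone]) -> List[Diphone]:
    """Keep only diphones whose context phoneme occurs in the first-level output.

    Assumed selection rule; only the returned models would be evaluated
    in the second (diphone) recognition level.
    """
    present: Set[str] = set(monophone_seq)
    return [d for d in diphones if d.context in present]


if __name__ == "__main__":
    # Toy inventory: every (centre, context) pair over 4 phonemes, both sides.
    phones = ["a", "k", "s", "i"]
    inventory = [Diphone(c, x, side)
                 for c in phones for x in phones for side in ("left", "right")]
    # Pretend the monophone recognizer produced "k a s a" for one utterance.
    active = select_active_diphones(["k", "a", "s", "a"], inventory)
    print(len(inventory), len(active))
```

With this toy 32-model inventory and first-level output `k a s a`, every diphone whose context is `i` is dropped, so the script prints `32 24`. How much is pruned in a real system depends on the phoneme inventory and the length of each utterance.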
