Abstract
In a speech recognition system, Voice Activity Detection (VAD) is a crucial component for maintaining accuracy. This paper proposes a decoder that uses speech/non-speech confidence measures to adjust the scores of recognition hypotheses. To achieve good search performance, the Gaussian mixture models (GMMs) must be properly adapted to the input utterances and the environmental noise, so this paper also proposes an unsupervised on-line GMM adaptation method based on maximum a posteriori (MAP) estimation. The robustness of the method is further improved by weighting the GMM update parameters according to the confidence measure of the adaptation data, and adaptation is greatly accelerated by caching the statistics used to adapt the GMMs. Experimental results on the Drivers' Japanese Speech Corpus in a Car Environment (DJSC) show that the approach significantly improves accuracy compared with typical front-end-based VAD methods, and that the adaptation method significantly improves word accuracy. Moreover, the weighting method improves the robustness of the unsupervised adaptation, and the cache method greatly accelerates the decoding process. Consequently, the proposed adaptive decoding method significantly improves word accuracy under noise with only a minor increase in computational cost.
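The abstract does not give the update equations, so the following is only a minimal sketch of what a confidence-weighted, cached MAP update could look like. It assumes the textbook MAP update of Gaussian means with a relevance factor, diagonal covariances, and adaptation of the means only. The class `OnlineMapGmm`, the parameter `tau`, and the methods `accumulate` and `update_means` are hypothetical names chosen for illustration, not the authors' implementation.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian (illustrative helper)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


class OnlineMapGmm:
    """Hypothetical sketch: MAP adaptation of GMM means with cached statistics."""

    def __init__(self, weights, means, variances, tau=10.0):
        self.weights = list(weights)
        self.prior_means = [list(m) for m in means]  # means before adaptation
        self.means = [list(m) for m in means]        # adapted means
        self.variances = [list(v) for v in variances]
        self.tau = tau  # assumed relevance factor: larger trusts the prior more
        # Cached sufficient statistics, so each new frame costs one pass only.
        self.occ = [0.0 for _ in means]
        self.first = [[0.0] * len(m) for m in means]

    def posteriors(self, x):
        """Component posteriors for one frame under the current means."""
        logs = [
            math.log(w) + gaussian_logpdf(x, m, v)
            for w, m, v in zip(self.weights, self.means, self.variances)
        ]
        top = max(logs)
        exps = [math.exp(value - top) for value in logs]
        total = sum(exps)
        return [e / total for e in exps]

    def accumulate(self, x, confidence):
        """Add one frame to the cache, scaled by its confidence in [0, 1]."""
        for k, gamma in enumerate(self.posteriors(x)):
            c = confidence * gamma
            self.occ[k] += c
            for d, xd in enumerate(x):
                self.first[k][d] += c * xd

    def update_means(self):
        """MAP mean update: interpolate the prior mean with the cached data."""
        for k, prior in enumerate(self.prior_means):
            denom = self.tau + self.occ[k]
            self.means[k] = [
                (self.tau * mu0 + f) / denom
                for mu0, f in zip(prior, self.first[k])
            ]
```

In this sketch a frame with confidence 0 leaves the model unchanged, while a frame with confidence 1 contributes fully. Because only the running sums `occ` and `first` are stored, the means can be re-estimated at any time without revisiting earlier frames.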
| Translated title of the contribution | Noise-robust speech recognition decoder using speech/non-speech confidence measures |
|---|---|
| Original language | Japanese |
| Pages (from-to) | 49 - 54 |
| Journal | IEICE technical report |
| Volume | 110 |
| Issue number | 81 |
| State | Published - 10 Jun 2010 |