Abstract
The robustness of speech recognition improves when models suited to acoustic and linguistic variations are prepared in advance, in cases where those variations can be predicted, and when the models are incrementally adapted to variations that are difficult to predict. Combining these methods requires a large amount of computation, while spoken dialogue systems require real-time processing; this paper therefore investigates the use of a massively parallel computer. An architecture that runs multiple speech recognizers in parallel, selects the recognition result with the maximum likelihood, and executes adaptation processes in the background has been implemented on a GRID computing system. In a restaurant information retrieval task, the system used multiple language models representing linguistic variations of input speech according to utterance content (topics/utterance categories), together with multiple speaker-dependent acoustic models representing speaker variations. Evaluation experiments with pre-recorded dialogue utterances show that, with 75 recognition nodes and 15 speaker-adaptation nodes, the proposed system reduces the keyword recognition error rate by 25.5% compared with a conventional system that uses a single acoustic model and a single language model.
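The core selection step described above can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' implementation): each "recognizer node" is simulated by a stub that returns a hypothesis with a log-likelihood score, the nodes run in parallel, and the hypothesis with the maximum likelihood is selected. All model names and scores below are invented for the example.

```python
# Hypothetical sketch: parallel recognizers + maximum-likelihood selection.
from concurrent.futures import ThreadPoolExecutor

def recognize(model_name, audio):
    # Placeholder: a real node would decode `audio` with model_name's
    # acoustic/language model and return (hypothesis, log-likelihood).
    fake_results = {
        "topic-restaurant": ("show me italian places", -120.5),
        "topic-directions": ("show me italian plates", -134.2),
        "speaker-adapted":  ("show me italian places", -118.9),
    }
    return fake_results[model_name]

def best_hypothesis(audio, model_names):
    # Drive all recognizer nodes in parallel on the same input.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda m: recognize(m, audio), model_names))
    # Keep the hypothesis whose recognizer reports the highest log-likelihood.
    return max(results, key=lambda r: r[1])

models = ["topic-restaurant", "topic-directions", "speaker-adapted"]
print(best_hypothesis(None, models))  # ('show me italian places', -118.9)
```

In the paper's setting, the adaptation processes (e.g. speaker adaptation) would additionally run in the background and update the model set used by these nodes.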
| Translated title of the contribution | Spoken dialogue system robust against speech variations based on massively parallel computing |
|---|---|
| Original language | Japanese |
| Pages (from-to) | 91-96 |
| Journal | IPSJ SIG Notes |
| Volume | 2005 |
| Issue number | 127 |
| State | Published - 22 Dec 2005 |