対話システムへの利用を想定したマルチモーダル音声認識の検討

Translated title of the contribution: A Study on Multimodal Speech Recognition for Spoken Dialogue Systems

高山 俊輔, 松尾 俊秀, 岩野 公司 (Koji Iwano), 古井 貞煕 (Sadaoki Furui)

Research output: Contribution to journal › Misc

Abstract

This paper describes speaker-independent multimodal speech recognition toward constructing multimodal spoken dialogue systems. In order to build a multimodal speech recognition system, an audio-visual speech database was first collected from 25 male speakers. In our system, a multi-stream HMM technique is used for integrating audio and visual information. We propose a multi-stream HMM construction method where audio-only and visual-only models are separately trained and then integrated at the state level. In this framework, the state tying structure of the target audio-visual model is inherited from the audio-only triphone HMM. Experimental results show that the proposed method is effective in various noise conditions. We also compared two visual features, optical-flow-based features and PCA (Principal Component Analysis)-based features, in our recognition framework. The results show that the optical-flow-based features yield better performance than the PCA-based features.
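For context on the integration step mentioned in the abstract, the following is a minimal sketch of the synchronous multi-stream HMM output probability conventionally used for audio-visual speech recognition; the symbols (b_j for the state output probability, λ_A and λ_V for the stream weights) and the weight constraint are the standard textbook formulation, assumed here rather than quoted from the paper itself.

% Conventional multi-stream HMM output probability (assumed standard formulation, not quoted from the paper)
\[
b_j(\mathbf{o}_t) \;=\; \bigl[\,b_j^{A}(\mathbf{o}_t^{A})\,\bigr]^{\lambda_A}\,\bigl[\,b_j^{V}(\mathbf{o}_t^{V})\,\bigr]^{\lambda_V},
\qquad \lambda_A + \lambda_V = 1,\; \lambda_A,\lambda_V \ge 0
\]
% b_j^A and b_j^V are the audio and visual stream likelihoods of state j;
% λ_A and λ_V are stream weights, typically tuned to the acoustic noise condition.

In a state-level integration scheme of the kind proposed, the separately trained audio-only and visual-only state distributions would supply the two stream terms, with the weights adjusted according to the noise condition.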
Original language: Japanese
Pages (from-to): 19-24
Journal: IEICE technical report
Volume: 107
Issue number: 77
State: Published - 24 May 2007

