横顔の動画像情報を用いたマルチモーダル音声認識

Translated title of the contribution: A Multi-Modal Speech Recognition Using Side-Face Images

吉永 智明, 田村 哲嗣, 岩野 公司, 古井 貞煕, Koji IWANO

Research output: Contribution to journal - Misc

Abstract

This paper proposes a multi-modal speech recognition method that uses lip movement extracted from side-face images to increase noise robustness in mobile environments. Most previous multi-modal speech recognition methods use frontal face (lip) images, which is inconvenient for users because they must hold a device with a camera in front of their face while talking. The proposed method, which captures lip movement with a small camera installed in a handset, is more natural, easy, and convenient. Visual features are extracted by optical-flow analysis and combined with audio features. HMMs are built using the multi-stream HMM technique. Experiments on connected-digit speech contaminated with white noise show that the visual information improves digit accuracy under various SNR conditions. The best improvement is approximately 6% at 5 dB SNR.
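The abstract does not state how the audio and visual streams are weighted. As a minimal sketch, assuming the common multi-stream HMM formulation in which a state's observation log-likelihood is a weighted sum of per-stream log-likelihoods with weights that sum to 1, the combination could look like the following. The function name, the parameter names, and the numeric values are hypothetical and are not taken from the paper.

```python
def multistream_log_likelihood(log_b_audio, log_b_visual, audio_weight):
    """Combine per-stream log-likelihoods for one HMM state and one frame.

    Assumption (not from the paper): the stream weights are non-negative
    and sum to 1, so the visual weight is 1 - audio_weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    # Weighted sum in the log domain, i.e. a weighted product of likelihoods.
    return audio_weight * log_b_audio + visual_weight * log_b_visual


# Illustrative, made-up log-likelihoods for a single state and frame.
combined = multistream_log_likelihood(-2.0, -4.0, audio_weight=0.5)
```

Under this assumption, lowering `audio_weight` as the SNR drops lets the visual stream contribute more to the decision, which is the usual motivation for stream weighting in noisy conditions.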
Original language: Japanese
Pages (from-to): 61-66
Journal: IPSJ SIG Notes
Volume: 2003
Issue number: 58
State: Published - 27 May 2003
