Noise robust speech recognition using F-0 contour information

Koji Iwano, T Seki, S Furui

Research output: Contribution to journal, Article, peer-review

Abstract

This paper proposes a noise robust speech recognition method using prosodic information. In Japanese, the fundamental frequency (F-0) contour represents phrase intonation and word accent information. Consequently, it conveys information about prosodic phrases and word boundaries. This paper first describes a noise robust F-0 extraction method using the Hough transform, which achieves high extraction rates under various noise environments. Then it proposes a robust speech recognition method using multi-stream HMMs which model both segmental spectral and F-0 contour information. Speaker-independent experiments are conducted using connected digits uttered by 11 male speakers in various kinds of noise and SNR conditions. The recognition error rate is reduced in all noise conditions, and the best absolute improvement of digit accuracy is about 4.5%. This improvement is achieved by robust digit boundary detection using the prosodic information.
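As a rough illustration of how a multi-stream HMM state can combine a spectral stream with an F-0 stream, the sketch below uses the common formulation in which per-stream log-likelihoods are summed with stream weights. The abstract does not give the model details, so the diagonal Gaussian densities, the weight values, and all names here (`gaussian_logpdf`, `multistream_log_likelihood`, `spec_weight`, `f0_weight`) are assumptions made for illustration, not the authors' implementation.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def multistream_log_likelihood(spec_obs, f0_obs, state,
                               spec_weight=0.8, f0_weight=0.2):
    """Weighted sum of per-stream log-likelihoods for one HMM state.

    Assumed (hypothetical) layout: `state` holds diagonal-Gaussian
    parameters for the spectral stream and a single Gaussian for the
    F-0 stream. The weights are illustrative values only.
    """
    log_spec = sum(
        gaussian_logpdf(x, m, v)
        for x, m, v in zip(spec_obs, state["spec_mean"], state["spec_var"])
    )
    log_f0 = gaussian_logpdf(f0_obs, state["f0_mean"], state["f0_var"])
    return spec_weight * log_spec + f0_weight * log_f0
```

Setting `f0_weight` to zero reduces this to an ordinary spectral-only state likelihood, which is one way to see the F-0 stream as an additional cue layered on a conventional recognizer.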
Translated title of the contribution: Noise robust speech recognition using F-0 contour information
Original language: American English
Pages (from-to): 1102-1109
Journal: IEICE Transactions on Information and Systems
Volume: E87D
Issue number: 5
State: Published - May 2004
