Abstract
This paper proposes a noise-robust speaker verification method that uses prosodic information. The method uses logF_0 and ΔlogF_0 as prosodic features and combines them with segmental features such as cepstral parameters. F_0 is extracted by a noise-robust method that applies the Hough transform to time-cepstrum images. The segmental and prosodic features are modeled jointly by multi-stream HMMs. Speaker verification experiments were conducted using four-connected-digit utterances in Japanese, contaminated by white noise at various SNRs. Experimental results show that equal error rates were reduced in all SNR conditions. The largest reduction was observed at the 10 dB SNR condition, where the error rate was reduced by 39.9% from the baseline method using only segmental features.
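As a rough illustration of the two ideas named in the abstract, the sketch below computes logF_0 and ΔlogF_0 from an F_0 contour and combines per-stream log output probabilities with a stream weight, as is common in multi-stream HMM formulations. The function names, the two-point delta window, and the constraint that the two stream weights sum to one are assumptions made for illustration; the abstract does not state the delta window or the weights the authors used.

```python
import math


def log_f0_features(f0_hz):
    """Return a list of (logF0, delta logF0) pairs, one per frame.

    Assumptions (not from the paper): every frame is voiced (F0 > 0), and
    the delta is a central difference with the edge frames clamped.
    """
    log_f0 = [math.log(f) for f in f0_hz]
    n = len(log_f0)
    features = []
    for t in range(n):
        prev = log_f0[max(t - 1, 0)]
        nxt = log_f0[min(t + 1, n - 1)]
        features.append((log_f0[t], (nxt - prev) / 2.0))
    return features


def multi_stream_log_likelihood(log_b_segmental, log_b_prosodic, w_prosodic):
    """Combine two stream log-probabilities with a weighted sum.

    Assumption: the weights are (1 - w_prosodic) and w_prosodic, so they
    sum to one. w_prosodic = 0 reduces to a segmental-only baseline.
    """
    if not 0.0 <= w_prosodic <= 1.0:
        raise ValueError("w_prosodic must lie in [0, 1]")
    return (1.0 - w_prosodic) * log_b_segmental + w_prosodic * log_b_prosodic


if __name__ == "__main__":
    # Hypothetical F0 contour in Hz, for demonstration only.
    for log_f0, delta in log_f0_features([120.0, 125.0, 130.0]):
        print(f"logF0={log_f0:.4f}  dlogF0={delta:.4f}")
    print(multi_stream_log_likelihood(-10.0, -20.0, 0.25))
```

In a scheme like this, the prosodic stream weight would typically be tuned on held-out data for each noise condition, since the relative reliability of the cepstral and F_0 streams changes with SNR.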
| Translated title of the contribution | Use of F_0 information for noise-robust speaker verification |
|---|---|
| Original language | Japanese |
| Pages (from-to) | 1 - 6 |
| Journal | IEICE technical report. Speech |
| Volume | 104 |
| Issue number | 87 |
| State | Published - 21 May 2004 |