Publication:
Speaker identification from shouted speech: Analysis and compensation

Authors

Hanilçi, Cemal
Ertaş, Figen
Kinnunen, Tomi
Saeidi, Rahim
Pohjalainen, Jouni
Alku, Paavo

Publisher:

IEEE

Abstract

Text-independent speaker identification is studied using neutral and shouted speech in Finnish to analyze the effect of vocal mode mismatch between training and test utterances. Standard mel-frequency cepstral coefficient (MFCC) features with a Gaussian mixture model (GMM) recognizer are used for speaker identification. The results indicate that speaker identification accuracy drops from perfect (100 %) to 8.71 % under vocal mode mismatch. Because of this dramatic degradation in recognition accuracy, we propose a joint-density GMM mapping technique for compensating the MFCC features. This mapping is trained on a disjoint emotional speech corpus to create an emotion-neutralizing mapping that is completely independent of speaker and speech mode. As a result of the compensation, the 8.71 % identification accuracy increases to 32.00 % with little degradation in the non-mismatched train-test conditions.
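The joint-density GMM mapping named in the abstract can be illustrated with a minimal sketch. This record does not give the paper's exact formulation, so the code below assumes the commonly used minimum mean-square-error form: a GMM is fitted to joint source/target vectors, and a source feature x is mapped to the posterior-weighted sum of per-component linear regressions. Features are scalar here to keep the example dependency-free (MFCCs would be vectors with full covariance blocks), and the function names, dictionary keys, and parameter values are all hypothetical.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def jd_gmm_map(x, components):
    """MMSE mapping of a source feature x under a joint-density GMM.

    Each component is a dict with hypothetical keys:
      w       mixture weight
      mu_x    mean of the source (e.g. shouted/emotional) feature
      mu_y    mean of the target (e.g. neutral) feature
      var_xx  variance of the source feature
      cov_yx  cross-covariance between target and source
    """
    # Weighted likelihood of x under each component's source marginal.
    likes = [c["w"] * gauss_pdf(x, c["mu_x"], c["var_xx"]) for c in components]
    total = sum(likes)  # assumed non-zero; real code would work in the log domain
    mapped = 0.0
    for c, like in zip(components, likes):
        posterior = like / total
        # Per-component linear regression from source to target.
        regression = c["mu_y"] + c["cov_yx"] / c["var_xx"] * (x - c["mu_x"])
        mapped += posterior * regression
    return mapped


# Toy parameters chosen only for illustration; they are not from the paper.
toy_components = [
    {"w": 0.6, "mu_x": 0.0, "mu_y": 1.0, "var_xx": 1.0, "cov_yx": 0.5},
    {"w": 0.4, "mu_x": 3.0, "mu_y": 2.0, "var_xx": 2.0, "cov_yx": 0.8},
]
compensated = jd_gmm_map(1.5, toy_components)
```

In a system like the one the abstract describes, such a mapping would presumably be applied frame by frame to the test MFCCs before GMM scoring; the training procedure for the joint GMM is not covered by this sketch.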

Description

This study was presented as a paper at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), held in Vancouver, Canada, on 26-31 May 2013.

Keywords

Acoustics, Engineering, Speaker identification, Shouted speech, Loudspeakers, Mapping, Signal processing, Speech, Emotional speech, Gaussian mixture model, Identification accuracy, Mapping techniques, Mel-frequency cepstral coefficients, Recognition accuracy, Text-independent speaker identification, Speech recognition

Citation

Hanilçi, C. et al. (2013). “Speaker identification from shouted speech: Analysis and compensation”. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 8027-8031.
