Speaker identification from shouted speech: Analysis and compensation

Kinnunen, Tomi; Saeidi, Rahim; Pohjalainen, Jouni; Alku, Paavo

Publication:
Speaker identification from shouted speech: Analysis and compensation

dc.contributor.author	Kinnunen, Tomi
dc.contributor.author	Saeidi, Rahim
dc.contributor.author	Pohjalainen, Jouni
dc.contributor.author	Alku, Paavo
dc.contributor.buuauthor	Hanilçi, Cemal
dc.contributor.buuauthor	Ertaş, Figen
dc.contributor.department	Faculty of Engineering
dc.contributor.department	Department of Electrical and Electronics Engineering
dc.contributor.researcherid	AAH-4188-2021
dc.contributor.researcherid	S-4967-2016
dc.contributor.scopusid	35781455400
dc.contributor.scopusid	24724154500
dc.date.accessioned	2023-05-03T10:43:45Z
dc.date.available	2023-05-03T10:43:45Z
dc.date.issued	2013
dc.description	This work was presented as a paper at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), held in Vancouver, Canada, on 26-31 May 2013.
dc.description.abstract	Text-independent speaker identification is studied using neutral and shouted speech in Finnish to analyze the effect of vocal mode mismatch between training and test utterances. Standard mel-frequency cepstral coefficient (MFCC) features with a Gaussian mixture model (GMM) recognizer are used for speaker identification. The results indicate that speaker identification accuracy drops from perfect (100%) to 8.71% under vocal mode mismatch. Because of this dramatic degradation in recognition accuracy, we propose a joint-density GMM mapping technique to compensate the MFCC features. This mapping is trained on a disjoint emotional speech corpus to create a completely speaker- and speech-mode-independent emotion-neutralizing mapping. As a result of the compensation, identification accuracy under mismatch increases from 8.71% to 32.00% without substantially degrading the non-mismatched train-test conditions.
dc.description.sponsorship	Inst Elect & Elect Engineers
dc.description.sponsorship	Inst Elect & Elect Engineers Signal Proc Soc
dc.identifier.citation	Hanilçi, C. et al. (2013). “Speaker identification from shouted speech: Analysis and compensation”. International Conference on Acoustics Speech and Signal Processing ICASSP, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 8027-8031.
dc.identifier.doi	10.1109/ICASSP.2013.6639228
dc.identifier.endpage	8031
dc.identifier.issn	1520-6149
dc.identifier.scopus	2-s2.0-84890452416
dc.identifier.startpage	8027
dc.identifier.uri	https://doi.org/10.1109/ICASSP.2013.6639228
dc.identifier.uri	http://hdl.handle.net/11452/32501
dc.identifier.wos	000329611508038
dc.indexed.wos	CPCIS
dc.language.iso	en
dc.publisher	IEEE
dc.relation.collaboration	International
dc.relation.journal	International Conference on Acoustics Speech and Signal Processing ICASSP, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
dc.relation.publicationcategory	Conference Item - International
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Acoustics
dc.subject	Engineering
dc.subject	Speaker identification
dc.subject	Shouted speech
dc.subject	Loudspeakers
dc.subject	Mapping
dc.subject	Signal processing
dc.subject	Speech
dc.subject	Emotional speech
dc.subject	Gaussian mixture model
dc.subject	Identification accuracy
dc.subject	Mapping techniques
dc.subject	Mel-frequency cepstral coefficients
dc.subject	Recognition accuracy
dc.subject	Text-independent speaker identification
dc.subject	Speech recognition
dc.subject.scopus	Whispers; Speech Recognition; Public Speaking
dc.subject.wos	Acoustics
dc.subject.wos	Engineering, electrical & electronic
dc.title	Speaker identification from shouted speech: Analysis and compensation
dc.type	conferenceObject
dc.type.subtype	Proceedings Paper
dspace.entity.type	Publication
local.contributor.department	Faculty of Engineering/Department of Electrical and Electronics Engineering
local.indexed.at	Scopus
local.indexed.at	WOS