Using group delay functions from all-pole models for speaker recognition

Rajan, Padmanabhan; Kinnunen, Tomi H.; Pohjalainen, Jouni; Alku, Paavo; Bimbot, F.; Cerisara, C.; Fougeron, C.; Gravier, G.; Lamel, L.; Pellegrino, F.; Perrier, P.

dc.contributor.author	Rajan, Padmanabhan
dc.contributor.author	Kinnunen, Tomi H.
dc.contributor.author	Pohjalainen, Jouni
dc.contributor.author	Alku, Paavo
dc.contributor.author	Bimbot, F.
dc.contributor.author	Cerisara, C.
dc.contributor.author	Fougeron, C.
dc.contributor.author	Gravier, G.
dc.contributor.author	Lamel, L.
dc.contributor.author	Pellegrino, F.
dc.contributor.author	Perrier, P.
dc.contributor.buuauthor	Hanilçi, Cemal
dc.contributor.department	Faculty of Engineering
dc.contributor.department	Department of Electrical and Electronics Engineering
dc.contributor.researcherid	S-4967-2016
dc.contributor.scopusid	35781455400
dc.date.accessioned	2022-12-30T11:58:03Z
dc.date.available	2022-12-30T11:58:03Z
dc.date.issued	2013
dc.description	This work was presented as a paper at the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), held in Lyon, France, on 25-29 August 2013.
dc.description.abstract	Popular features for speech processing, such as mel-frequency cepstral coefficients (MFCCs), are derived from the short-term magnitude spectrum, whereas the phase spectrum remains unused. While the common argument for using only the magnitude spectrum is that the human ear is phase-deaf, phase-based features have also remained less explored because of the additional signal processing difficulties they introduce. A useful representation of the phase is the group delay function, but its robust computation remains difficult. This paper advocates the use of group delay functions derived from parametric all-pole models instead of their direct computation from the discrete Fourier transform. Using a subset of the vocal effort data in the NIST 2010 speaker recognition evaluation (SRE) corpus, we show that group delay features derived via parametric all-pole models improve recognition accuracy, especially under high vocal effort. Additionally, the group delay features provide comparable or improved accuracy over conventional magnitude-based MFCC features. Thus, the use of group delay functions derived from all-pole models provides an effective way to utilize information from the phase spectrum of speech signals.
dc.description.sponsorship	Academy of Finland (253120)
dc.description.sponsorship	Int Speech Commun Association
dc.description.sponsorship	Amazon
dc.description.sponsorship	Microsoft
dc.description.sponsorship	Google
dc.description.sponsorship	TcL SYTRAL
dc.description.sponsorship	European Language Resources Association
dc.description.sponsorship	Quaero
dc.description.sponsorship	Imaginove
dc.description.sponsorship	VOCAPIA Research
dc.description.sponsorship	Acapela
dc.description.sponsorship	Speech Ocean
dc.description.sponsorship	ALDEBARAN
dc.description.sponsorship	Orange
dc.description.sponsorship	Vecsys
dc.description.sponsorship	IBM Research
dc.description.sponsorship	Raytheon BBN Technology
dc.description.sponsorship	Voxygen
dc.identifier.citation	Rajan, P. et al. (2013). "Using group delay functions from all-pole models for speaker recognition". 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), 1-5, 2488-2492.
dc.identifier.endpage	2492
dc.identifier.issn	2308-457X
dc.identifier.scopus	2-s2.0-84906257507
dc.identifier.startpage	2488
dc.identifier.uri	http://faculty.iitmandi.ac.in/~padman/papers/padman_gdAllPole_interspeech2013.pdf
dc.identifier.uri	http://hdl.handle.net/11452/30193
dc.identifier.volume	1-5
dc.identifier.wos	000395050001036
dc.indexed.wos	CPCIS
dc.language.iso	en
dc.publisher	Isc-Int Speech Communication Association
dc.relation.journal	14th Annual Conference of the International Speech Communication Association (Interspeech 2013)
dc.relation.publicationcategory	Conference Item - International
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Computer science
dc.subject	Engineering
dc.subject	Speaker verification
dc.subject	Group delay functions
dc.subject	High vocal effort
dc.subject	Additive noise
dc.subject	Verification
dc.subject	Discrete Fourier transforms
dc.subject	Group delay
dc.subject	Poles
dc.subject	Signal processing
dc.subject	Speech processing
dc.subject	Direct computations
dc.subject	Mel-frequency cepstral coefficients
dc.subject	Recognition accuracy
dc.subject	Speaker recognition
dc.subject	Speaker recognition evaluations
dc.subject	Vocal efforts
dc.subject	Speech recognition
dc.subject.scopus	Speaker Verification; Speech Enhancement; Attack
dc.subject.wos	Computer science, artificial intelligence
dc.subject.wos	Engineering, electrical & electronic
dc.title	Using group delay functions from all-pole models for speaker recognition
dc.type	conferenceObject
dc.type.subtype	Proceedings Paper
dspace.entity.type	Publication
local.contributor.department	Faculty of Engineering/Department of Electrical and Electronics Engineering
local.indexed.at	Scopus
local.indexed.at	WOS
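
Illustrative note on the abstract: the central idea, computing the group delay function from a parametric all-pole model rather than directly from the DFT of the signal, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names lpc_autocorr and allpole_group_delay, the Hamming window, the model order and the FFT size are all hypothetical choices made for illustration. The record does not describe the paper's all-pole estimation variants or its feature post-processing, so none of that is reproduced here. For a model H(z) = 1/A(z) with coefficients a[n], the group delay is -Re{B/A}, where A and B are the DFTs of a[n] and n*a[n].

```python
import numpy as np


def lpc_autocorr(frame, order):
    """Estimate all-pole coefficients [1, a_1, ..., a_p] for one frame
    with the autocorrelation method (assumed here for illustration)."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    # Autocorrelation at lags 0..order (zero lag sits at index len(x) - 1).
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    # Toeplitz normal equations: R a = -r[1:].
    idx = np.arange(order)
    R = r[np.abs(idx[:, None] - idx[None, :])]
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))


def allpole_group_delay(a, n_fft=512):
    """Group delay (in samples) of H(z) = 1/A(z) on n_fft//2 + 1 bins
    spanning 0..pi, computed without phase unwrapping."""
    a = np.asarray(a, dtype=float)
    n = np.arange(len(a))
    A = np.fft.rfft(a, n_fft)      # DFT of a[n]
    B = np.fft.rfft(n * a, n_fft)  # DFT of n * a[n]
    # Re{B/A} is the group delay of A(z); 1/A(z) has the opposite sign.
    return -np.real(B / A)


if __name__ == "__main__":
    # Synthetic resonance: pole pair at radius 0.95, angle pi/4.
    radius, angle = 0.95, np.pi / 4
    a_true = [1.0, -2 * radius * np.cos(angle), radius ** 2]
    gd = allpole_group_delay(a_true)
    print("peak bin:", int(np.argmax(gd)), "of", len(gd) - 1)
```

Because the group delay is evaluated on the low-order polynomial A(z) rather than on the raw signal spectrum, the division by A stays well conditioned as long as the model poles lie inside the unit circle. Turning the resulting function into recognition features (for example, by a cosine transform) is outside the scope of this sketch.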