Publication:
Using group delay functions from all-pole models for speaker recognition

dc.contributor.authorRajan, Padmanabhan
dc.contributor.authorKinnunen, Tomi H.
dc.contributor.authorPohjalainen, Jouni
dc.contributor.authorAlku, Paavo
dc.contributor.authorBimbot, F.
dc.contributor.authorCerisara, C.
dc.contributor.authorFougeron, C.
dc.contributor.authorGravier, G.
dc.contributor.authorLamel, L.
dc.contributor.authorPellegrino, F.
dc.contributor.authorPerrier, P.
dc.contributor.buuauthorHanilçi, Cemal
dc.contributor.departmentFaculty of Engineering
dc.contributor.departmentDepartment of Electrical and Electronics Engineering
dc.contributor.researcheridS-4967-2016
dc.contributor.scopusid35781455400
dc.date.accessioned2022-12-30T11:58:03Z
dc.date.available2022-12-30T11:58:03Z
dc.date.issued2013
dc.descriptionThis work was presented as a paper at the 14th Annual Conference of the International Speech Communication Association [Interspeech 2013], held in Lyon [France] on 25-29 August 2013.
dc.description.abstractPopular features for speech processing, such as mel-frequency cepstral coefficients (MFCCs), are derived from the short-term magnitude spectrum, whereas the phase spectrum remains unused. While the common argument for using only the magnitude spectrum is that the human ear is phase-deaf, phase-based features have remained less explored due to the additional signal processing difficulties they introduce. A useful representation of the phase is the group delay function, but its robust computation remains difficult. This paper advocates the use of group delay functions derived from parametric all-pole models instead of their direct computation from the discrete Fourier transform. Using a subset of the vocal effort data in the NIST 2010 speaker recognition evaluation (SRE) corpus, we show that group delay features derived via parametric all-pole models improve recognition accuracy, especially under high vocal effort. Additionally, the group delay features provide comparable or improved accuracy over conventional magnitude-based MFCC features. Thus, the use of group delay functions derived from all-pole models provides an effective way to utilize information from the phase spectrum of speech signals.
dc.description.sponsorshipAcademy of Finland (253120)
dc.description.sponsorshipInt Speech Commun Association
dc.description.sponsorshipAmazon
dc.description.sponsorshipMicrosoft
dc.description.sponsorshipGoogle
dc.description.sponsorshipTcL SYTRAL
dc.description.sponsorshipEuropean Language Resources Association
dc.description.sponsorshipOuaero
dc.description.sponsorshipImaginove
dc.description.sponsorshipVOCAPIA Research
dc.description.sponsorshipAcapela
dc.description.sponsorshipSpeech Ocean
dc.description.sponsorshipALDEBARAN
dc.description.sponsorshipOrange
dc.description.sponsorshipVecsys
dc.description.sponsorshipIBM Research
dc.description.sponsorshipRaytheon BBN Technology
dc.description.sponsorshipVoxygen
dc.identifier.citationRajan, P. et al. (2013). "Using group delay functions from all-pole models for speaker recognition". 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), 1-5, 2488-2492.
dc.identifier.endpage2492
dc.identifier.issn2308-457X
dc.identifier.scopus2-s2.0-84906257507
dc.identifier.startpage2488
dc.identifier.urihttp://faculty.iitmandi.ac.in/~padman/papers/padman_gdAllPole_interspeech2013.pdf
dc.identifier.urihttp://hdl.handle.net/11452/30193
dc.identifier.volume1-5
dc.identifier.wos000395050001036
dc.indexed.wosCPCIS
dc.language.isoen
dc.publisherIsc-Int Speech Communication Association
dc.relation.journal14th Annual Conference of the International Speech Communication Association (Interspeech 2013)
dc.relation.publicationcategoryConference Item - International
dc.rightsinfo:eu-repo/semantics/openAccess
dc.subjectComputer science
dc.subjectEngineering
dc.subjectSpeaker verification
dc.subjectGroup delay functions
dc.subjectHigh vocal effort
dc.subjectAdditive noise
dc.subjectVerification
dc.subjectDiscrete Fourier transforms
dc.subjectGroup delay
dc.subjectPoles
dc.subjectSignal processing
dc.subjectSpeech processing
dc.subjectDirect computations
dc.subjectMel-frequency cepstral coefficients
dc.subjectRecognition accuracy
dc.subjectSpeaker recognition
dc.subjectSpeaker recognition evaluations
dc.subjectVocal efforts
dc.subjectSpeech recognition
dc.subject.scopusSpeaker Verification; Speech Enhancement; Attack
dc.subject.wosComputer science, artificial intelligence
dc.subject.wosEngineering, electrical & electronic
dc.titleUsing group delay functions from all-pole models for speaker recognition
dc.typeProceedings Paper
dspace.entity.typePublication
local.contributor.departmentFaculty of Engineering/Department of Electrical and Electronics Engineering
local.indexed.atScopus
local.indexed.atWOS
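
Illustrative note on the abstract: the idea of taking the group delay from a parametric all-pole model rather than directly from the discrete Fourier transform of the signal can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `lpc_autocorrelation` and `allpole_group_delay`, the model order, the FFT size, and the use of the plain autocorrelation method of linear prediction are all choices made here for illustration, and this record does not specify which all-pole estimators or feature post-processing the paper uses. For an all-pole model H(z) = G / A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, the group delay is -Re{B/A}, where A and B are the Fourier transforms of a[n] and n*a[n].

```python
import numpy as np


def lpc_autocorrelation(frame, order):
    """Estimate [1, a_1, ..., a_p] of A(z) = 1 + sum_k a_k z^-k.

    Uses the autocorrelation method of linear prediction (an assumption
    made for this sketch). The frame must not be all zeros.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    full = np.correlate(frame, frame, mode="full")
    r = full[n - 1 : n + order]
    # Symmetric Toeplitz matrix with entries r[|i - k|].
    lags = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    R = r[lags]
    # Normal equations: sum_k a_k r[|i - k|] = -r[i], i = 1..order.
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))


def allpole_group_delay(a, n_fft=512):
    """Group delay (in samples) of H(z) = 1 / A(z) on n_fft // 2 + 1 bins.

    Assumes a stable model, so A has no zeros on the unit circle.
    """
    a = np.asarray(a, dtype=float)
    n = np.arange(len(a))
    A = np.fft.rfft(a, n_fft)      # transform of a[n]
    B = np.fft.rfft(n * a, n_fft)  # transform of n * a[n]
    # Group delay of A(z) is Re{B/A}; inverting the filter flips the sign.
    return -np.real(B / A)


if __name__ == "__main__":
    # Hypothetical 25 ms frame at 8 kHz: a decaying resonance plus weak noise.
    rng = np.random.default_rng(0)
    t = np.arange(200)
    frame = np.exp(-0.01 * t) * np.cos(0.3 * np.pi * t)
    frame = frame + 0.01 * rng.standard_normal(len(t))
    frame = frame * np.hamming(len(frame))
    coeffs = lpc_autocorrelation(frame, order=10)
    gd = allpole_group_delay(coeffs, n_fft=512)
    print(gd.shape)
```

Because the group delay here is computed from a low-order polynomial rather than from the raw signal spectrum, it does not require phase unwrapping, which is one reason a model-based route can be easier to compute robustly.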

Files

Original bundle

Name:
Hanilci_vd_2013.pdf
Size:
123.35 KB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission