Publication:
Comparative analysis of speaker identification performance using deep learning, machine learning, and novel subspace classifiers with multiple feature extraction techniques

dc.contributor.author: Keser, Serkan
dc.contributor.buuauthor: Gezer, Esra
dc.contributor.department: Mühendislik Fakültesi
dc.contributor.department: Elektrik ve Elektronik Mühendisliği Ana Bilim Dalı
dc.contributor.researcherid: CRO-9465-2022
dc.date.accessioned: 2025-01-16T05:35:09Z
dc.date.available: 2025-01-16T05:35:09Z
dc.date.issued: 2024-10-08
dc.description.abstract: Speaker identification is vital in various application domains, such as automation, security, and enhancing user experience. In the literature, convolutional neural network (CNN) or recurrent neural network (RNN) classifiers are generally used because speech signals are one-dimensional time series. However, new approaches using subspace classifiers are also crucial in speaker identification. In this study, in addition to the newly developed subspace classifiers for speaker identification, traditional classification algorithms and various hybrid algorithms are analyzed in terms of performance. The Stacked Features-Common Vector Approach (SF-CVA) and Hybrid CVA-Fisher Linear Discriminant Analysis (HCF) subspace classifiers are used for speaker identification for the first time in the literature. In addition, CVA is evaluated for the first time for speaker identification using hybrid deep learning algorithms. The study includes Recurrent Neural Network-Long Short-Term Memory (RNN-LSTM), i-vector + Probabilistic Linear Discriminant Analysis (i-vector+PLDA), Time Delayed Neural Network (TDNN), AutoEncoder+Softmax (AE+Softmax), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Common Vector Approach (CVA), SF-CVA, HCF, and AlexNet classifiers for speaker identification. This study uses the MNIST, TIMIT, and VoxCeleb1 databases for clean and noisy speech signals. Six different feature structures are tested in the study. The six feature extraction approaches consist of Mel Frequency Cepstral Coefficients (MFCC)+Pitch, Gammatone Filter Bank Cepstral Coefficients (GTCC)+Pitch, MFCC+GTCC+Pitch+seven spectral features, spectrograms, i-vectors, and AlexNet feature vectors. High accuracy rates were obtained, especially in tests using SF-CVA. The RNN-LSTM, i-vector+KNN, AE+Softmax, TDNN, and i-vector+HCF classifiers also gave high test accuracy rates.
dc.identifier.doi: 10.1016/j.dsp.2024.104811
dc.identifier.issn: 1051-2004
dc.identifier.scopus: 2-s2.0-85205900269
dc.identifier.uri: https://doi.org/10.1016/j.dsp.2024.104811
dc.identifier.uri: https://hdl.handle.net/11452/49465
dc.identifier.volume: 156
dc.identifier.wos: 001332297900001
dc.indexed.wos: WOS.SCI
dc.language.iso: en
dc.publisher: Academic Press Inc Elsevier Science
dc.relation.journal: Digital Signal Processing
dc.relation.publicationcategory: Article - International Refereed Journal
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: Neural networks
dc.subject: Vector approach
dc.subject: System
dc.subject: Algorithm
dc.subject: Selection
dc.subject: Robust
dc.subject: GMM
dc.subject: Speaker identification
dc.subject: Six different features
dc.subject: Hybrid classifiers
dc.subject: SF-CVA
dc.subject: Noisy speech signals
dc.subject: Science & technology
dc.subject: Technology
dc.subject: Engineering, electrical & electronic
dc.subject: Engineering
dc.title: Comparative analysis of speaker identification performance using deep learning, machine learning, and novel subspace classifiers with multiple feature extraction techniques
dc.type: Article
dspace.entity.type: Publication
local.contributor.department: Mühendislik Fakültesi/Elektrik ve Elektronik Mühendisliği Ana Bilim Dalı
local.indexed.at: WOS
local.indexed.at: Scopus