Comparative analysis of speaker identification performance using deep learning, machine learning, and novel subspace classifiers with multiple feature extraction techniques

Keser, Serkan


Date

2024-10-08

Institutional Authors

Gezer, Esra

Authors

Keser, Serkan

Type

Article

Publisher:

Academic Press Inc Elsevier Science

Abstract

Speaker identification is vital in application domains such as automation, security, and user experience enhancement. In the literature, convolutional neural network (CNN) or recurrent neural network (RNN) classifiers are generally used because speech signals are one-dimensional time series. However, new approaches using subspace classifiers are also crucial in speaker identification. In this study, in addition to the newly developed subspace classifiers for speaker identification, traditional classification algorithms and various hybrid algorithms are analyzed in terms of performance. Stacked Features-Common Vector Approach (SF-CVA) and Hybrid CVA-Fisher Linear Discriminant Analysis (HCF) subspace classifiers are used for speaker identification for the first time in the literature. In addition, CVA is evaluated for the first time for speaker identification using hybrid deep learning algorithms. The study includes Recurrent Neural Network-Long Short-Term Memory (RNN-LSTM), i-vector + Probabilistic Linear Discriminant Analysis (i-vector+PLDA), Time Delayed Neural Network (TDNN), AutoEncoder+Softmax (AE+Softmax), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Common Vector Approach (CVA), SF-CVA, HCF, and AlexNet classifiers for speaker identification. This study uses the MNIST, TIMIT, and VoxCeleb1 databases for clean and noisy speech signals. Six feature extraction approaches are tested: Mel Frequency Cepstral Coefficients (MFCC)+Pitch, Gammatone Filter Bank Cepstral Coefficients (GTCC)+Pitch, MFCC+GTCC+Pitch+seven spectral features, spectrograms, i-vectors, and AlexNet feature vectors. High accuracy rates were obtained, especially in tests using SF-CVA. RNN-LSTM, i-vector+KNN, AE+Softmax, TDNN, and i-vector+HCF classifiers also gave high test accuracy rates.

Subject

Neural-networks, Vector approach, System, Algorithm, Selection, Robust, GMM, Speaker identification, Six different features, Hybrid classifiers, SF-CVA, Noisy speech signals, Science & technology, Technology, Engineering, electrical & electronic, Engineering

URI

https://doi.org/10.1016/j.dsp.2024.104811
https://hdl.handle.net/11452/49465

Collections

İndeksli Yayınlar / Indexed Publications

