Evaluating the Efficacy of Traditional Machine Learning Models in Speaker Recognition: A Comparative Study Using the LibriSpeech Dataset

Authors

  • Gregorius Airlangga, Atma Jaya Catholic University of Indonesia

DOI:

10.47709/brilliance.v3i2.3488

Keywords:

Speech Recognition, Machine Learning, Naive Bayes, Logistic Regression, Gradient Boosting

Abstract

The efficacy of machine learning models in speaker recognition is critical for advances in security systems, biometric authentication, and personalized user interfaces. This study compares three widely used machine learning models (Naive Bayes, Logistic Regression, and Gradient Boosting) on the LibriSpeech test-clean subset, a corpus of read English speech derived from audiobooks and designed for training and evaluating speech recognition systems. Mel-Frequency Cepstral Coefficients (MFCCs) were extracted from the audio samples as features representing the short-term power spectrum of each speaker's voice. The models were evaluated on precision, recall, F1-score, and accuracy in correctly identifying speakers. Logistic Regression outperformed the other models, achieving nearly perfect scores across all metrics, which suggests it is well suited to linear classification in high-dimensional feature spaces. Naive Bayes also proved efficient and robust despite its assumption of feature independence, while Gradient Boosting performed slightly worse, possibly because of its greater model complexity and a tendency to overfit. The study underscores the potential of simpler machine learning models to achieve high accuracy in speaker recognition, particularly where computational resources are limited. However, the work is limited by the controlled nature of the dataset and its reliance on a single feature type; future research should include more diverse environmental conditions and feature sets.
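The abstract does not describe the implementation, so the following is only a minimal, dependency-free sketch of the classification and scoring step, not the author's code. It assumes each utterance has already been reduced to a fixed-length feature vector (for example, the per-coefficient mean of its MFCC frames, which is an assumption here). It implements a Gaussian Naive Bayes classifier, which models every feature as an independent normal distribution within each speaker class (the independence assumption noted in the abstract), and computes accuracy plus macro-averaged precision, recall, and F1. The function names `fit_gaussian_nb`, `predict_gaussian_nb`, and `macro_scores`, the smoothing constant `eps`, and the toy speakers "A" and "B" are all hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-6):
    """Estimate a log prior, per-feature means and variances for each class.

    Features are treated as independent Gaussians within a class (the Naive
    Bayes assumption). `eps` is a hypothetical variance floor that avoids
    division by zero for constant features.
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    n_total = len(X)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n + eps
                     for col, m in zip(zip(*rows), means)]
        model[label] = (math.log(n / n_total), means, variances)
    return model


def predict_gaussian_nb(model, X):
    """Return the class with the highest log posterior for each row."""
    preds = []
    for row in X:
        best_label, best_score = None, -math.inf
        for label, (log_prior, means, variances) in model.items():
            # Log prior plus the sum of independent Gaussian log-likelihoods.
            score = log_prior
            for x, m, v in zip(row, means, variances):
                score -= 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            if score > best_score:
                best_label, best_score = label, score
        preds.append(best_label)
    return preds


def macro_scores(y_true, y_pred):
    """Accuracy plus precision, recall and F1 averaged equally over classes."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"accuracy": accuracy, "precision": sum(precisions) / k,
            "recall": sum(recalls) / k, "f1": sum(f1s) / k}


# Toy 2-D "utterance vectors" for two hypothetical speakers.
X_train = [[0.0, 0.1], [0.2, -0.1], [0.1, 0.0],
           [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
y_train = ["A", "A", "A", "B", "B", "B"]
model = fit_gaussian_nb(X_train, y_train)

y_test = ["A", "B", "B"]
y_pred = predict_gaussian_nb(model, [[0.1, 0.05], [5.1, 5.0], [0.3, 0.2]])
scores = macro_scores(y_test, y_pred)
print(y_pred)  # ['A', 'B', 'A']
print(scores)
```

Macro averaging weights every speaker equally, which is one reasonable choice for closed-set speaker identification; the abstract does not state which averaging the study used. Logistic Regression and Gradient Boosting are not sketched here; in practice, a library such as scikit-learn provides all three model families together with these metrics.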

Article History

Submitted Date: 2024-01-23
Accepted Date: 2024-01-23
Published Date: 2024-01-31

How to Cite

Airlangga, G. (2024). Evaluating the Efficacy of Traditional Machine Learning Models in Speaker Recognition: A Comparative Study Using the LibriSpeech Dataset. Brilliance: Research of Artificial Intelligence, 3(2), 449-455. https://doi.org/10.47709/brilliance.v3i2.3488