Analysis of Machine Learning Classifiers for Speaker Identification: A Study on SVM, Random Forest, KNN, and Decision Tree
DOI: 10.47709/cnahpc.v6i1.3487
Keywords: Speaker Identification, Machine Learning, SVM, Comparison, Random Forest
Abstract
This study investigates the performance of machine learning classifiers for speaker identification, a key component of modern digital security systems. As voice-activated interfaces become more widely integrated into technology, accurate and reliable speaker identification is increasingly important. The research compares four widely used classifiers: Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), and Decision Tree (DT). Using the LibriSpeech dataset, known for its diversity of speakers and recording conditions, we extracted Mel-frequency cepstral coefficients (MFCCs) as features for training and evaluating the classifiers. Each model was assessed on precision, recall, F1-score, and accuracy. RF outperformed all other classifiers, achieving near-perfect metrics, which indicates its robustness and generalizability for speaker identification tasks. KNN also performed well, suggesting its suitability for applications where rapid execution and interpretability are critical. In contrast, SVM yielded moderate performance and DT lower performance, highlighting the need for further optimization of both. These findings underscore the effectiveness of ensemble and distance-based classifiers in handling complex patterns for speaker differentiation. The study guides the selection of appropriate classifiers for speaker identification and sets the stage for future research, which could explore hybrid models and the impact of dataset variability on performance. The analysis provides a benchmark for developing advanced speaker identification systems.
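The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the study's actual pipeline: it assumes scikit-learn with default-style hyperparameters, macro-averaged metrics, and a 75/25 stratified split, none of which are stated in the abstract. To stay self-contained it also replaces real LibriSpeech MFCCs with synthetic 13-dimensional vectors from the hypothetical helper `make_synthetic_features`; in practice each row would be MFCC features extracted from one utterance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_synthetic_features(n_speakers=5, per_speaker=60, n_coeffs=13, seed=0):
    """Hypothetical stand-in for per-utterance MFCC vectors (not real speech)."""
    rng = np.random.default_rng(seed)
    # One random "voice profile" per speaker, plus per-utterance noise.
    centers = rng.normal(0.0, 3.0, size=(n_speakers, n_coeffs))
    X = np.vstack(
        [c + rng.normal(0.0, 1.0, size=(per_speaker, n_coeffs)) for c in centers]
    )
    y = np.repeat(np.arange(n_speakers), per_speaker)
    return X, y


def compare_classifiers(X, y, seed=0):
    """Train SVM, RF, KNN and DT on one split and report the four metrics."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Hyperparameters are assumptions; SVM and KNN get feature scaling
    # because both are sensitive to feature magnitudes.
    models = {
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "DT": DecisionTreeClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_te, pred, average="macro", zero_division=0
        )
        results[name] = {
            "accuracy": accuracy_score(y_te, pred),
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results


if __name__ == "__main__":
    X, y = make_synthetic_features()
    for name, metrics in compare_classifiers(X, y).items():
        print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Scores on this synthetic data say nothing about the reported results; the sketch only shows how the four classifiers can be trained on the same features and scored with the same four metrics.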
License
Copyright (c) 2023 Gregorius Airlangga
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.