Advancing Voice Anti-Spoofing Systems: Self-Supervised Learning and Indonesian Dataset Integration for Enhanced Generalization
DOI:
10.47709/brilliance.v4i2.5182
Keywords:
Automatic Speaker Verification (ASV), Spoofing Attacks, Self-Supervised Learning, Indonesian Dataset, Wav2vec 2.0
Abstract
This study examines how self-supervised learning and a novel Indonesian-language dataset improve voice anti-spoofing systems. The proposed model reaches a lower Equal Error Rate (EER) during training, indicating that it learns effectively from diverse audio samples, and analysis of the weighted cross-entropy loss highlights its robustness in minimizing training errors. Compared with baseline models that use English data, the proposed approach achieves a significantly lower EER, an improvement attributed to the incorporation of language-specific data. The distinctive phonetic features of Indonesian provide valuable training material and strengthen the system's defence against spoofing attacks. Because the dataset includes diverse speech samples, it improves generalization across dialects and recording conditions. This makes the anti-spoofing system more adaptable, which is vital in real-world applications where recording variability affects performance. The experiments used a balanced dataset of genuine and spoofed utterances from male and female speakers, ensuring high-quality input. The dataset was split into training, development, and testing sets, and training ran on a high-performance computing setup. The proposed model achieved an EER of 0.33, compared to 7.65 for the traditional sinc-layer model and 0.82 for the wav2vec 2.0 model with English data. Overall, this research advances anti-spoofing solutions and emphasizes the need for diverse datasets and advanced learning approaches to improve automatic speaker verification systems in practical applications. Incorporating the Indonesian dataset addresses linguistic diversity challenges in biometric security and paves the way for future work in this area.
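The abstract reports its results as Equal Error Rates. As a minimal sketch of how such a figure is commonly obtained from countermeasure scores (not the authors' evaluation code), the function below sweeps a decision threshold and returns the operating point where the false acceptance and false rejection rates are closest. The function name `compute_eer`, the convention that a higher score means "more likely genuine", and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a score-based spoofing detector.

    Assumed convention: a higher score means the utterance is more
    likely to be genuine (bona fide).
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: genuine utterances scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed utterances scored at or above it.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to equal.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return eer, thresholds[idx]


if __name__ == "__main__":
    # Hypothetical scores: one genuine utterance scores low and one
    # spoofed utterance scores high, so the classes overlap.
    genuine = [0.9, 0.8, 0.3, 0.6]
    spoofed = [0.1, 0.2, 0.7, 0.4]
    eer, threshold = compute_eer(genuine, spoofed)
    print(f"EER = {eer:.2f} at threshold {threshold:.2f}")
```

The result is a fraction between 0 and 1. This sketch picks the nearest observed threshold rather than interpolating between points, so on small score sets it can differ slightly from toolkit implementations.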
License
Copyright (c) 2024 Bima Prihasto, Mifta Nur Farid, Rafid Al Khairy

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.