Advancing Voice Anti-Spoofing Systems: Self-Supervised Learning and Indonesian Dataset Integration for Enhanced Generalization

Authors

  • Bima Prihasto, Institut Teknologi Kalimantan, Indonesia
  • Mifta Nur Farid, Institut Teknologi Kalimantan, Indonesia
  • Rafid Al Khairy, Institut Teknologi Kalimantan, Indonesia

DOI:

10.47709/brilliance.v4i2.5182

Keywords:

Automatic Speaker Verification (ASV), Spoofing Attacks, Self-Supervised Learning, Indonesian Dataset, Wav2vec 2.0


Abstract

This study examines how self-supervised learning and a new Indonesian-language dataset improve voice anti-spoofing systems. The proposed model reaches a lower Equal Error Rate (EER) during training, indicating that it learns effectively from diverse audio samples, and the weighted cross-entropy loss shows that the model minimizes training errors robustly. Compared with baseline models trained on English data, the proposed approach achieves a significantly lower EER, which the study attributes to the incorporation of language-specific data. The distinctive phonetic features of the Indonesian language provide valuable training material and strengthen the system's defence against spoofing attacks. Because the dataset includes diverse speech samples, it also improves generalization across dialects and recording conditions. This adaptability is vital for real-world applications, where recording variability affects performance. The experiments used a balanced dataset of genuine and spoofed utterances from male and female speakers, ensuring high-quality input. The dataset was split into training, development, and test sets, and the model was trained on a high-performance computing setup. The proposed model achieved an EER of 0.33, compared with 7.65 for the traditional sinc-layer model and 0.82 for the wav2vec 2.0 model trained on English data. Overall, this research advances anti-spoofing solutions and shows that diverse datasets and advanced learning approaches are needed to improve automatic speaker verification systems in practice. Incorporating the Indonesian dataset addresses linguistic diversity challenges in biometric security and paves the way for future work in this area.
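The two quantities the abstract relies on, the Equal Error Rate and the weighted cross-entropy loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the score convention (a higher score means more likely genuine), and the class weights in the usage lines are assumptions made for illustration only. The EER is approximated at the threshold where the false acceptance rate and false rejection rate are closest.

```python
import numpy as np


def compute_eer(genuine_scores, spoof_scores):
    """Approximate the EER; assumes higher scores mean 'more likely genuine'."""
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, spoof]))
    # False rejection rate: genuine utterances scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: spoofed utterances scored at or above it.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # The EER lies where the two error rates cross; take the closest point.
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2), float(thresholds[idx])


def weighted_cross_entropy(logits, labels, class_weights):
    """Class-weighted cross-entropy, averaged over the summed sample weights."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    weights = np.asarray(class_weights, dtype=float)
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the true class for each sample.
    nll = -log_probs[np.arange(len(labels)), labels]
    sample_weights = weights[labels]
    return float((sample_weights * nll).sum() / sample_weights.sum())


# Hypothetical scores and weights, chosen only to exercise the functions.
eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
loss = weighted_cross_entropy([[2.0, 0.5], [0.3, 1.7]], [0, 1], [0.1, 0.9])
print(f"EER = {eer:.2f} at threshold {threshold:.2f}, loss = {loss:.4f}")
```

The function returns the EER as a fraction, so multiply by 100 to express it as a percentage. With the class weights set equal, the loss reduces to ordinary cross-entropy, which is the natural setting for a balanced dataset such as the one described above.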

Author Biography

Bima Prihasto, Institut Teknologi Kalimantan, Indonesia

My name is Bima Prihasto, and I am a lecturer and researcher at a higher education institution in East Kalimantan. I earned my PhD in computer science from National Central University, Taiwan. My research focuses on machine learning and deep learning, particularly their applications in speech and image processing. I have served as a reviewer for the journal IEEE Access and for the IEEE ICASSP conference, which boasts the highest h-index. I have also participated in collaborative research at Academia Sinica, Taiwan.

Article History

Submitted Date: 2024-12-27
Accepted Date: 2024-12-28
Published Date: 2025-01-13

How to Cite

Prihasto, B., Nur Farid, M., & Al Khairy, R. (2025). Advancing Voice Anti-Spoofing Systems: Self-Supervised Learning and Indonesian Dataset Integration for Enhanced Generalization. Brilliance: Research of Artificial Intelligence, 4(2), 890–900. https://doi.org/10.47709/brilliance.v4i2.5182