Voice Spoofing Detection via Speech Rule Generation Using wav2vec 2.0-Based Attention

Qian-Bei Hong, Yu-Chen Gao, Yu-Ying Xiao, Yeou-Jiunn Chen, Kun-Yi Huang


Abstract
Recent advancements in AI-based voice cloning have led to increasingly convincing synthetic speech, posing significant threats to speaker verification systems. In this paper, we propose a novel voice spoofing detection method that integrates acoustic feature variations with attention mechanisms derived from wav2vec 2.0 representations. Unlike prior approaches that directly utilize wav2vec 2.0 features as model inputs, the proposed method leverages wav2vec 2.0 features to construct speech rules characteristic of bona-fide speech. Experimental results indicate that the proposed RULE-AASIST-L system significantly outperforms the baseline systems on the ASVspoof 2019 LA evaluation set, achieving a 24.6% relative reduction in equal error rate (EER) and a 10.8% reduction in minimum tandem detection cost function (min t-DCF). Ablation studies further confirm the importance of incorporating speech rules and selecting appropriate hidden layer representations. These findings highlight the potential of using self-supervised representations to guide rule-based modeling for robust spoofing detection.
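As a reading aid for the metrics quoted in the abstract, the following is a minimal sketch of how an equal error rate and a relative reduction are commonly computed. It is not the authors' evaluation code: the function names, the threshold sweep, and all scores and numbers are illustrative assumptions, and it assumes that a higher score means "more likely bona fide".

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER via a threshold sweep (hypothetical helper).

    Assumes higher scores indicate bona-fide speech.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Candidate thresholds: every observed score, in ascending order.
    thresholds = np.sort(np.concatenate([bonafide, spoof]))
    # False rejection rate: bona-fide trials scoring below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scoring at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def relative_reduction(baseline, proposed):
    """Relative reduction of a metric with respect to a baseline value."""
    return (baseline - proposed) / baseline


# Toy scores, invented for illustration only (not data from the paper).
bonafide = [0.9, 0.8, 0.7, 0.35]
spoof = [0.1, 0.2, 0.3, 0.4]
print(equal_error_rate(bonafide, spoof))
print(relative_reduction(2.0, 1.5))
```

Under this definition, a relative reduction expresses the improvement as a fraction of the baseline value, so it should not be read as a drop in absolute percentage points.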
Anthology ID:
2025.rocling-main.13
Volume:
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Month:
November
Year:
2025
Address:
National Taiwan University, Taipei City, Taiwan
Editors:
Kai-Wei Chang, Ke-Han Lu, Chih-Kai Yang, Zhi-Rui Tam, Wen-Yu Chang, Chung-Che Wang
Venue:
ROCLING
Publisher:
Association for Computational Linguistics
Pages:
108–115
URL:
https://aclanthology.org/2025.rocling-main.13/
Cite (ACL):
Qian-Bei Hong, Yu-Chen Gao, Yu-Ying Xiao, Yeou-Jiunn Chen, and Kun-Yi Huang. 2025. Voice Spoofing Detection via Speech Rule Generation Using wav2vec 2.0-Based Attention. In Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025), pages 108–115, National Taiwan University, Taipei City, Taiwan. Association for Computational Linguistics.
Cite (Informal):
Voice Spoofing Detection via Speech Rule Generation Using wav2vec 2.0-Based Attention (Hong et al., ROCLING 2025)
PDF:
https://aclanthology.org/2025.rocling-main.13.pdf