Voice Spoofing Detection via Speech Rule Generation Using wav2vec 2.0-Based Attention
Qian-Bei Hong | Yu-Chen Gao | Yu-Ying Xiao | Yeou-Jiunn Chen | Kun-Yi Huang
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Recent advances in AI-based voice cloning have produced increasingly convincing synthetic speech, posing a significant threat to speaker verification systems. In this paper, we propose a novel voice spoofing detection method that integrates acoustic feature variations with attention mechanisms derived from wav2vec 2.0 representations. Unlike prior approaches that feed wav2vec 2.0 features directly into the model, the proposed method leverages these features to construct speech rules characteristic of bona fide speech. Experimental results show that the proposed RULE-AASIST-L system significantly outperforms the baseline systems on the ASVspoof 2019 LA evaluation set, achieving a 24.6% relative reduction in equal error rate (EER) and a 10.8% relative reduction in minimum tandem detection cost function (min t-DCF). Ablation studies further confirm the importance of incorporating speech rules and of selecting appropriate hidden-layer representations. These findings highlight the potential of using self-supervised representations to guide rule-based modeling for robust spoofing detection.
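As a concrete illustration of the hidden-layer representation selection the abstract mentions, the minimal sketch below extracts per-layer wav2vec 2.0 features with the HuggingFace Transformers library. It is not the authors' released code: the checkpoint name, the dummy waveform, and the chosen layer index are illustrative assumptions, not the paper's configuration.

import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Illustrative checkpoint; the paper does not specify which pretrained model it uses.
checkpoint = "facebook/wav2vec2-base"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
model = Wav2Vec2Model.from_pretrained(checkpoint)
model.eval()

# One second of dummy 16 kHz audio stands in for an utterance.
waveform = np.random.randn(16000).astype(np.float32)
inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple: the CNN feature projection followed by one entry
# per transformer layer, each of shape (batch, frames, hidden_dim).
hidden_states = outputs.hidden_states
layer_index = 5  # hypothetical choice; the paper's ablations compare layers
features = hidden_states[layer_index]
print(features.shape)  # e.g. torch.Size([1, 49, 768]) for the base model

Because different transformer layers capture different phonetic and speaker properties, which layer the features are drawn from materially affects downstream rule construction, which is consistent with the abstract's finding that hidden-layer selection matters.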