Cross-Domain Audio Deepfake Detection: Dataset and Analysis

Yuang Li; Min Zhang; Mengxin Ren; Xiaosong Qiao; Miaomiao Ma; Daimeng Wei; Hao Yang

doi:10.18653/v1/2024.emnlp-main.286

Abstract

Audio deepfake detection (ADD) is essential for preventing the misuse of synthetic voices that may infringe on personal rights and privacy. Recent zero-shot text-to-speech (TTS) models pose higher risks as they can clone voices with a single utterance. However, the existing ADD datasets are outdated, leading to suboptimal generalization of detection models. In this paper, we construct a new cross-domain ADD dataset comprising over 300 hours of speech data that is generated by five advanced zero-shot TTS models. To simulate real-world scenarios, we employ diverse attack methods and audio prompts from different datasets. Experiments show that, through novel attack-augmented training, the Wav2Vec2-large and Whisper-medium models achieve equal error rates of 4.1% and 6.5% respectively. Additionally, we demonstrate our models’ outstanding few-shot ADD ability by fine-tuning with just one minute of target-domain data. Nonetheless, neural codec compressors greatly affect the detection accuracy, necessitating further research. Our dataset is publicly available (https://github.com/leolya/CD-ADD).
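
For readers unfamiliar with the metric, the equal error rate (EER) quoted above is the operating point at which the false-positive rate equals the false-negative rate. The following is a minimal illustrative sketch in Python, not the authors' evaluation code. The function name `equal_error_rate`, the label convention (1 = fake, 0 = real), the score orientation (higher = more likely fake), and the toy scores are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (eer, threshold) for a binary detector.

    Assumed convention (hypothetical, for illustration only):
    labels are 1 for fake and 0 for real; a higher score means "more likely fake".
    """
    fake = [s for s, y in zip(scores, labels) if y == 1]
    real = [s for s, y in zip(scores, labels) if y == 0]
    if not fake or not real:
        raise ValueError("need at least one real and one fake example")

    best = None  # (gap between error rates, averaged error rate, threshold)
    for t in sorted(set(scores)):
        # Real audio flagged as fake at threshold t.
        fpr = sum(s >= t for s in real) / len(real)
        # Fake audio missed at threshold t.
        fnr = sum(s < t for s in fake) / len(fake)
        gap = abs(fpr - fnr)
        if best is None or gap < best[0]:
            best = (gap, (fpr + fnr) / 2, t)
    return best[1], best[2]


# Toy scores, invented for this example: one real clip scores above one fake clip.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
eer, threshold = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

This sketch only evaluates thresholds at observed scores and averages the two error rates at the closest crossing; evaluation toolkits often interpolate between points on the ROC curve instead, so values on real data may differ slightly.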

Anthology ID: 2024.emnlp-main.286
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 4977–4983
URL: https://aclanthology.org/2024.emnlp-main.286/
DOI: 10.18653/v1/2024.emnlp-main.286
Cite (ACL): Yuang Li, Min Zhang, Mengxin Ren, Xiaosong Qiao, Miaomiao Ma, Daimeng Wei, and Hao Yang. 2024. Cross-Domain Audio Deepfake Detection: Dataset and Analysis. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 4977–4983, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Cross-Domain Audio Deepfake Detection: Dataset and Analysis (Li et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-main.286.pdf
