EmoFake: An Initial Dataset for Emotion Fake Audio Detection

Yan Zhao; Jiangyan Yi; Jianhua Tao; Chenglong Wang; Yongfeng Dong


Abstract

To enhance the effectiveness of fake audio detection techniques, researchers have developed multiple datasets, such as those for the ASVspoof and ADD challenges. These datasets typically focus on capturing non-emotional characteristics of speech, such as the identity of the speaker and the authenticity of the content. However, they often overlook changes in the emotional state of the audio, which is another crucial dimension affecting the authenticity of speech. This study therefore reports our progress in developing EmoFake, an emotion fake audio detection dataset in which the emotional state of the original audio has been altered. The audio samples in EmoFake are generated using open-source emotional voice conversion models and are intended to simulate potential emotional tampering scenarios in real-world settings. We conducted a series of benchmark experiments on this dataset, and the results show that even advanced fake audio detection models trained on the ASVspoof 2019 LA dataset and the ADD 2022 track 3.2 dataset struggle on EmoFake. EmoFake is now publicly available.

Anthology ID:: 2024.ccl-1.99
Volume:: Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
Month:: July
Year:: 2024
Address:: Taiyuan, China
Editors:: Sun Maosong, Liang Jiye, Han Xianpei, Liu Zhiyuan, He Yulan
Venue:: CCL
Publisher:: Chinese Information Processing Society of China
Pages:: 1286–1297
Language:: English
URL:: https://aclanthology.org/2024.ccl-1.99/
Cite (ACL):: Yan Zhao, Jiangyan Yi, Jianhua Tao, Chenglong Wang, and Yongfeng Dong. 2024. EmoFake: An Initial Dataset for Emotion Fake Audio Detection. In Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference), pages 1286–1297, Taiyuan, China. Chinese Information Processing Society of China.
Cite (Informal):: EmoFake: An Initial Dataset for Emotion Fake Audio Detection (Zhao et al., CCL 2024)
PDF:: https://aclanthology.org/2024.ccl-1.99.pdf