Mengxin Ren
2024
Cross-Domain Audio Deepfake Detection: Dataset and Analysis
Yuang Li
|
Min Zhang
|
Mengxin Ren
|
Xiaosong Qiao
|
Miaomiao Ma
|
Daimeng Wei
|
Hao Yang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Audio deepfake detection (ADD) is essential for preventing the misuse of synthetic voices that may infringe on personal rights and privacy. Recent zero-shot text-to-speech (TTS) models pose higher risks as they can clone voices with a single utterance. However, the existing ADD datasets are outdated, leading to suboptimal generalization of detection models. In this paper, we construct a new cross-domain ADD dataset comprising over 300 hours of speech data that is generated by five advanced zero-shot TTS models. To simulate real-world scenarios, we employ diverse attack methods and audio prompts from different datasets. Experiments show that, through novel attack-augmented training, the Wav2Vec2-large and Whisper-medium models achieve equal error rates of 4.1% and 6.5% respectively. Additionally, we demonstrate our models’ outstanding few-shot ADD ability by fine-tuning with just one minute of target-domain data. Nonetheless, neural codec compressors greatly affect the detection accuracy, necessitating further research. Our dataset is publicly available (https://github.com/leolya/CD-ADD).
HW-TSC at SemEval-2024 Task 9: Exploring Prompt Engineering Strategies for Brain Teaser Puzzles Through LLMs
Yinglu Li
|
Zhao Yanqing
|
Min Zhang
|
Yadong Deng
|
Aiju Geng
|
Xiaoqin Liu
|
Mengxin Ren
|
Yuang Li
|
Su Chang
|
Xiaofeng Zhao
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Large Language Models (LLMs) have demonstrated impressive performance on many Natural Language Processing (NLP) tasks. However, their ability to solve more creative, lateral thinking puzzles remains relatively unexplored. In this work, we develop methods to enhance the lateral thinking and puzzle-solving capabilities of LLMs. We curate a dataset of word-type and sentence-type brain teasers requiring creative problem-solving abilities beyond commonsense reasoning. We first evaluate the zero-shot performance of models like GPT-3.5 and GPT-4 on this dataset. To improve their puzzle-solving skills, we employ prompting techniques like providing reasoning clues and chaining multiple examples to demonstrate the desired thinking process. We also fine-tune the state-of-the-art Mixtral 7x8b LLM on ourdataset. Our methods enable the models to achieve strong results, securing 2nd and 3rd places in the brain teaser task. Our work highlights the potential of LLMs in acquiring complex reasoning abilities with the appropriate training. The efficacy of our approaches opens up new research avenues into advancing lateral thinking and creative problem-solving with AI systems.
Search
Co-authors
- Yuang Li 2
- Min Zhang 2
- Xiaosong Qiao 1
- Miaomiao Ma 1
- Daimeng Wei 1
- show all...