Sifan Fang
2025
CMI-AIGCX at GenAI Detection Task 2: Leveraging Multilingual Proxy LLMs for Machine-Generated Text Detection in Academic Essays
Kaijie Jiao
|
Xingyu Yao
|
Shixuan Ma
|
Sifan Fang
|
Zikang Guo
|
Benfeng Xu
|
Licheng Zhang
|
Quan Wang
|
Yongdong Zhang
|
Zhendong Mao
Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect)
This paper presents the approach we proposed for GenAI Detection Task 2, which aims to classify a given text as either machine-generated or human-written, with a particular emphasis on academic essays. We participated in subtasks A and B, which focus on detecting English and Arabic essays, respectively. We propose a simple and efficient method for detecting machine-generated essays, where we use the Llama-3.1-8B as a proxy to capture the essence of each token in the text. These essences are processed and classified using a refined feature classification network. Our approach does not require fine-tuning the LLM. Instead, we leverage its extensive multilingual knowledge acquired during pretraining to significantly enhance detection performance. The results validate the effectiveness of our approach and demonstrate that leveraging a proxy model with diverse multilingual knowledge can significantly enhance the detection of machine-generated text across multiple languages, regardless of model size. In Subtask A, we achieved an F1 score of 99.9%, ranking first out of 26 teams. In Subtask B, we achieved an F1 score of 96.5%, placing fourth out of 22 teams, with the same score as the third-place team.
Search
Fix data
Co-authors
- Zikang Guo 1
- Kaijie Jiao 1
- Shixuan Ma 1
- Zhendong Mao 1
- Quan Wang 1
- show all...