Utilizing an Ensemble Model with Anomalous Label Smoothing to Detect Generated Scientific Papers

Yuan Zhao, Junruo Gao, Junlin Wang, Gang Luo, Liang Tang


Abstract
As generative AI becomes increasingly integrated into daily life, it has brought convenience, but it has also raised concerns about its potential impact on the rigor and authenticity of scientific research. To encourage the development of robust and reliable systems for detecting automatically generated scientific text, the “DAGPap24: Detecting Automatically Generated Scientific Papers” competition was held as a shared task of the 4th Workshop on Scholarly Document Processing (SDP 2024) at ACL 2024. Participants were tasked with building a detection model that accurately distinguishes between four kinds of fragments in a paper: human-written, synonym-replaced, ChatGPT-rewritten, and generated-summary fragments. We first conducted a comprehensive analysis of the training set to guide the design of our detection model. We then experimented with several language models, including SciBERT, ALBERT, DeBERTa, and RoBERTa. Next, we introduced an Anomalous Label Smoothing (ALS) method and a majority voting method to improve the final results. Our system achieved F1 scores of 0.9948 and 0.9944 in the development and testing phases, respectively, placing second in the competition.
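
The majority voting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each fine-tuned model emits one integer label id per token (or per fragment), and the label names, the `majority_vote` helper, and the tie-breaking rule are hypothetical choices made only for illustration. The Anomalous Label Smoothing method is not shown because the abstract does not define it.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical names for the four fragment types listed in the abstract.
LABELS = ["human", "synonym_replacement", "chatgpt_rewrite", "generated_summary"]


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine label ids from several models by majority vote.

    predictions[m][t] is the label id that model m assigns to position t.
    Ties go to the earliest model (in list order) that voted for a tied
    label; this tie-breaking rule is an assumption, not from the paper.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same number of positions")

    voted = []
    for votes in zip(*predictions):  # votes for one position, in model order
        counts = Counter(votes)
        best = max(counts.values())
        # Pick the first vote, in model order, that reaches the top count.
        voted.append(next(v for v in votes if counts[v] == best))
    return voted


# Made-up predictions from three hypothetical models over five positions.
model_a = [0, 0, 1, 2, 3]
model_b = [0, 1, 1, 2, 2]
model_c = [0, 0, 1, 3, 3]
ensemble = majority_vote([model_a, model_b, model_c])
print([LABELS[i] for i in ensemble])
```

With an odd number of models, two-way ties between labels are less likely, which is one common reason to ensemble three or five classifiers rather than an even number.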
Anthology ID:
2024.sdp-1.12
Volume:
Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Tirthankar Ghosal, Amanpreet Singh, Anita de Waard, Philipp Mayr, Aakanksha Naik, Orion Weller, Yoonjoo Lee, Shannon Shen, Yanxia Qin
Venues:
sdp | WS
Publisher:
Association for Computational Linguistics
Pages:
130–134
URL:
https://aclanthology.org/2024.sdp-1.12
Cite (ACL):
Yuan Zhao, Junruo Gao, Junlin Wang, Gang Luo, and Liang Tang. 2024. Utilizing an Ensemble Model with Anomalous Label Smoothing to Detect Generated Scientific Papers. In Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024), pages 130–134, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Utilizing an Ensemble Model with Anomalous Label Smoothing to Detect Generated Scientific Papers (Zhao et al., sdp-WS 2024)
PDF:
https://aclanthology.org/2024.sdp-1.12.pdf