LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection

Mervat Abassy; Kareem Elozeiri; Alexander Aziz; Minh Ngoc Ta; Raj Vardhan Tomar; Bimarsha Adhikari; Saad El Dine Ahmed; Yuxia Wang; Osama Mohammed Afzal; Zhuohan Xie; Jonibek Mansurov; Ekaterina Artemova; Vladislav Mikhailov; Rui Xing; Jiahui Geng; Hasan Iqbal; Zain Muhammad Mujahid; Tarek Mahmoud; Akim Tsvigun; Alham Fikri Aji; Artem Shelmanov; Nizar Habash; Iryna Gurevych; Preslav Nakov

Abstract

The ease of access to large language models (LLMs) has led to a widespread proliferation of machine-generated texts, and it is now often hard to tell whether a piece of text was human-written or machine-generated. This raises concerns about potential misuse, particularly within educational and academic domains. Thus, it is important to develop practical systems that can automate this detection. Here, we present one such system, LLM-DetectAIve, designed for fine-grained detection. Unlike most previous work on machine-generated text detection, which focused on binary classification, LLM-DetectAIve supports four categories: (i) human-written, (ii) machine-generated, (iii) machine-written, then machine-humanized, and (iv) human-written, then machine-polished. Category (iii) aims to detect attempts to obfuscate the fact that a text was machine-generated, while category (iv) covers cases where an LLM was used to polish a human-written text, which is typically acceptable in academic writing, but not in education. Our experiments show that LLM-DetectAIve can effectively identify the above four categories, which makes it a potentially useful tool in education, academia, and other domains. LLM-DetectAIve is publicly accessible at https://github.com/mbzuai-nlp/LLM-DetectAIve. The video describing our system is available at https://youtu.be/E8eT_bE7k8c.

Anthology ID: 2024.emnlp-demo.35
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Delia Irazu Hernandez Farias, Tom Hope, Manling Li
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 336–343
URL: https://aclanthology.org/2024.emnlp-demo.35
Cite (ACL): Mervat Abassy, Kareem Elozeiri, Alexander Aziz, Minh Ngoc Ta, Raj Vardhan Tomar, Bimarsha Adhikari, Saad El Dine Ahmed, Yuxia Wang, Osama Mohammed Afzal, Zhuohan Xie, Jonibek Mansurov, Ekaterina Artemova, Vladislav Mikhailov, Rui Xing, Jiahui Geng, Hasan Iqbal, Zain Muhammad Mujahid, Tarek Mahmoud, Akim Tsvigun, et al. 2024. LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 336–343, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection (Abassy et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-demo.35.pdf
