Nota AI at GenAI Detection Task 1: Unseen Language-Aware Detection System for Multilingual Machine-Generated Text

Hancheol Park, Jaeyeon Kim, Geonmin Kim, Tae-Ho Kim


Abstract
Recently, large language models (LLMs) have demonstrated unprecedented capabilities in language generation, yet they still often produce incorrect information. Therefore, determining whether a text was generated by an LLM has become one of the factors that must be considered when evaluating its reliability. In this paper, we discuss methods to determine whether texts written in various languages were authored by humans or generated by LLMs. We have discovered that the classification accuracy significantly decreases for texts written in languages not observed during the training process, and we aim to address this issue. We propose a method to improve performance for unseen languages by using token-level predictive distributions extracted from various LLMs and text embeddings from a multilingual pre-trained langauge model. With the proposed method, we achieved third place out of 25 teams in Subtask B (binary multilingual machine-generated text detection) of Shared Task 1, with an F1 macro score of 0.7532.
Anthology ID:
2025.genaidetect-1.19
Volume:
Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect)
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Firoj Alam, Preslav Nakov, Nizar Habash, Iryna Gurevych, Shammur Chowdhury, Artem Shelmanov, Yuxia Wang, Ekaterina Artemova, Mucahid Kutlu, George Mikros
Venues:
GenAIDetect | WS
SIG:
Publisher:
International Conference on Computational Linguistics
Note:
Pages:
191–196
Language:
URL:
https://aclanthology.org/2025.genaidetect-1.19/
DOI:
Bibkey:
Cite (ACL):
Hancheol Park, Jaeyeon Kim, Geonmin Kim, and Tae-Ho Kim. 2025. Nota AI at GenAI Detection Task 1: Unseen Language-Aware Detection System for Multilingual Machine-Generated Text. In Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect), pages 191–196, Abu Dhabi, UAE. International Conference on Computational Linguistics.
Cite (Informal):
Nota AI at GenAI Detection Task 1: Unseen Language-Aware Detection System for Multilingual Machine-Generated Text (Park et al., GenAIDetect 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.genaidetect-1.19.pdf