Detecting Machine-Generated Text in Polish Using Fine-Tuned Qwen Models

Konrad Pierzyński


Abstract

This paper introduces the first shared task on machine-generated text (MGT) detection for Polish, organised as part of the PolEval 2025 evaluation campaign. The task evaluates participating systems under three scenarios (unsupervised, constrained, and open), designed to reflect different levels of access to training data. In total, seven systems were submitted. The results indicate that MGT detection for Polish is feasible: the best-performing constrained systems achieve over 90% accuracy on the main evaluation set. However, performance drops when models are tested on unseen domains or unseen generator models, revealing substantial limitations in generalisation. In the most challenging settings, unsupervised approaches outperformed supervised ones. This shared task establishes a new benchmark for MGT detection in Polish, and the publicly released Śmigiel dataset is intended to support future research on robust and generalisable MGT detection.

Anthology ID: 2025.poleval-main.3
Volume: Proceedings of the PolEval 2025 Workshop
Month: November
Year: 2025
Address: Warsaw
Editors: Łukasz Kobyliński, Alina Wróblewska, Maciej Ogrodniczuk
Venues: PolEval | WS
Publisher: Institute of Computer Science PAS and Association for Computational Linguistics
Pages: 16–20
URL: https://aclanthology.org/2025.poleval-main.3/
Cite (ACL): Konrad Pierzyński. 2025. Detecting Machine-Generated Text in Polish Using Fine-Tuned Qwen Models. In Proceedings of the PolEval 2025 Workshop, pages 16–20, Warsaw. Institute of Computer Science PAS and Association for Computational Linguistics.
Cite (Informal): Detecting Machine-Generated Text in Polish Using Fine-Tuned Qwen Models (Pierzyński, PolEval 2025)
PDF: https://aclanthology.org/2025.poleval-main.3.pdf
