Perplexity-Driven Contrastive Scoring for Unsupervised Detection of AI-Generated Texts in Polish

Damian Stachura


Abstract
The SMIGIEL competition at PolEval 2025 focuses on distinguishing Polish human-written text from AI-generated text. I participated in one of the subtasks that required a zero-shot detection method. My solution adapts the Binoculars detector by pairing language models and using calibrated thresholds. Specifically, I replaced the English language models from the original Binoculars method with models trained on Polish corpora. This approach achieved first place in the chosen competition track. Overall, my findings demonstrate that domain-specific language models and careful thresholding enable state-of-the-art zero-shot AI-text detection performance across new languages and domains. The code is publicly available at https://github.com/damian1996/2025-smigiel.
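As an illustration of the scoring idea the abstract refers to, the following is a minimal sketch of a Binoculars-style contrastive score: the log-perplexity of a text under one model divided by the cross-perplexity between two models, followed by a threshold decision. It is not the submitted system. It works on plain lists of next-token logits so that it runs without any model download; in practice the logits would come from a pair of Polish language models. The function names, the assignment of the "observer" and "performer" roles, and the threshold value are assumptions made for illustration only.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def binoculars_score(observer_logits, performer_logits, token_ids):
    """Contrastive score: log-perplexity divided by cross-perplexity.

    observer_logits[i] and performer_logits[i] are the next-token logits
    that each model produces for the position whose true token is
    token_ids[i]. Role assignment is an assumption of this sketch.
    """
    n = len(token_ids)
    log_ppl = 0.0    # summed negative log-likelihood of the observed tokens
    cross_ppl = 0.0  # summed cross-entropy between the two models' distributions
    for obs, perf, tok in zip(observer_logits, performer_logits, token_ids):
        obs_logp = log_softmax(obs)
        perf_logp = log_softmax(perf)
        log_ppl -= perf_logp[tok]
        cross_ppl -= sum(math.exp(o) * p for o, p in zip(obs_logp, perf_logp))
    return (log_ppl / n) / (cross_ppl / n)


def is_ai_generated(score, threshold):
    """Lower scores suggest machine-generated text; threshold is hypothetical."""
    return score < threshold


# Toy example with a vocabulary of 4 tokens and a 2-token text.
uniform = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
peaked = [[5.0, 0.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0]]
tokens = [0, 0]

baseline_score = binoculars_score(uniform, uniform, tokens)
confident_score = binoculars_score(uniform, peaked, tokens)
```

In the toy data, a text that the performer predicts far more confidently than the observer expects receives a much lower score than the baseline, which is the signal a calibrated threshold would act on. The threshold itself would need to be tuned on held-out data for the chosen model pair.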
Anthology ID:
2025.poleval-main.4
Volume:
Proceedings of the PolEval 2025 Workshop
Month:
November
Year:
2025
Address:
Warsaw
Editors:
Łukasz Kobyliński, Alina Wróblewska, Maciej Ogrodniczuk
Venues:
PolEval | WS
Publisher:
Institute of Computer Science PAS and Association for Computational Linguistics
Pages:
21–25
URL:
https://aclanthology.org/2025.poleval-main.4/
Cite (ACL):
Damian Stachura. 2025. Perplexity-Driven Contrastive Scoring for Unsupervised Detection of AI-Generated Texts in Polish. In Proceedings of the PolEval 2025 Workshop, pages 21–25, Warsaw. Institute of Computer Science PAS and Association for Computational Linguistics.
Cite (Informal):
Perplexity-Driven Contrastive Scoring for Unsupervised Detection of AI-Generated Texts in Polish (Stachura, PolEval 2025)
PDF:
https://aclanthology.org/2025.poleval-main.4.pdf