BuST: A Siamese Transformer Model for AI Text Detection in Bulgarian

Andrii Maslo, Silvia Gargova


Abstract
We introduce BuST (Bulgarian Siamese Transformer), a novel method for detecting machine-generated Bulgarian text using paraphrase-based semantic similarity. Inspired by the RAIDAR approach, BuST employs a Siamese Transformer architecture to compare input texts with their LLM-generated paraphrases, identifying subtle linguistic patterns that indicate synthetic origin. In pilot experiments, BuST achieved 88.79% accuracy and an F1-score of 88.0%, performing competitively with strong baselines. While BERT reached higher raw scores, BuST offers a model-agnostic and adaptable framework for low-resource settings, demonstrating the promise of paraphrase-driven detection strategies.
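The paraphrase-comparison idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The character n-gram encoder is a toy stand-in for the Transformer encoder, and the function names, the threshold value, and the decision rule are all hypothetical. The rule rests on an assumed RAIDAR-style intuition that an LLM changes machine-generated text less than human-written text when asked to rewrite it. The only architectural point carried over from the abstract is that one shared encoder processes both the input and its paraphrase.

```python
import math
from collections import Counter


def encode(text: str, n: int = 3) -> Counter:
    """Toy shared encoder: character n-gram counts.

    Assumption: this stands in for a Transformer sentence encoder,
    which the real model would use instead.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def siamese_similarity(text: str, paraphrase: str) -> float:
    """Siamese setup: the same encoder is applied to both inputs."""
    return cosine(encode(text), encode(paraphrase))


def looks_machine_generated(text: str, paraphrase: str,
                            threshold: float = 0.8) -> bool:
    """Hypothetical decision rule (an assumption, not from the paper):
    flag the text when its LLM paraphrase stays very close to it."""
    return siamese_similarity(text, paraphrase) >= threshold
```

In use, `paraphrase` would come from prompting an LLM to rewrite `text`. A trained classifier over the pair of embeddings would normally replace the fixed threshold, which is used here only to keep the sketch short.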
Anthology ID:
2025.ommm-1.5
Volume:
Proceedings of Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Piotr Przybyła, Matthew Shardlow, Clara Colombatto, Nanna Inie
Venues:
OMMM | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
45–52
URL:
https://aclanthology.org/2025.ommm-1.5/
Cite (ACL):
Andrii Maslo and Silvia Gargova. 2025. BuST: A Siamese Transformer Model for AI Text Detection in Bulgarian. In Proceedings of Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models, pages 45–52, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
BuST: A Siamese Transformer Model for AI Text Detection in Bulgarian (Maslo & Gargova, OMMM 2025)
PDF:
https://aclanthology.org/2025.ommm-1.5.pdf