Enhancing Online Grooming Detection via Backtranslation Augmentation

Hamed Waezi, Hossein Fani


Abstract
Grooming minors for sexual exploitation has become an increasingly significant concern on online conversation platforms. To make the online experience safer for minors, machine learning models have been proposed that tap into explicit textual remarks to automate the detection of predatory conversations. Such models, however, fall short in real-world applications because predatory conversations are sparsely distributed. In this paper, we propose backtranslation augmentation to enrich training datasets with more predatory conversations. Through experiments on 8 languages from 4 language families using 3 neural translators, we demonstrate that backtranslation augmentation improves models' classification efficacy while requiring fewer training epochs. Our code and experimental results are available at https://github.com/fani-lab/osprey/tree/coling25.
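The abstract describes backtranslation augmentation only at a high level: a conversation is translated into a pivot language and back, yielding a paraphrase that can be added to the training set. The Python sketch below illustrates that general idea under stated assumptions and is not the authors' implementation (their code is in the linked repository). The names `Translator`, `backtranslate`, `augment_minority`, and `toy_translate` are hypothetical; `toy_translate` is a stand-in for a real neural translator, and augmenting only the positive (predatory) class is an assumption that follows the abstract's stated goal of adding more predatory conversations.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical translator interface: (text, source_lang, target_lang) -> text.
# In practice this would wrap a neural machine translation model.
Translator = Callable[[str, str, str], str]


def backtranslate(text: str, pivot: str, translate: Translator, source: str = "en") -> str:
    """Round-trip `text` through a pivot language: source -> pivot -> source."""
    forward = translate(text, source, pivot)
    return translate(forward, pivot, source)


def augment_minority(
    samples: Sequence[Tuple[str, int]],
    pivots: Sequence[str],
    translate: Translator,
    positive_label: int = 1,
) -> List[Tuple[str, int]]:
    """Return the original samples plus backtranslated copies of each
    positive (minority-class) sample, one per pivot language.
    Paraphrases identical to an already-kept version are skipped,
    since they add no new lexical variation."""
    augmented = list(samples)
    for text, label in samples:
        if label != positive_label:
            continue  # assumption: only the sparse positive class is augmented
        seen = {text}
        for pivot in pivots:
            paraphrase = backtranslate(text, pivot, translate)
            if paraphrase not in seen:
                seen.add(paraphrase)
                augmented.append((paraphrase, label))
    return augmented


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Toy stand-in for a neural translator, for demonstration only.
    The forward pass is the identity; the return pass to English swaps a
    word depending on the pivot, mimicking round-trip lexical variation."""
    if tgt != "en":
        return text
    if src == "fr":
        return text.replace("hello", "hi")
    if src == "de":
        return text.replace("pictures", "photos")
    return text


data = [("hello, can you send pictures", 1), ("see you at practice", 0)]
out = augment_minority(data, pivots=["fr", "de"], translate=toy_translate)
for text, label in out:
    print(label, text)
```

With the toy translator, the single positive sample yields two distinct paraphrases (one per pivot) while the negative sample is left untouched. Using several pivot languages, as the abstract's 8-language setup suggests, is what produces varied paraphrases from one source conversation.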
Anthology ID: 2025.coling-main.160
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 2340–2350
URL: https://aclanthology.org/2025.coling-main.160/
Cite (ACL): Hamed Waezi and Hossein Fani. 2025. Enhancing Online Grooming Detection via Backtranslation Augmentation. In Proceedings of the 31st International Conference on Computational Linguistics, pages 2340–2350, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Enhancing Online Grooming Detection via Backtranslation Augmentation (Waezi & Fani, COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.160.pdf