PrimeLine@DravidianLangTech 2026: Abusive Tamil Comment Detection Using MuRIL

Rithikaa V; S.Sumathi; Nithya Varshini C N R; Sanjay Krishnan K

Abstract

Detecting abusive language in Tamil social media is difficult. The language is morphologically rich, speakers routinely mix Tamil with English, and informal romanised Tamil is common enough to confuse models trained primarily on formal text. This work presents a system for binary classification of Tamil comments into Abusive and Non-Abusive categories, submitted to the DravidianLangTech@ACL 2026 shared task. The system fine-tunes MuRIL, a BERT-based encoder pre-trained on 17 Indian languages and their transliterated equivalents, and shows that this Indian-language-specific pre-training gives a measurable advantage over generic multilingual baselines. It achieves a macro-averaged F1 of 0.83 on the validation set, compared to 0.79 for XLM-RoBERTa and 0.77 for mBERT under identical training conditions, establishing a strong transformer-based baseline for abusive language detection in code-mixed Tamil.
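For readers unfamiliar with the metric, macro-averaged F1 computes an F1 score for each class separately and then takes the unweighted mean, so the Abusive and Non-Abusive classes count equally regardless of how many comments fall into each. The sketch below illustrates the calculation in plain Python. The function names and the toy labels are hypothetical and chosen only for illustration; they are not the shared-task data or the authors' evaluation code.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)


# Toy labels, invented purely for illustration (not the task's data).
LABELS = ["Abusive", "Non-Abusive"]
gold = ["Abusive", "Abusive", "Non-Abusive", "Non-Abusive", "Non-Abusive", "Non-Abusive"]
pred = ["Abusive", "Non-Abusive", "Non-Abusive", "Non-Abusive", "Non-Abusive", "Abusive"]

score = macro_f1(gold, pred, LABELS)
print(f"macro-F1 = {score:.3f}")
```

In this toy case the Abusive class scores 0.5 and the Non-Abusive class scores 0.75, giving a macro average of 0.625 even though the majority class dominates the sample.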

Anthology ID: 2026.dravidianlangtech-1.51
Volume: Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Month: July
Year: 2026
Address: Underline (Virtual)
Editors: Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Saranya Rajiakodi, Subalalitha Navaneethakrishnan, Dhivya Chinnappa, Balasubramanian Palani, Malliga Subramanian, Kogilavani Shanmugavadivel, Ratnavel Rajalakshmi
Venues: DravidianLangTech | WS
Publisher: Association for Computational Linguistics
Pages: 331–335
URL: https://aclanthology.org/2026.dravidianlangtech-1.51/
Cite (ACL): Rithikaa V, S.Sumathi, Nithya Varshini C N R, and Sanjay Krishnan K. 2026. PrimeLine@DravidianLangTech 2026: Abusive Tamil Comment Detection Using MuRIL. In Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 331–335, Underline (Virtual). Association for Computational Linguistics.
Cite (Informal): PrimeLine@DravidianLangTech 2026: Abusive Tamil Comment Detection Using MuRIL (V et al., DravidianLangTech 2026)
PDF: https://aclanthology.org/2026.dravidianlangtech-1.51.pdf
