Sandalphon@DravidianLangTech-EACL2024: Hate and Offensive Language Detection in Telugu Code-mixed Text using Transliteration-Augmentation

Nafisa Tabassum; Mosabbir Khan; Shawly Ahsan; Jawad Hossain; Mohammed Moshiul Hoque

doi:10.18653/v1/2024.dravidianlangtech-1.28

Abstract

Hate and offensive language on online platforms poses significant challenges and calls for automatic detection methods. The problem is harder for code-mixed text, which is very common on social media, because of the cultural nuances of the languages involved. DravidianLangTech-EACL2024 organized a shared task on detecting hate and offensive language in Telugu. For this task, this study investigates the effectiveness of transliteration-augmented datasets for Telugu code-mixed text. We compare the performance of various machine learning (ML), deep learning (DL), and transformer-based models on both the original and the augmented datasets. Experimental findings demonstrate the superiority of transformer models, particularly Telugu-BERT, which achieved the highest F1-score of 0.77 on the augmented dataset and ranked 1st on the leaderboard. The study highlights the potential of transliteration-augmented datasets for improving model performance and suggests further exploration of diverse transliteration options to address real-world scenarios.
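The augmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two-word lexicon, the placeholder labels, and the direction of transliteration (Roman script to Telugu script) are all assumptions made for illustration, since the abstract does not specify them. The sketch only shows the general pattern of adding a transliterated copy of each sample while keeping its label.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (text, label)


def augment_with_transliteration(
    samples: List[Sample],
    transliterate: Callable[[str], str],
) -> List[Sample]:
    """Return the original samples plus one transliterated copy of each.

    A copy is skipped when transliteration leaves the text unchanged,
    so the augmented set contains no exact duplicates of the originals.
    """
    augmented = list(samples)
    for text, label in samples:
        converted = transliterate(text)
        if converted != text:
            augmented.append((converted, label))
    return augmented


# Hypothetical word-level lookup, for illustration only. A real system
# would rely on a full transliteration tool rather than a tiny lexicon.
TOY_LEXICON = {"manchi": "మంచి", "cinema": "సినిమా"}


def toy_transliterate(text: str) -> str:
    """Replace known Roman-script tokens with Telugu script; keep the rest."""
    return " ".join(TOY_LEXICON.get(tok.lower(), tok) for tok in text.split())


# Placeholder data and labels, not taken from the shared-task dataset.
data = [("manchi cinema", "non-hate"), ("hello", "non-hate")]
augmented = augment_with_transliteration(data, toy_transliterate)
```

Passing the transliteration step in as a callable keeps the augmentation logic independent of any particular tool, which fits the abstract's suggestion to explore diverse transliteration options.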

Anthology ID: 2024.dravidianlangtech-1.28
Volume: Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Month: March
Year: 2024
Address: St. Julian's, Malta
Editors: Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Elizabeth Sherly, Rajeswari Nadarajan, Manikandan Ravikiran
Venues: DravidianLangTech | WS
Publisher: Association for Computational Linguistics
Pages: 167–172
URL: https://aclanthology.org/2024.dravidianlangtech-1.28/
DOI: 10.18653/v1/2024.dravidianlangtech-1.28
Cite (ACL): Nafisa Tabassum, Mosabbir Khan, Shawly Ahsan, Jawad Hossain, and Mohammed Moshiul Hoque. 2024. Sandalphon@DravidianLangTech-EACL2024: Hate and Offensive Language Detection in Telugu Code-mixed Text using Transliteration-Augmentation. In Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 167–172, St. Julian's, Malta. Association for Computational Linguistics.
Cite (Informal): Sandalphon@DravidianLangTech-EACL2024: Hate and Offensive Language Detection in Telugu Code-mixed Text using Transliteration-Augmentation (Tabassum et al., DravidianLangTech 2024)
PDF: https://aclanthology.org/2024.dravidianlangtech-1.28.pdf
Video: https://aclanthology.org/2024.dravidianlangtech-1.28.mp4
