MIT-KEC-NLP@DravidianLangTech-EACL 2024: Offensive Content Detection in Kannada and Kannada-English Mixed Text Using Deep Learning Techniques

Kogilavani Shanmugavadivel, Sowbarnigaa K S, Mehal Sakthi M S, Subhadevi K, Malliga Subramanian


Abstract
This study presents a methodology for detecting offensive content in multilingual text, with a focus on Kannada and Kannada-English mixed comments. Data preprocessing starts from a dataset of Kannada comments, with Google Translate used for Kannada-English translation. After tokenization, the text is labeled as a sequence: BIO tags mark the presence and boundaries of offensive spans. A Bidirectional LSTM (BiLSTM) neural network is trained on the annotated data and achieves a macro F1 score of 61.0 in recognizing offensive content. The training process comprises data preparation, model architecture definition, and iterative training on Kannada and Kannada-English text. On a previously unseen dataset, the trained model accurately predicts offensive spans in comments in both languages. The recorded predictions, including offensive span indices, are organized into a database.
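The BIO labeling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and character-offset span annotations with exclusive end indices; the abstract specifies neither, so these choices, the function names, and the placeholder sentence are all hypothetical and not the authors' implementation.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive (assumed)


def tokenize_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace and keep each token's character offsets."""
    tokens = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens


def bio_encode(text: str, spans: List[Span]) -> List[str]:
    """Tag each token B (begins a span), I (continues it), or O (outside)."""
    tags = []
    prev_span = None
    for _, start, end in tokenize_with_offsets(text):
        # A token belongs to the first annotated span it overlaps.
        hit = next((s for s in spans if start < s[1] and end > s[0]), None)
        if hit is None:
            tags.append("O")
        elif hit == prev_span:
            tags.append("I")
        else:
            tags.append("B")
        prev_span = hit
    return tags


def bio_decode(text: str, tags: List[str]) -> List[Span]:
    """Turn predicted BIO tags back into character-offset spans."""
    spans: List[Span] = []
    current = None
    for (_, start, end), tag in zip(tokenize_with_offsets(text), tags):
        if tag == "B" or (tag == "I" and current is None):
            if current is not None:
                spans.append((current[0], current[1]))
            current = [start, end]
        elif tag == "I":
            current[1] = end
        else:
            if current is not None:
                spans.append((current[0], current[1]))
            current = None
    if current is not None:
        spans.append((current[0], current[1]))
    return spans


# Placeholder sentence; "BAD WORD" stands in for an annotated offensive span.
text = "this is BAD WORD here"
tags = bio_encode(text, [(8, 16)])
recovered = bio_decode(text, tags)
```

Here `tags` marks the two tokens inside the annotated span as `B` and `I`, and `bio_decode` maps model-style tag output back to span indices of the kind the abstract says are stored with each prediction.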
Anthology ID:
2024.dravidianlangtech-1.24
Volume:
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Month:
March
Year:
2024
Address:
St. Julian's, Malta
Editors:
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Elizabeth Sherly, Rajeswari Nadarajan, Manikandan Ravikiran
Venues:
DravidianLangTech | WS
Publisher:
Association for Computational Linguistics
Pages:
146–150
URL:
https://aclanthology.org/2024.dravidianlangtech-1.24
Cite (ACL):
Kogilavani Shanmugavadivel, Sowbarnigaa K S, Mehal Sakthi M S, Subhadevi K, and Malliga Subramanian. 2024. MIT-KEC-NLP@DravidianLangTech-EACL 2024: Offensive Content Detection in Kannada and Kannada-English Mixed Text Using Deep Learning Techniques. In Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 146–150, St. Julian's, Malta. Association for Computational Linguistics.
Cite (Informal):
MIT-KEC-NLP@DravidianLangTech-EACL 2024: Offensive Content Detection in Kannada and Kannada-English Mixed Text Using Deep Learning Techniques (Shanmugavadivel et al., DravidianLangTech-WS 2024)
PDF:
https://aclanthology.org/2024.dravidianlangtech-1.24.pdf