IRNLP_DAIICT@DravidianLangTech-EACL2021:Offensive Language identification in Dravidian Languages using TF-IDF Char N-grams and MuRIL

Bhargav Dave; Shripad Bhat; Prasenjit Majumder

Abstract

This paper presents the participation of the IRNLP_DAIICT team from the Information Retrieval and Natural Language Processing lab at DA-IICT, India, in the DravidianLangTech-EACL2021 shared task on Offensive Language Identification in Dravidian Languages. The aim of this shared task is to identify offensive language in a code-mixed dataset of YouTube comments. Each comment is classified into one of six classes: Not Offensive (NO), Offensive Untargeted (OU), Offensive Targeted Individual (OTI), Offensive Targeted Group (OTG), Offensive Targeted Others (OTO), or Other Language (OL), for three Dravidian languages: Kannada, Malayalam and Tamil. We use TF-IDF character n-grams and pretrained MuRIL embeddings for text representation, and Logistic Regression and Linear SVM for classification. Our best approach ranked ninth, third and eighth, with weighted F1 scores of 0.64, 0.95 and 0.71 on the Kannada, Malayalam and Tamil test sets, respectively.
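As a rough illustration of two ideas named in the abstract, the sketch below builds TF-IDF vectors over character n-grams and computes a support-weighted F1 score using only the Python standard library. It is not the authors' implementation: the n-gram range (1 to 3), the smoothed IDF formula, the L2 normalisation, the function names and the toy comments are all assumptions made for illustration, and the classifier step (Logistic Regression or Linear SVM) is omitted.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max (range is assumed)."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n_min=1, n_max=3):
    """Map each document to a dict of L2-normalised TF-IDF weights."""
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    # Document frequency: iterating a Counter yields each n-gram once per document.
    df = Counter(g for c in counts for g in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        # Smoothed IDF, ln((1 + N) / (1 + df)) + 1: one common variant, assumed here.
        vec = {g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1)
               for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its number of true examples."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


# Toy, made-up comments (not from the shared-task data).
toy_docs = ["padam super", "super movie", "movie waste"]
toy_vectors = tfidf_vectors(toy_docs)

# Toy labels using the class abbreviations from the abstract.
toy_score = weighted_f1(["NO", "NO", "OU"], ["NO", "OU", "OU"])
```

Character n-grams are a natural fit for code-mixed text because they do not depend on a word-level tokenizer or a fixed spelling of transliterated words. The weighted F1 shown here averages per-class F1 by class support, so frequent classes contribute more to the final score than rare ones.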

Anthology ID: 2021.dravidianlangtech-1.37
Volume: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Month: April
Year: 2021
Address: Kyiv
Editors: Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Parameswari Krishnamurthy, Elizabeth Sherly
Venue: DravidianLangTech
Publisher: Association for Computational Linguistics
Pages: 266–269
URL: https://aclanthology.org/2021.dravidianlangtech-1.37/
Cite (ACL): Bhargav Dave, Shripad Bhat, and Prasenjit Majumder. 2021. IRNLP_DAIICT@DravidianLangTech-EACL2021:Offensive Language identification in Dravidian Languages using TF-IDF Char N-grams and MuRIL. In Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pages 266–269, Kyiv. Association for Computational Linguistics.
Cite (Informal): IRNLP_DAIICT@DravidianLangTech-EACL2021:Offensive Language identification in Dravidian Languages using TF-IDF Char N-grams and MuRIL (Dave et al., DravidianLangTech 2021)
PDF: https://aclanthology.org/2021.dravidianlangtech-1.37.pdf
Software: 2021.dravidianlangtech-1.37.Software.zip
