indicnlp@kgp at DravidianLangTech-EACL2021: Offensive Language Identification in Dravidian Languages

Kushal Kedia, Abhilash Nandy


Abstract
This paper aims to classify different types of offensive content in three code-mixed Dravidian language datasets. The work leverages existing state-of-the-art approaches to text classification, incorporating additional data and applying transfer learning to pre-trained models. Our final submission is an ensemble of an AWD-LSTM-based model and two transformer architectures based on BERT and RoBERTa. We achieved weighted-average F1 scores of 0.97, 0.77, and 0.72 on the Malayalam-English, Tamil-English, and Kannada-English datasets, ranking 1st, 2nd, and 3rd on the respective shared-task leaderboards.
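For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how a weighted-average F1 score is typically computed: per-class F1 scores are averaged with weights proportional to each class's share of the true labels. The function name `weighted_f1` and the toy labels are hypothetical and are not taken from the paper or its released code; the shared task's official scorer may differ in details.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight by class frequency
    return score


# Toy example with two hypothetical classes.
y_true = ["offensive", "offensive", "not_offensive", "not_offensive"]
y_pred = ["offensive", "not_offensive", "not_offensive", "not_offensive"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights follow class frequency, a frequent class dominates the score, which matters on imbalanced datasets where most comments are not offensive.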
Anthology ID:
2021.dravidianlangtech-1.48
Volume:
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Month:
April
Year:
2021
Address:
Kyiv
Editors:
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Parameswari Krishnamurthy, Elizabeth Sherly
Venue:
DravidianLangTech
Publisher:
Association for Computational Linguistics
Pages:
330–335
URL:
https://aclanthology.org/2021.dravidianlangtech-1.48
Cite (ACL):
Kushal Kedia and Abhilash Nandy. 2021. indicnlp@kgp at DravidianLangTech-EACL2021: Offensive Language Identification in Dravidian Languages. In Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pages 330–335, Kyiv. Association for Computational Linguistics.
Cite (Informal):
indicnlp@kgp at DravidianLangTech-EACL2021: Offensive Language Identification in Dravidian Languages (Kedia & Nandy, DravidianLangTech 2021)
PDF:
https://aclanthology.org/2021.dravidianlangtech-1.48.pdf
Software:
 2021.dravidianlangtech-1.48.Software.zip
Code
 kushal2000/Dravidian-Offensive-Language-Identification
Data
OLID