Hypers@DravidianLangTech-EACL2021: Offensive language identification in Dravidian code-mixed YouTube Comments and Posts

Charangan Vasantharajan, Uthayasanker Thayasivam


Abstract
Code-mixed offensive content has become pervasive in social media posts over the last few years. Consequently, it has attracted significant attention from the research community, which has worked on identifying the different forms of such content (e.g., hate speech and sentiment) and contributed to the creation of datasets. Most recent studies deal with high-resource languages (e.g., English) because many datasets are publicly available, whereas low-resource languages have received little attention owing to the lack of datasets. Therefore, this study focuses on offensive language identification in code-mixed, low-resource Dravidian languages (Tamil, Kannada, and Malayalam) using a bidirectional approach and fine-tuning strategies. According to the leaderboard, the proposed model achieved F1-scores of 0.96 for Malayalam, 0.73 for Tamil, and 0.70 for Kannada on the benchmark. Moreover, from the perspective of multilingual models, this model ranked 3rd and achieved favorable results, confirming it as the best among all systems submitted to these shared tasks in these three languages.
Anthology ID:
2021.dravidianlangtech-1.26
Volume:
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Month:
April
Year:
2021
Address:
Kyiv
Editors:
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Parameswari Krishnamurthy, Elizabeth Sherly
Venue:
DravidianLangTech
Publisher:
Association for Computational Linguistics
Pages:
195–202
URL:
https://aclanthology.org/2021.dravidianlangtech-1.26
Cite (ACL):
Charangan Vasantharajan and Uthayasanker Thayasivam. 2021. Hypers@DravidianLangTech-EACL2021: Offensive language identification in Dravidian code-mixed YouTube Comments and Posts. In Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pages 195–202, Kyiv. Association for Computational Linguistics.
Cite (Informal):
Hypers@DravidianLangTech-EACL2021: Offensive language identification in Dravidian code-mixed YouTube Comments and Posts (Vasantharajan & Thayasivam, DravidianLangTech 2021)
PDF:
https://aclanthology.org/2021.dravidianlangtech-1.26.pdf
Software:
 2021.dravidianlangtech-1.26.Software.zip