Codewithzichao@DravidianLangTech-EACL2021: Exploring Multilingual Transformers for Offensive Language Identification on Code Mixing Text

Zichao Li


Abstract
This paper describes our solution submitted to the shared task on Offensive Language Identification in Dravidian Languages. We participated in all three offensive language identification subtasks. To address the task, we explored multilingual models based on XLM-RoBERTa and multilingual BERT, trained on the mixed data of the three code-mixed languages. In addition, we tackled the class-imbalance problem present in the training data with class combination, class weights, and focal loss. Our model achieved weighted-average F1 scores of 0.75 (ranked 4th), 0.94 (ranked 4th) and 0.72 (ranked 3rd) on the Tamil-English, Malayalam-English and Kannada-English tasks, respectively.
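As a rough illustration of the class-imbalance handling mentioned in the abstract, the sketch below combines per-class weights with focal loss in PyTorch. It is a minimal, assumption-laden example rather than the authors' released code (see the Software link below); the function name, gamma value, and weight handling are illustrative only.

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, alpha=None, gamma=2.0):
        """Multi-class focal loss: down-weights easy examples so that
        minority (offensive) classes contribute more to the gradient.
        `alpha` is an optional per-class weight tensor."""
        log_probs = F.log_softmax(logits, dim=-1)                 # (batch, num_classes)
        probs = log_probs.exp()
        # Per-example cross entropy, optionally weighted per class.
        ce = F.nll_loss(log_probs, targets, weight=alpha, reduction="none")
        # Probability assigned to the true class for each example.
        p_t = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
        # Modulating factor (1 - p_t)^gamma suppresses well-classified examples.
        return ((1.0 - p_t) ** gamma * ce).mean()

In practice the `logits` would come from the classification head of a fine-tuned XLM-RoBERTa or multilingual BERT model, and `alpha` could be set inversely proportional to the class frequencies in the training data.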
Anthology ID:
2021.dravidianlangtech-1.21
Volume:
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Month:
April
Year:
2021
Address:
Kyiv
Editors:
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Parameswari Krishnamurthy, Elizabeth Sherly
Venue:
DravidianLangTech
Publisher:
Association for Computational Linguistics
Pages:
164–168
URL:
https://aclanthology.org/2021.dravidianlangtech-1.21
Cite (ACL):
Zichao Li. 2021. Codewithzichao@DravidianLangTech-EACL2021: Exploring Multilingual Transformers for Offensive Language Identification on Code Mixing Text. In Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pages 164–168, Kyiv. Association for Computational Linguistics.
Cite (Informal):
Codewithzichao@DravidianLangTech-EACL2021: Exploring Multilingual Transformers for Offensive Language Identification on Code Mixing Text (Li, DravidianLangTech 2021)
PDF:
https://aclanthology.org/2021.dravidianlangtech-1.21.pdf
Software:
 2021.dravidianlangtech-1.21.Software.zip