Spartans@LT-EDI-EACL2021: Inclusive Speech Detection using Pretrained Language Models

Megha Sharma, Gaurav Arora


Abstract
We describe our system, which ranked first in the Hope Speech Detection (HSD) shared task and fourth in the Offensive Language Identification (OLI) shared task, both for the Tamil language. The goal of HSD and OLI is to identify whether a code-mixed comment or post contains hope speech or offensive content, respectively. We pre-train a transformer-based model, RoBERTa, on synthetically generated code-mixed data and use it in an ensemble with the pre-trained ULMFiT model available from iNLTK.
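The abstract does not say how the two models' outputs are combined in the ensemble. As a minimal sketch, assuming weighted soft voting over per-class probabilities (one common choice, not confirmed as the authors' method), the combination could look like the following. The function name `ensemble_predict`, the `roberta_weight` parameter, and the example probabilities are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def ensemble_predict(
    roberta_probs: Sequence[Sequence[float]],
    ulmfit_probs: Sequence[Sequence[float]],
    roberta_weight: float = 0.5,
) -> List[int]:
    """Weighted soft voting over two classifiers (hypothetical helper).

    Each argument holds one row of class probabilities per comment.
    Returns the index of the highest-scoring class for each comment.
    """
    if len(roberta_probs) != len(ulmfit_probs):
        raise ValueError("both models must score the same number of comments")
    if not 0.0 <= roberta_weight <= 1.0:
        raise ValueError("roberta_weight must lie in [0, 1]")

    predictions = []
    for r_row, u_row in zip(roberta_probs, ulmfit_probs):
        if len(r_row) != len(u_row):
            raise ValueError("both models must use the same label set")
        # Convex combination of the two probability rows.
        mixed = [
            roberta_weight * r + (1.0 - roberta_weight) * u
            for r, u in zip(r_row, u_row)
        ]
        # Arg-max over classes; ties resolve to the lowest class index.
        predictions.append(max(range(len(mixed)), key=mixed.__getitem__))
    return predictions


# Illustrative, made-up probabilities for two comments and two classes.
example_roberta = [[0.7, 0.3], [0.4, 0.6]]
example_ulmfit = [[0.2, 0.8], [0.1, 0.9]]
equal_vote = ensemble_predict(example_roberta, example_ulmfit)
roberta_heavy = ensemble_predict(example_roberta, example_ulmfit, roberta_weight=0.9)
```

With equal weights the ULMFiT scores outvote RoBERTa on the first comment, while a RoBERTa-heavy weighting flips that decision. In practice such a weight would be tuned on a held-out development set.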
Anthology ID:
2021.ltedi-1.28
Volume:
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion
Month:
April
Year:
2021
Address:
Kyiv
Venues:
EACL | LTEDI
Publisher:
Association for Computational Linguistics
Pages:
188–192
URL:
https://aclanthology.org/2021.ltedi-1.28
Cite (ACL):
Megha Sharma and Gaurav Arora. 2021. Spartans@LT-EDI-EACL2021: Inclusive Speech Detection using Pretrained Language Models. In Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion, pages 188–192, Kyiv. Association for Computational Linguistics.
Cite (Informal):
Spartans@LT-EDI-EACL2021: Inclusive Speech Detection using Pretrained Language Models (Sharma & Arora, LTEDI 2021)
PDF:
https://aclanthology.org/2021.ltedi-1.28.pdf
Software:
 2021.ltedi-1.28.Software.zip