On the Errors in Code-Mixed Tamil-English Offensive Span Identification

Manikandan Ravikiran, Bharathi Raja Chakravarthi


Abstract
In recent times, offensive span identification in code-mixed Tamil-English text has gained traction with the release of datasets, shared tasks, and the development of multiple methods. However, the errors these methods make remain poorly understood. This paper presents a detailed analysis of the errors made by state-of-the-art Tamil-English offensive span identification methods. Our study reveals the strengths and weaknesses of the widely used sequence labeling and zero-shot models for offensive span identification. In the process, we identify data-related errors, improve the data annotation, and release additional diagnostic data to evaluate the models' quality and stability. Disclaimer: This paper contains examples that may be considered profane, vulgar, or offensive. The examples do not represent the views of the authors or their employers/graduate schools towards any person(s), group(s), practice(s), or entity/entities. Instead, they emphasize the complexity of various errors and linguistic research challenges.
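To illustrate what "sequence labeling" means for span identification, the sketch below converts character-offset span annotations into per-token BIO tags. It is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes gold spans are given as a set of offensive character offsets, that tokens are whitespace-separated, and the function name `char_offsets_to_bio` is hypothetical.

```python
from typing import List, Set, Tuple


def char_offsets_to_bio(text: str, offsets: Set[int]) -> List[Tuple[str, str]]:
    """Hypothetical helper: tag whitespace tokens as B/I/O.

    Assumption: a token is offensive if any of its characters is in `offsets`.
    A token continues the previous span (I) only if the previous token was
    offensive and every character between the two tokens is also offensive;
    otherwise an offensive token starts a new span (B).
    """
    labels: List[Tuple[str, str]] = []
    pos = 0
    prev_end = 0
    prev_offensive = False
    for token in text.split():
        start = text.index(token, pos)  # locate the token in the original string
        end = start + len(token)
        pos = end
        offensive = any(i in offsets for i in range(start, end))
        if offensive:
            gap_offensive = all(i in offsets for i in range(prev_end, start))
            tag = "I" if prev_offensive and gap_offensive else "B"
        else:
            tag = "O"
        labels.append((token, tag))
        prev_end = end
        prev_offensive = offensive
    return labels


if __name__ == "__main__":
    sentence = "you are a total idiot da"
    # Characters 10..20 cover "total idiot", including the space between them.
    for token, tag in char_offsets_to_bio(sentence, set(range(10, 21))):
        print(token, tag)
```

Checking the gap between tokens keeps two adjacent but separately annotated spans distinct (each starts with B) instead of merging them into one span.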
Anthology ID: 2023.dravidianlangtech-1.1
Volume: Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
Month: September
Year: 2023
Address: Varna, Bulgaria
Editors: Bharathi R. Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Sajeetha Thavareesan, Elizabeth Sherly
Venues: DravidianLangTech | WS
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 1–9
URL: https://aclanthology.org/2023.dravidianlangtech-1.1
Cite (ACL): Manikandan Ravikiran and Bharathi Raja Chakravarthi. 2023. On the Errors in Code-Mixed Tamil-English Offensive Span Identification. In Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages, pages 1–9, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): On the Errors in Code-Mixed Tamil-English Offensive Span Identification (Ravikiran & Chakravarthi, DravidianLangTech-WS 2023)
PDF: https://aclanthology.org/2023.dravidianlangtech-1.1.pdf