Zero-shot Code-Mixed Offensive Span Identification through Rationale Extraction

Manikandan Ravikiran, Bharathi Raja Chakravarthi


Abstract
This paper investigates the effectiveness of sentence-level transformers for zero-shot offensive span identification on a code-mixed Tamil dataset. Specifically, we evaluate two rationale extraction methods, Local Interpretable Model-Agnostic Explanations (LIME) and Integrated Gradients (IG), for adapting transformer-based offensive language classification models to zero-shot offensive span identification. We find that LIME and IG achieve baseline F1 scores of 26.35% and 44.83%, respectively. We also study the effect of dataset size and training process on the overall accuracy of span identification, and find that both LIME and IG improve significantly with Masked Data Augmentation and Multilabel Training, reaching F1 scores of 50.23% and 47.38%, respectively. Disclaimer: This paper contains examples that may be considered profane, vulgar, or offensive. The examples do not represent the views of the authors or their employers/graduate schools towards any person(s), group(s), practice(s), or entity/entities. Instead, they are used only to emphasize the linguistic research challenges.
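The abstract describes turning token-level rationales from a sentence classifier into offensive spans. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes attribution scores (for example from LIME or IG) are already available per whitespace token, and it uses a hypothetical threshold of 0.5 to decide which tokens belong to a span. The function names `scores_to_char_offsets` and `char_f1` are illustrative, and scoring by character-offset F1 is an assumed convention that may differ from the paper's evaluation.

```python
from typing import List


def scores_to_char_offsets(text: str, scores: List[float],
                           threshold: float = 0.5) -> List[int]:
    """Map per-token attribution scores to character offsets.

    Assumes one score per whitespace-separated token. Tokens whose score
    is at or above `threshold` (a hypothetical cutoff) are treated as
    part of a predicted span.
    """
    tokens = text.split()
    if len(tokens) != len(scores):
        raise ValueError("expected exactly one score per whitespace token")

    offsets: List[int] = []
    cursor = 0
    for token, score in zip(tokens, scores):
        # Locate the token in the original string, searching from the
        # end of the previous token so repeated words map correctly.
        start = text.index(token, cursor)
        end = start + len(token)
        if score >= threshold:
            offsets.extend(range(start, end))
        cursor = end
    return offsets


def char_f1(predicted: List[int], gold: List[int]) -> float:
    """F1 between predicted and gold sets of character offsets."""
    pred_set, gold_set = set(predicted), set(gold)
    if not pred_set and not gold_set:
        return 1.0  # nothing to find and nothing predicted
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Neutral placeholder sentence with made-up attribution scores.
    sentence = "this is bad word here"
    token_scores = [0.1, 0.0, 0.9, 0.8, 0.2]
    predicted = scores_to_char_offsets(sentence, token_scores)
    gold = [8, 9, 10]  # hypothetical gold span covering "bad"
    print(predicted, char_f1(predicted, gold))
```

Because the classifier is trained only on sentence-level labels, no span annotations are needed to produce `predicted`; gold spans enter only at evaluation time, which is what makes the setting zero-shot.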
Anthology ID:
2022.dravidianlangtech-1.37
Volume:
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Parameswari Krishnamurthy, Elizabeth Sherly, Sinnathamby Mahesan
Venue:
DravidianLangTech
Publisher:
Association for Computational Linguistics
Pages:
240–247
URL:
https://aclanthology.org/2022.dravidianlangtech-1.37
DOI:
10.18653/v1/2022.dravidianlangtech-1.37
Cite (ACL):
Manikandan Ravikiran and Bharathi Raja Chakravarthi. 2022. Zero-shot Code-Mixed Offensive Span Identification through Rationale Extraction. In Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pages 240–247, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Zero-shot Code-Mixed Offensive Span Identification through Rationale Extraction (Ravikiran & Chakravarthi, DravidianLangTech 2022)
PDF:
https://aclanthology.org/2022.dravidianlangtech-1.37.pdf
Video:
https://aclanthology.org/2022.dravidianlangtech-1.37.mp4
Code:
manikandan-ravikiran/zero-shot-offensive-span