UMUTeam@TamilNLP-ACL2022: Abusive Detection in Tamil using Linguistic Features and Transformers

José Antonio García-Díaz; Manuel Valencia-Garcia; Rafael Valencia-García

doi:10.18653/v1/2022.dravidianlangtech-1.7

Abstract

Social media has become a dangerous place, as bullies take advantage of the anonymity the Internet provides to target and intimidate vulnerable individuals and groups. In the past few years, the research community has focused on developing automatic classification tools for detecting hate speech, its variants, and other types of abusive behaviour. However, these methods are still at an early stage for low-resource languages. With the aim of narrowing this gap, the TamilNLP shared task proposed a multi-class classification challenge to detect abusive comments and hope speech in Tamil, both written in Tamil script and in code-mixed text. Our participation consists of a knowledge integration strategy that combines sentence embeddings from BERT, RoBERTa, and FastText with a subset of language-independent linguistic features. We achieved our best result in the code-mixed setting, reaching 3rd position with a macro-averaged F1-score of 35%.
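For readers unfamiliar with the reported metric, the macro-averaged F1-score can be sketched as below. This is a minimal illustration, not the shared task's official scorer: the function name `macro_f1` and the label names (`abusive`, `hope`, `none`) are hypothetical and chosen only for the example. Each class contributes equally to the average, so a rare class that is never predicted pulls the score down sharply.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    tp = defaultdict(int)  # true positives per label
    fp = defaultdict(int)  # false positives per label
    fn = defaultdict(int)  # false negatives per label
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
y_true = ["abusive", "abusive", "none", "none", "none", "hope"]
y_pred = ["abusive", "none", "none", "none", "none", "none"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.4722
```

In this toy example four of six predictions are correct (accuracy of about 67%), yet the macro-averaged F1 is only about 47% because the `hope` class is never predicted and therefore scores zero.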

Anthology ID: 2022.dravidianlangtech-1.7
Volume: Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Parameswari Krishnamurthy, Elizabeth Sherly, Sinnathamby Mahesan
Venue: DravidianLangTech
Publisher: Association for Computational Linguistics
Pages: 45–50
URL: https://aclanthology.org/2022.dravidianlangtech-1.7/
DOI: 10.18653/v1/2022.dravidianlangtech-1.7
Cite (ACL): José García-Díaz, Manuel Valencia-Garcia, and Rafael Valencia-García. 2022. UMUTeam@TamilNLP-ACL2022: Abusive Detection in Tamil using Linguistic Features and Transformers. In Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pages 45–50, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): UMUTeam@TamilNLP-ACL2022: Abusive Detection in Tamil using Linguistic Features and Transformers (García-Díaz et al., DravidianLangTech 2022)
PDF: https://aclanthology.org/2022.dravidianlangtech-1.7.pdf
Video: https://aclanthology.org/2022.dravidianlangtech-1.7.mp4
