PANDAS@Abusive Comment Detection in Tamil Code-Mixed Data Using Custom Embeddings with LaBSE

Gayathri G L, Krithika Swaminathan, Divyasri K, Thenmozhi Durairaj, Bharathi B


Abstract
Abusive language has become prevalent in comments on social media platforms. The increasing hostility observed online calls for a system that can identify and flag such content, to prevent conflict and mental distress. The task is more challenging for low-resource languages like Tamil and for the frequently observed Tamil-English code-mixed text. Our classification approach explores several feature extraction methods paired with traditional classifiers. We propose a novel method that combines language-agnostic sentence embeddings (LaBSE) with a TF-IDF vector representation whose vocabulary is a curated corpus of words, producing a custom embedding that is passed to an SVM classifier. Our experiments yielded an accuracy of 52% and an F1-score of 0.54.
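The custom embedding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It only shows one plausible way to concatenate a sentence embedding with a TF-IDF vector restricted to a curated vocabulary. Everything named here is an assumption made for illustration: `CURATED_VOCAB` and the sample comments are invented, the smoothed IDF formula is one common variant, and `stub_sentence_embedding` is a hash-based placeholder standing in for a real LaBSE encoder so that the example runs offline.

```python
import hashlib
import math
from collections import Counter

# Hypothetical curated vocabulary; the paper's actual word list is not shown here.
CURATED_VOCAB = ["idiot", "loosu", "stupid", "nalla"]


def stub_sentence_embedding(text, dim=8):
    """Placeholder for a LaBSE sentence embedding (assumption, not LaBSE itself).

    Derives a deterministic pseudo-vector from a hash of the text so the
    sketch needs no model download. A real encoder would replace this.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def fit_idf(docs, vocab):
    """Compute a smoothed inverse document frequency for each vocabulary word."""
    n_docs = len(docs)
    tokenized = [set(doc.lower().split()) for doc in docs]
    idf = {}
    for word in vocab:
        doc_freq = sum(1 for tokens in tokenized if word in tokens)
        idf[word] = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
    return idf


def tfidf_vector(text, vocab, idf):
    """L2-normalised TF-IDF vector with one dimension per curated word."""
    counts = Counter(text.lower().split())
    vec = [counts[word] * idf[word] for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def custom_embedding(text, vocab, idf, dim=8):
    """Concatenate the sentence embedding with the curated-vocabulary TF-IDF vector."""
    return stub_sentence_embedding(text, dim) + tfidf_vector(text, vocab, idf)


# Toy code-mixed comments, invented for this sketch.
train_texts = ["nee oru loosu idiot", "nalla video bro", "stupid comment da"]
idf = fit_idf(train_texts, CURATED_VOCAB)
features = [custom_embedding(t, CURATED_VOCAB, idf) for t in train_texts]
print(len(features[0]))  # 8 stub dimensions + 4 vocabulary dimensions = 12
```

In a fuller pipeline, the rows of `features` together with their class labels would be handed to an SVM classifier, for example scikit-learn's `sklearn.svm.SVC`, and the stub would be replaced by a real sentence encoder.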
Anthology ID:
2022.dravidianlangtech-1.18
Volume:
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Parameswari Krishnamurthy, Elizabeth Sherly, Sinnathamby Mahesan
Venue:
DravidianLangTech
Publisher:
Association for Computational Linguistics
Pages:
112–119
URL:
https://aclanthology.org/2022.dravidianlangtech-1.18
DOI:
10.18653/v1/2022.dravidianlangtech-1.18
Cite (ACL):
Gayathri G L, Krithika Swaminathan, Divyasri K, Thenmozhi Durairaj, and Bharathi B. 2022. PANDAS@Abusive Comment Detection in Tamil Code-Mixed Data Using Custom Embeddings with LaBSE. In Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pages 112–119, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
PANDAS@Abusive Comment Detection in Tamil Code-Mixed Data Using Custom Embeddings with LaBSE (G L et al., DravidianLangTech 2022)
PDF:
https://aclanthology.org/2022.dravidianlangtech-1.18.pdf
Video:
https://aclanthology.org/2022.dravidianlangtech-1.18.mp4