OffTamil@DravideanLangTech-EASL2021: Offensive Language Identification in Tamil Text

Disne Sivalingam, Sajeetha Thavareesan


Abstract
Over the last few decades, code-mixed offensive text has become pervasive in social media posts. Social media platforms and online communities have shown considerable interest in offensive text identification in recent years. Consequently, the research community has also taken an interest in identifying such content and has contributed to the development of corpora. Many publicly available corpora exist for research on identifying offensive text written in English, but such corpora are rare for low-resourced languages like Tamil. The first code-mixed offensive text corpus for Dravidian languages was developed by the shared task organizers and is used in this study. This study focuses on offensive language identification in code-mixed text in Tamil, a low-resourced Dravidian language, using four classifiers (Support Vector Machine, Random Forest, k-Nearest Neighbour and Naive Bayes) with the chi-squared (χ²) feature selection technique, together with BoW and TF-IDF feature representations built from different combinations of n-grams. The proposed model achieved an accuracy of 76.96% when using a linear SVM with the TF-IDF feature representation.
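The chi-squared feature selection step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common 2x2 contingency formulation of χ² over n-gram presence, a one-vs-rest label, and whitespace tokenisation. The function names (`word_ngrams`, `chi2_scores`, `select_top_k`) and the toy English sentences are hypothetical and are not taken from the shared-task corpus.

```python
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Return the set of word n-grams (n_min..n_max) present in a text."""
    tokens = text.lower().split()
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def chi2_scores(docs, labels, positive):
    """Chi-squared score of each n-gram against a one-vs-rest class label."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos
    df_pos, df_all = Counter(), Counter()
    for text, y in zip(docs, labels):
        for gram in word_ngrams(text):
            df_all[gram] += 1
            if y == positive:
                df_pos[gram] += 1
    scores = {}
    for gram, total in df_all.items():
        a = df_pos[gram]   # positive documents containing the n-gram
        b = total - a      # other documents containing the n-gram
        c = n_pos - a      # positive documents without the n-gram
        d = n_neg - b      # other documents without the n-gram
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[gram] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring n-grams; ties are broken alphabetically."""
    return sorted(scores, key=lambda gram: (-scores[gram], gram))[:k]


# Hypothetical toy data, for illustration only.
toy_docs = ["you are stupid", "nice video", "stupid movie", "nice song"]
toy_labels = ["off", "not", "off", "not"]
toy_scores = chi2_scores(toy_docs, toy_labels, positive="off")
top_features = select_top_k(toy_scores, 2)
```

In a full pipeline, the selected n-grams would then define the columns of the BoW or TF-IDF matrix passed to a classifier such as a linear SVM. Library implementations of χ² scoring may work from feature counts rather than presence, so their scores can differ from this sketch.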
Anthology ID:
2021.dravidianlangtech-1.51
Volume:
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Month:
April
Year:
2021
Address:
Kyiv
Editors:
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Parameswari Krishnamurthy, Elizabeth Sherly
Venue:
DravidianLangTech
Publisher:
Association for Computational Linguistics
Pages:
346–351
URL:
https://aclanthology.org/2021.dravidianlangtech-1.51
Cite (ACL):
Disne Sivalingam and Sajeetha Thavareesan. 2021. OffTamil@DravideanLangTech-EASL2021: Offensive Language Identification in Tamil Text. In Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pages 346–351, Kyiv. Association for Computational Linguistics.
Cite (Informal):
OffTamil@DravideanLangTech-EASL2021: Offensive Language Identification in Tamil Text (Sivalingam & Thavareesan, DravidianLangTech 2021)
PDF:
https://aclanthology.org/2021.dravidianlangtech-1.51.pdf