CEN-Tamil@DravidianLangTech-ACL2022: Abusive Comment detection in Tamil using TF-IDF and Random Kitchen Sink Algorithm

Prasanth S N, R Aswin Raj, Adhithan P, Premjith B, Soman Kp


Abstract
This paper describes the approach used by team CEN-Tamil for abusive comment detection in Tamil. The task aims to identify whether a given comment contains abusive content. We used TF-IDF with the char-wb analyzer together with the Random Kitchen Sink (RKS) algorithm to create feature vectors, and a Support Vector Machine (SVM) classifier with a polynomial kernel for classification. We applied this method to both the Tamil and Tamil-English datasets and secured first place with an F1-score of 0.32 and seventh place with an F1-score of 0.25, respectively. The code for our approach is shared in a GitHub repository.
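The feature-construction step described in the abstract can be sketched as follows. This is a minimal pure-Python illustration, not the team's actual code. It assumes that "char-wb" means character n-grams taken inside space-padded word boundaries, and that RKS is the random Fourier feature map that approximates an RBF kernel. The function names, the n-gram range, `gamma`, `n_components`, and the sample comments are all hypothetical, since the abstract does not state them. The SVM with a polynomial kernel, which would be trained on the resulting vectors, is not shown.

```python
import math
import random
from collections import Counter


def char_wb_ngrams(text, n_min=2, n_max=3):
    """Character n-grams taken inside word boundaries (each word padded with spaces)."""
    grams = []
    for word in text.lower().split():
        padded = f" {word} "
        for n in range(n_min, n_max + 1):
            grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def tfidf_vectors(docs, n_min=2, n_max=3):
    """Return (vocabulary, L2-normalised TF-IDF vectors) using a smoothed IDF."""
    counts = [Counter(char_wb_ngrams(d, n_min, n_max)) for d in docs]
    vocab = sorted({g for c in counts for g in c})
    n_docs = len(docs)
    # Document frequency: each n-gram is counted once per document containing it.
    df = Counter(g for c in counts for g in c)
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1.0 for g in vocab}
    vectors = []
    for c in counts:
        vec = [c[g] * idf[g] for g in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def rks_features(vectors, n_components=64, gamma=1.0, seed=0):
    """Random Fourier features: z(x) = sqrt(2/D) * cos(W x + b).

    W has entries drawn from N(0, 2*gamma) and b from U[0, 2*pi], so that
    z(x) . z(y) approximates the RBF kernel exp(-gamma * ||x - y||^2).
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    scale = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)]
               for _ in range(n_components)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_components)]
    amp = math.sqrt(2.0 / n_components)
    return [
        [amp * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
         for w, b in zip(weights, offsets)]
        for x in vectors
    ]


if __name__ == "__main__":
    # Hypothetical transliterated comments, for illustration only.
    comments = ["nalla padam", "mosamana padam", "nalla paattu"]
    vocab, tfidf = tfidf_vectors(comments)
    features = rks_features(tfidf, n_components=16)
    print(len(vocab), len(features), len(features[0]))
```

Restricting n-grams to word boundaries keeps the vocabulary small while still capturing sub-word patterns, which can help with spelling variation in code-mixed text. The random mapping then gives a fixed-size dense vector for any downstream classifier.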
Anthology ID:
2022.dravidianlangtech-1.11
Volume:
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Parameswari Krishnamurthy, Elizabeth Sherly, Sinnathamby Mahesan
Venue:
DravidianLangTech
Publisher:
Association for Computational Linguistics
Pages:
70–74
URL:
https://aclanthology.org/2022.dravidianlangtech-1.11
DOI:
10.18653/v1/2022.dravidianlangtech-1.11
Cite (ACL):
Prasanth S N, R Aswin Raj, Adhithan P, Premjith B, and Soman Kp. 2022. CEN-Tamil@DravidianLangTech-ACL2022: Abusive Comment detection in Tamil using TF-IDF and Random Kitchen Sink Algorithm. In Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pages 70–74, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
CEN-Tamil@DravidianLangTech-ACL2022: Abusive Comment detection in Tamil using TF-IDF and Random Kitchen Sink Algorithm (S N et al., DravidianLangTech 2022)
PDF:
https://aclanthology.org/2022.dravidianlangtech-1.11.pdf
Video:
https://aclanthology.org/2022.dravidianlangtech-1.11.mp4