kubapok@LT-EDI 2024: Evaluating Transformer Models for Hate Speech Detection in Tamil

Jakub Pokrywka, Krzysztof Jassem


Abstract
We describe our second-place submission to the shared task on detecting caste/migration hate speech in Tamil, organized at the Fourth Workshop on Language Technology for Equality, Diversity, and Inclusion (LT-EDI-2024). The texts are in Tamil, written either in Tamil script or transliterated into Latin script, and some are in English. Taking the different scripts into account, we evaluated 12 transformer language models on the dev set. Our analysis showed that google/muril-large-cased performs best on the dataset as a whole. For the final challenge submission, we used an ensemble of several models, which achieved a score of 0.81 on the test dataset.
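The abstract does not say how the ensemble combined its models. One common option, assumed here purely for illustration, is soft voting: average each model's per-class probabilities and take the argmax. The sketch below assumes binary labels (0 = not hate speech, 1 = hate speech); the function name `ensemble_average` and the probability values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def ensemble_average(probs_per_model: Sequence[Sequence[Sequence[float]]]) -> list[int]:
    """Hypothetical soft-voting ensemble (an assumption, not the paper's stated method).

    probs_per_model[m][i][c] is model m's probability for class c on example i.
    Returns the argmax of the probabilities averaged over all models.
    """
    n_models = len(probs_per_model)
    if n_models == 0:
        raise ValueError("need at least one model")
    n_examples = len(probs_per_model[0])
    preds = []
    for i in range(n_examples):
        n_classes = len(probs_per_model[0][i])
        # Average the probability of each class across all models.
        avg = [sum(model[i][c] for model in probs_per_model) / n_models
               for c in range(n_classes)]
        # Predict the class with the highest averaged probability.
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


# Three hypothetical models scoring two examples.
model_a = [[0.6, 0.4], [0.3, 0.7]]
model_b = [[0.4, 0.6], [0.45, 0.55]]
model_c = [[0.7, 0.3], [0.8, 0.2]]
print(ensemble_average([model_a, model_b, model_c]))  # [0, 0]
```

On these made-up inputs, a hard majority vote would label the second example 1, because two of the three models prefer class 1. Soft voting labels it 0, because model C's confident prediction outweighs the other two. This is why the choice of combination rule matters.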
Anthology ID:
2024.ltedi-1.22
Volume:
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
Month:
March
Year:
2024
Address:
St. Julian's, Malta
Editors:
Bharathi Raja Chakravarthi, Bharathi B, Paul Buitelaar, Thenmozhi Durairaj, György Kovács, Miguel Ángel García Cumbreras
Venues:
LTEDI | WS
Publisher:
Association for Computational Linguistics
Pages:
196–199
URL:
https://aclanthology.org/2024.ltedi-1.22
Cite (ACL):
Jakub Pokrywka and Krzysztof Jassem. 2024. kubapok@LT-EDI 2024: Evaluating Transformer Models for Hate Speech Detection in Tamil. In Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 196–199, St. Julian's, Malta. Association for Computational Linguistics.
Cite (Informal):
kubapok@LT-EDI 2024: Evaluating Transformer Models for Hate Speech Detection in Tamil (Pokrywka & Jassem, LTEDI-WS 2024)
PDF:
https://aclanthology.org/2024.ltedi-1.22.pdf