cantnlp@LT-EDI-2024: Automatic Detection of Anti-LGBTQ+ Hate Speech in Under-resourced Languages

Sidney Gig-Jan Wong; Matthew Durward

doi:10.18653/v1/2024.ltedi-1.19

Abstract

This paper describes our system for detecting homophobia and transphobia in social media comments, developed as part of the shared task at LT-EDI-2024. We took a transformer-based approach to develop our multiclass classification model for ten language conditions (English, Spanish, Gujarati, Hindi, Kannada, Malayalam, Marathi, Tamil, Tulu, and Telugu). We introduced synthetic and organic instances of script-switched language data during domain adaptation to mirror the linguistic realities of social media language as seen in the labelled training data. Our system ranked second for Gujarati and Telugu, with varying levels of performance for the other language conditions. The results suggest that incorporating elements of paralinguistic behaviour such as script-switching may improve the performance of language detection systems, especially for under-resourced language conditions.

Anthology ID: 2024.ltedi-1.19
Volume: Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
Month: March
Year: 2024
Address: St. Julian's, Malta
Editors: Bharathi Raja Chakravarthi, Bharathi B, Paul Buitelaar, Thenmozhi Durairaj, György Kovács, Miguel Ángel García Cumbreras
Venues: LTEDI | WS
Publisher: Association for Computational Linguistics
Pages: 177–183
URL: https://aclanthology.org/2024.ltedi-1.19/
DOI: 10.18653/v1/2024.ltedi-1.19
Cite (ACL): Sidney Wong and Matthew Durward. 2024. cantnlp@LT-EDI-2024: Automatic Detection of Anti-LGBTQ+ Hate Speech in Under-resourced Languages. In Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 177–183, St. Julian's, Malta. Association for Computational Linguistics.
Cite (Informal): cantnlp@LT-EDI-2024: Automatic Detection of Anti-LGBTQ+ Hate Speech in Under-resourced Languages (Wong & Durward, LTEDI 2024)
PDF: https://aclanthology.org/2024.ltedi-1.19.pdf
Video: https://aclanthology.org/2024.ltedi-1.19.mp4
