HOMO-MEX: A Mexican Spanish Annotated Corpus for LGBT+phobia Detection on Twitter

Juan Vásquez, Scott Andersen, Gemma Bel-Enguix, Helena Gómez-Adorno, Sergio-Luis Ojeda-Trueba


Abstract
In the past few years, the NLP community has actively worked on detecting LGBT+phobia in online spaces using publicly available textual data. Most of these efforts focus on English and its variants, since English is the language most studied by the NLP community. Nevertheless, efforts to create corpora in other languages are active worldwide. Despite this, Spanish remains understudied with regard to digital LGBT+phobia. The only corpus we found in the literature covers Peninsular Spanish dialects, which use LGBT+phobic terms different from those in the Mexican dialect. For this reason, we present HOMO-MEX, a novel corpus for detecting LGBT+phobia in Mexican Spanish. In this paper, we describe our data-gathering and annotation process. We also present a classification benchmark using various traditional machine learning algorithms and two pre-trained deep learning models to showcase the classification potential of our corpus.
Anthology ID:
2023.woah-1.20
Volume:
The 7th Workshop on Online Abuse and Harms (WOAH)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Yi-Ling Chung, Paul Röttger, Debora Nozza, Zeerak Talat, Aida Mostafazadeh Davani
Venue:
WOAH
Publisher:
Association for Computational Linguistics
Pages:
202–214
URL:
https://aclanthology.org/2023.woah-1.20
DOI:
10.18653/v1/2023.woah-1.20
Cite (ACL):
Juan Vásquez, Scott Andersen, Gemma Bel-Enguix, Helena Gómez-Adorno, and Sergio-Luis Ojeda-Trueba. 2023. HOMO-MEX: A Mexican Spanish Annotated Corpus for LGBT+phobia Detection on Twitter. In The 7th Workshop on Online Abuse and Harms (WOAH), pages 202–214, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
HOMO-MEX: A Mexican Spanish Annotated Corpus for LGBT+phobia Detection on Twitter (Vásquez et al., WOAH 2023)
PDF:
https://aclanthology.org/2023.woah-1.20.pdf