HeteroCorpus: A Corpus for Heteronormative Language Detection

Juan Vásquez, Gemma Bel-Enguix, Scott Thomas Andersen, Sergio-Luis Ojeda-Trueba


Abstract
In recent years, the NLP community has done considerable work on gender bias detection and mitigation in language systems. Yet, to our knowledge, no one has focused on the difficult task of detecting and mitigating heteronormative language. We consider this an urgent issue, since language technologies are increasingly present in the world and, as various studies have shown, biased NLP systems can have adverse real-life consequences for women, gender minorities, racial minorities, and queer people. For these reasons, we propose and evaluate HeteroCorpus, a corpus created specifically for studying heteronormative language in English. Additionally, we propose a baseline set of classification experiments on our corpus to show how it performs in classification tasks.
Anthology ID:
2022.gebnlp-1.23
Volume:
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Month:
July
Year:
2022
Address:
Seattle, Washington
Venue:
GeBNLP
Publisher:
Association for Computational Linguistics
Pages:
225–234
URL:
https://aclanthology.org/2022.gebnlp-1.23
DOI:
10.18653/v1/2022.gebnlp-1.23
Cite (ACL):
Juan Vásquez, Gemma Bel-Enguix, Scott Thomas Andersen, and Sergio-Luis Ojeda-Trueba. 2022. HeteroCorpus: A Corpus for Heteronormative Language Detection. In Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 225–234, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal):
HeteroCorpus: A Corpus for Heteronormative Language Detection (Vásquez et al., GeBNLP 2022)
PDF:
https://aclanthology.org/2022.gebnlp-1.23.pdf
Dataset:
 2022.gebnlp-1.23.dataset.zip
Video:
 https://aclanthology.org/2022.gebnlp-1.23.mp4
Code:
 juanmvsa/heterocorpus