GenderQuant: Quantifying Mention-Level Genderedness

Ananya, Nitya Parthasarthi, Sameer Singh


Abstract
Language is gendered if the context surrounding a mention is suggestive of a particular binary gender for that mention. Detecting the different ways in which language is gendered is an important task, since gendered language can bias NLP models (such as those for coreference resolution). This task is challenging because genderedness is often expressed in subtle ways. Existing approaches require considerable annotation effort for each language, domain, and author, and often rely on handcrafted lexicons and features. Additionally, these approaches do not provide a quantifiable measure of how gendered the text is, nor are they applicable at the fine-grained mention level. In this paper, we use existing NLP pipelines to automatically annotate the gender of mentions in text. On corpora labeled using this method, we train a supervised classifier to predict the gender of any mention from its context and evaluate it on unseen text. The model's confidence in a mention's gender can be used as a proxy for the level of genderedness of the context. We test this gendered-language detector on movie summaries, movie reviews, news articles, and fiction novels, achieving an AUC-ROC of up to 0.71, and observe that the model's predictions agree with human judgments collected for this task. We also provide examples of detected gendered sentences from the aforementioned domains.
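
The abstract outlines a simple recipe: mask each mention, train a classifier to predict the masked mention's gender from its surrounding context, and treat the classifier's confidence as a genderedness score for that context. The sketch below illustrates this idea with a bag-of-words logistic-regression classifier; the placeholder token, feature choice, and toy data are assumptions for illustration only, not the authors' implementation.

# Minimal sketch of the mention-level genderedness idea described in the abstract.
# Assumptions: mentions are replaced with a "[MENTION]" placeholder, contexts are
# featurized as word n-grams, and a logistic-regression classifier is used.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Toy contexts with the mention masked; labels are the binary gender inferred
# for the masked mention by an NLP pipeline (1 = female, 0 = male).
contexts = [
    "[MENTION] adjusted her gown before the ball",
    "[MENTION] commanded the regiment at dawn",
    "[MENTION] was praised as a devoted mother",
    "[MENTION] fixed the truck's engine himself",
]
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(contexts)

clf = LogisticRegression()
clf.fit(X, labels)

# Genderedness score for an unseen context: the classifier's confidence about the
# masked mention's gender. Scores near 0.5 suggest a neutral context; scores near
# 0 or 1 suggest a strongly gendered context.
test = ["[MENTION] wore a bright dress to the party"]
score = clf.predict_proba(vectorizer.transform(test))[0, 1]
print(f"P(female | context) = {score:.2f}")

# Evaluation against held-out mention genders would use AUC-ROC, as in the paper.
print("train AUC:", roc_auc_score(labels, clf.predict_proba(X)[:, 1]))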
Anthology ID:
N19-1303
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Jill Burstein, Christy Doran, Thamar Solorio
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
2959–2969
URL:
https://aclanthology.org/N19-1303
DOI:
10.18653/v1/N19-1303
Cite (ACL):
Ananya, Nitya Parthasarthi, and Sameer Singh. 2019. GenderQuant: Quantifying Mention-Level Genderedness. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2959–2969, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
GenderQuant: Quantifying Mention-Level Genderedness (Ananya et al., NAACL 2019)
PDF:
https://aclanthology.org/N19-1303.pdf
Presentation:
N19-1303.Presentation.pdf