Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns

Kellie Webster, Marta Recasens, Vera Axelrod, Jason Baldridge


Abstract
Coreference resolution is an important task for natural language understanding, and the resolution of ambiguous pronouns a longstanding challenge. Nonetheless, existing corpora do not capture ambiguous pronouns in sufficient volume or diversity to accurately indicate the practical utility of models. Furthermore, we find gender bias in existing corpora and systems favoring masculine entities. To address this, we present and release GAP, a gender-balanced labeled corpus of 8,908 ambiguous pronoun–name pairs sampled to provide diverse coverage of challenges posed by real-world text. We explore a range of baselines that demonstrate the complexity of the challenge, the best achieving just 66.9% F1. We show that syntactic structure and continuous neural models provide promising, complementary cues for approaching the challenge.
Anthology ID:
Q18-1042
Volume:
Transactions of the Association for Computational Linguistics, Volume 6
Year:
2018
Address:
Cambridge, MA
Editors:
Lillian Lee, Mark Johnson, Kristina Toutanova, Brian Roark
Venue:
TACL
Publisher:
MIT Press
Pages:
605–617
URL:
https://aclanthology.org/Q18-1042
DOI:
10.1162/tacl_a_00240
Cite (ACL):
Kellie Webster, Marta Recasens, Vera Axelrod, and Jason Baldridge. 2018. Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns. Transactions of the Association for Computational Linguistics, 6:605–617.
Cite (Informal):
Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns (Webster et al., TACL 2018)
PDF:
https://aclanthology.org/Q18-1042.pdf
Data
GAP Coreference Dataset, Definite Pronoun Resolution Dataset, WSC, WikiCoref, WinoBias