GATE X-E : A Challenge Set for Gender-Fair Translations from Weakly-Gendered Languages

Spencer Rarrick, Ranjita Naik, Sundar Poudel, Vishal Chowdhary


Abstract
Neural Machine Translation (NMT) continues to improve in quality and adoption, yet the inadvertent perpetuation of gender bias remains a significant concern. Despite numerous studies on gender bias in translations into English from weakly-gendered languages, there are no benchmarks for evaluating this phenomenon or for assessing mitigation strategies. To address this gap, we introduce GATE X-E, an extension to the GATE corpus (Rarrick et al., 2023) that consists of human translations from Turkish, Hungarian, Finnish, and Persian into English. Each translation is accompanied by feminine, masculine, and neutral variants. The dataset, which contains between 1250 and 1850 instances for each of the four language pairs, features natural sentences with a wide range of sentence lengths and domains, challenging translation rewriters on various linguistic phenomena. Additionally, we present a translation gender rewriting solution built with GPT-4 and use GATE X-E to evaluate it. We open-source our contributions to encourage further research on gender debiasing.
Anthology ID:
2024.findings-acl.504
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
8526–8546
URL:
https://aclanthology.org/2024.findings-acl.504
Cite (ACL):
Spencer Rarrick, Ranjita Naik, Sundar Poudel, and Vishal Chowdhary. 2024. GATE X-E : A Challenge Set for Gender-Fair Translations from Weakly-Gendered Languages. In Findings of the Association for Computational Linguistics ACL 2024, pages 8526–8546, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
GATE X-E : A Challenge Set for Gender-Fair Translations from Weakly-Gendered Languages (Rarrick et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.504.pdf