GFST: Gender-Filtered Self-Training for More Accurate Gender in Translation

Prafulla Kumar Choubey; Anna Currey; Prashant Mathur; Georgiana Dinu

doi:10.18653/v1/2021.emnlp-main.123

Abstract

Targeted evaluations have found that machine translation systems often output incorrect gender in translations, even when the gender is clear from context. Furthermore, these incorrectly gendered translations have the potential to reflect or amplify social biases. We propose gender-filtered self-training (GFST) to improve gender translation accuracy on unambiguously gendered inputs. Our GFST approach uses a source monolingual corpus and an initial model to generate gender-specific pseudo-parallel corpora, which are then filtered and added to the training data. We evaluate GFST on translation from English into five languages, finding that it improves gender accuracy without damaging generic quality. We also show the viability of GFST in several experimental settings, including re-training from scratch, fine-tuning, controlling the gender balance of the data, forward translation, and back-translation.
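
The pipeline outlined in the abstract (select gendered source sentences, translate them with an initial model, filter the output, add the survivors to the training data) can be illustrated with a minimal sketch. This is not the authors' implementation. The pronoun lists, the function names `source_gender` and `gfst_augment`, and the rule that a pair is kept only when the detected target gender matches the source gender are all assumptions made for illustration; the `translate` and `target_gender` callables stand in for an initial translation model and a target-side gender detector.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical toy pronoun lists (assumption): a stand-in for whatever
# source-side gender detection a real system would use.
MASCULINE_SRC = {"he", "him", "his"}
FEMININE_SRC = {"she", "her", "hers"}


def source_gender(sentence: str) -> Optional[str]:
    """Return 'masculine' or 'feminine' if exactly one gender appears, else None."""
    tokens = set(re.findall(r"[a-z']+", sentence.lower()))
    has_m = bool(tokens & MASCULINE_SRC)
    has_f = bool(tokens & FEMININE_SRC)
    if has_m and not has_f:
        return "masculine"
    if has_f and not has_m:
        return "feminine"
    return None  # no gendered pronoun, or both genders present


def gfst_augment(
    monolingual: Iterable[str],
    translate: Callable[[str], str],
    target_gender: Callable[[str], Optional[str]],
) -> List[Tuple[str, str]]:
    """Build a filtered pseudo-parallel corpus from source monolingual text.

    Assumed filter: keep a (source, translation) pair only when the gender
    detected in the translation matches the gender detected in the source.
    """
    pseudo_parallel = []
    for src in monolingual:
        gender = source_gender(src)
        if gender is None:
            continue  # skip sentences that are not unambiguously gendered
        hyp = translate(src)  # forward translation with the initial model
        if target_gender(hyp) == gender:
            pseudo_parallel.append((src, hyp))
    return pseudo_parallel
```

The returned pairs would then be appended to the original parallel data before re-training or fine-tuning. Passing the model and the target-side detector as callables keeps the sketch independent of any particular translation toolkit.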

Anthology ID: 2021.emnlp-main.123
Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2021
Address: Online and Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 1640–1654
URL: https://aclanthology.org/2021.emnlp-main.123/
DOI: 10.18653/v1/2021.emnlp-main.123
Cite (ACL): Prafulla Kumar Choubey, Anna Currey, Prashant Mathur, and Georgiana Dinu. 2021. GFST: Gender-Filtered Self-Training for More Accurate Gender in Translation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1640–1654, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): GFST: Gender-Filtered Self-Training for More Accurate Gender in Translation (Choubey et al., EMNLP 2021)
PDF: https://aclanthology.org/2021.emnlp-main.123.pdf
Video: https://aclanthology.org/2021.emnlp-main.123.mp4
Code: amazon-research/gfst-nmt
