Fine-tuning Neural Machine Translation on Gender-Balanced Datasets

Marta R. Costa-jussà, Adrià de Jorge


Abstract
Misrepresentation of certain communities in datasets is causing major disruptions in artificial intelligence applications. In this paper, we propose using a gender-balanced parallel corpus automatically extracted from Wikipedia. This balanced set is used to fine-tune a larger model trained on unbalanced datasets, with the aim of mitigating gender biases in neural machine translation.
Anthology ID:
2020.gebnlp-1.3
Volume:
Proceedings of the Second Workshop on Gender Bias in Natural Language Processing
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Marta R. Costa-jussà, Christian Hardmeier, Will Radford, Kellie Webster
Venue:
GeBNLP
Publisher:
Association for Computational Linguistics
Pages:
26–34
URL:
https://aclanthology.org/2020.gebnlp-1.3
Cite (ACL):
Marta R. Costa-jussà and Adrià de Jorge. 2020. Fine-tuning Neural Machine Translation on Gender-Balanced Datasets. In Proceedings of the Second Workshop on Gender Bias in Natural Language Processing, pages 26–34, Barcelona, Spain (Online). Association for Computational Linguistics.
Cite (Informal):
Fine-tuning Neural Machine Translation on Gender-Balanced Datasets (Costa-jussà & de Jorge, GeBNLP 2020)
PDF:
https://aclanthology.org/2020.gebnlp-1.3.pdf