A Taxonomy of Bias-Causing Ambiguities in Machine Translation

Michal Měchura


Abstract
This paper introduces a taxonomy of phenomena that cause bias in machine translation, covering gender bias (people being male and/or female), number bias (singular you versus plural you) and formality bias (informal you versus formal you). Our taxonomy is a formalism for describing situations in machine translation when the source text leaves some of these properties unspecified (e.g., does not say whether doctor is male or female) but the target language requires the property to be specified (e.g., because it does not have a gender-neutral word for doctor). The formalism described here is used internally by a web-based tool we have built for detecting and correcting bias in the output of any machine translator.
Anthology ID:
2022.gebnlp-1.18
Volume:
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Month:
July
Year:
2022
Address:
Seattle, Washington
Venues:
GeBNLP | NAACL
Publisher:
Association for Computational Linguistics
Pages:
168–173
URL:
https://aclanthology.org/2022.gebnlp-1.18
DOI:
10.18653/v1/2022.gebnlp-1.18
Cite (ACL):
Michal Měchura. 2022. A Taxonomy of Bias-Causing Ambiguities in Machine Translation. In Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 168–173, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal):
A Taxonomy of Bias-Causing Ambiguities in Machine Translation (Měchura, GeBNLP 2022)
PDF:
https://aclanthology.org/2022.gebnlp-1.18.pdf