The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs

Aleix Sant, Carlos Escolano, Audrey Mash, Francesca De Luca Fornaciari, Maite Melero


Abstract
This paper studies gender bias in machine translation through the lens of Large Language Models (LLMs). Four widely-used test sets are employed to benchmark various base LLMs, comparing their translation quality and gender bias against state-of-the-art Neural Machine Translation (NMT) models for English to Catalan (En → Ca) and English to Spanish (En → Es) translation directions. Our findings reveal pervasive gender bias across all models, with base LLMs exhibiting a higher degree of bias compared to NMT models.To combat this bias, we explore prompting engineering techniques applied to an instruction-tuned LLM. We identify a prompt structure that significantly reduces gender bias by up to 12% on the WinoMT evaluation dataset compared to more straightforward prompts. These results significantly reduce the gender bias accuracy gap between LLMs and traditional NMT systems.
Anthology ID:
2024.gebnlp-1.7
Volume:
Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Agnieszka Faleńska, Christine Basta, Marta Costa-jussà, Seraphina Goldfarb-Tarrant, Debora Nozza
Venues:
GeBNLP | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
94–139
Language:
URL:
https://aclanthology.org/2024.gebnlp-1.7
DOI:
Bibkey:
Cite (ACL):
Aleix Sant, Carlos Escolano, Audrey Mash, Francesca De Luca Fornaciari, and Maite Melero. 2024. The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs. In Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 94–139, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs (Sant et al., GeBNLP-WS 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.gebnlp-1.7.pdf