Glitter: A Multi-Sentence, Multi-Reference Benchmark for Gender-Fair German Machine Translation

A Pranav, Janiça Hackenbuchner, Giuseppe Attanasio, Manuel Lardelli, Anne Lauscher


Abstract
Machine translation (MT) research addressing gender inclusivity has gained attention for promoting non-exclusionary language that represents all genders. However, existing resources are limited in size and most often consist of single sentences or a single gender-fair formulation type, leaving open questions about MT models’ ability to use context and diverse inclusive forms. We introduce Glitter, an English-German benchmark featuring extended passages with professional translations implementing three gender-fair alternatives: neutral rewording, typographical solutions (gender star), and neologistic forms (-ens forms). Our experiments reveal significant limitations in state-of-the-art language models, which default to masculine generics, struggle to interpret explicit gender cues in context, and rarely produce gender-fair translations. Through a systematic prompting analysis designed to elicit fair language, we demonstrate that these limitations stem from models’ fundamental misunderstanding of gender phenomena, as they fail to implement inclusive forms even when explicitly instructed. Glitter establishes a challenging benchmark, advancing research in gender-fair English-German MT. It highlights substantial room for improvement among leading models and can guide the development of future MT models capable of accurately representing gender diversity.
Anthology ID:
2025.findings-emnlp.1002
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
18450–18477
URL:
https://aclanthology.org/2025.findings-emnlp.1002/
Cite (ACL):
A Pranav, Janiça Hackenbuchner, Giuseppe Attanasio, Manuel Lardelli, and Anne Lauscher. 2025. Glitter: A Multi-Sentence, Multi-Reference Benchmark for Gender-Fair German Machine Translation. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 18450–18477, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Glitter: A Multi-Sentence, Multi-Reference Benchmark for Gender-Fair German Machine Translation (Pranav et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.1002.pdf
Checklist:
 2025.findings-emnlp.1002.checklist.pdf