MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation

Mehul Agarwal; Aditya Aggarwal; Arnav Goel; Medha Hira; Anubha Gupta

Abstract

While multilingual large language models (LLMs) perform well on high-level tasks like translation and question answering, their ability to handle grammatical gender and morphological agreement remains underexplored. In morphologically rich languages, gender influences verb conjugation, pronouns, and even first-person constructions with explicit and implicit mentions of gender. We introduce MORPHOGEN, a morphologically grounded large-scale benchmark dataset for evaluating gender-aware generation in three typologically diverse grammatically gendered languages: French, Arabic, and Hindi. The core task, GENFORM, requires models to rewrite a first-person sentence in the opposite gender while preserving its meaning and structure. We construct a high-quality synthetic dataset spanning these three languages and benchmark 15 popular multilingual LLMs (2B–70B) on their ability to perform this transformation. Our results reveal significant gaps and interesting insights into how current models handle morphological gender. MORPHOGEN provides a focused diagnostic lens for gender-aware language modeling and lays the groundwork for future research on inclusive and morphology-sensitive NLP.

Anthology ID: 2026.acl-long.105
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 2289–2313
URL: https://aclanthology.org/2026.acl-long.105/
Cite (ACL): Mehul Agarwal, Aditya Aggarwal, Arnav Goel, Medha Hira, and Anubha Gupta. 2026. MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2289–2313, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation (Agarwal et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.105.pdf
Checklist: 2026.acl-long.105.checklist.pdf
