Cross-lingual Evaluation of Multilingual Text Generation

Shamil Chollampatt, Minh Quang Pham, Sathish Reddy Indurthi, Marco Turchi


Abstract
Scaling automatic evaluation of the multilingual text generation abilities of LLMs to new tasks, domains, and languages remains a challenge. Traditional evaluation on benchmark datasets risks leakage of reference data into LLM training or requires additional human annotation effort. The alternative of using another LLM as a scorer is also uncertain, because that LLM's own ability to score non-English text is unclear. To address these issues, we propose an annotation-free cross-lingual evaluation protocol for multilingual text generation. Given a candidate LLM to be evaluated and a set of non-English inputs for a particular text generation task, our method first translates the non-English inputs into English and then generates English references from these translations, using an LLM that excels at the equivalent English text generation task. The non-English text generated by the candidate LLM is then compared against the generated English references with a cross-lingual evaluation metric to assess the candidate's multilingual text generation ability. Our protocol shows a high correlation with the reference-based ROUGE metric in four languages on news text summarization. We also evaluate a diverse set of LLMs in over 90 languages with different prompting strategies to study their multilingual generative abilities.
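The protocol described in the abstract can be sketched as a small pipeline. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: the translation system, the English reference LLM, the candidate LLM, and the cross-lingual metric are passed in as plain callables because the abstract does not name specific systems or metrics. The function names `cross_lingual_eval` and `pearson` are hypothetical, averaging per-input scores into one system score is an assumption, and Pearson's r is only one possible way to check system-level agreement with ROUGE.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical component types: any MT system, LLM, or metric can be plugged in.
Translate = Callable[[str], str]                   # non-English text -> English text
Generate = Callable[[str], str]                    # task input -> task output
CrossLingualMetric = Callable[[str, str], float]   # (non-English output, English reference) -> score


def cross_lingual_eval(
    inputs: Sequence[str],
    translate_to_english: Translate,
    english_reference_llm: Generate,
    candidate_llm: Generate,
    metric: CrossLingualMetric,
) -> float:
    """Score a candidate LLM on non-English inputs without human references.

    Sketch only: averaging per-input scores is an assumption, not taken from the paper.
    """
    if not inputs:
        raise ValueError("need at least one input")
    scores = []
    for source in inputs:
        # Step 1: translate the non-English input into English.
        english_input = translate_to_english(source)
        # Step 2: a model strong at the English task writes an English pseudo-reference.
        english_reference = english_reference_llm(english_input)
        # Step 3: the candidate LLM produces its output from the original non-English input.
        candidate_output = candidate_llm(source)
        # Step 4: score the non-English output against the English reference.
        scores.append(metric(candidate_output, english_reference))
    return mean(scores)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r, e.g. between per-system protocol scores and ROUGE scores (assumed choice)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

In practice one would plug in a real MT system for `translate_to_english` and a metric that can compare a non-English hypothesis with an English reference; the resulting per-system scores could then be correlated with ROUGE computed against human references in the languages where such references exist.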
Anthology ID:
2025.coling-main.520
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
7766–7777
URL:
https://aclanthology.org/2025.coling-main.520/
Cite (ACL):
Shamil Chollampatt, Minh Quang Pham, Sathish Reddy Indurthi, and Marco Turchi. 2025. Cross-lingual Evaluation of Multilingual Text Generation. In Proceedings of the 31st International Conference on Computational Linguistics, pages 7766–7777, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Cross-lingual Evaluation of Multilingual Text Generation (Chollampatt et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.520.pdf