HiMATE: A Hierarchical Multi-Agent Framework for Machine Translation Evaluation

Shijie Zhang, Renhao Li, Songsheng Wang, Philipp Koehn, Min Yang, Derek F. Wong


Abstract
The advancement of Large Language Models (LLMs) enables flexible and interpretable automatic evaluations. In the field of machine translation evaluation, utilizing LLMs with translation error annotations based on Multidimensional Quality Metrics (MQM) yields more human-aligned judgments. However, current LLM-based evaluation methods still face challenges in accurately identifying error spans and assessing their severity. In this paper, we propose HiMATE, a Hierarchical Multi-Agent Framework for Machine Translation Evaluation. We argue that existing approaches inadequately exploit the fine-grained structural and semantic information within the MQM hierarchy. To address this, we develop a hierarchical multi-agent system grounded in the MQM error typology, enabling granular evaluation of subtype errors. Two key strategies are incorporated to further mitigate systemic hallucinations within the framework: the utilization of the model’s self-reflective capability and the facilitation of agent discussion involving asymmetric information. Empirically, HiMATE outperforms competitive baselines across different datasets in conducting human-aligned evaluations. Further analyses underscore its significant advantage in error span detection and severity assessment, achieving an average F1-score improvement of 89% over the best-performing baseline. We make our code and data publicly available at https://github.com/nlp2ct-shijie/HiMATE.
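As a rough illustration of the hierarchical idea described in the abstract, the sketch below routes a translation through one agent per MQM subtype and sums a severity-weighted penalty. It is not the authors' implementation: the typology subset, the severity weights, the `ErrorSpan` structure, and the rule-based `toy_omission_agent` (a stand-in for an LLM call) are all assumptions made for this example. The self-reflection and agent-discussion stages are not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed subset of the MQM typology: top-level category -> subtypes.
MQM_TYPOLOGY: Dict[str, List[str]] = {
    "Accuracy": ["Mistranslation", "Omission", "Addition"],
    "Fluency": ["Grammar", "Spelling", "Punctuation"],
}

# Assumed severity penalties (a common MQM-style weighting, not taken from the paper).
SEVERITY_PENALTY: Dict[str, float] = {"minor": 1.0, "major": 5.0}


@dataclass(frozen=True)
class ErrorSpan:
    category: str   # top-level MQM category, e.g. "Accuracy"
    subtype: str    # MQM subtype, e.g. "Omission"
    text: str       # the offending span
    severity: str   # "minor" or "major"


# A subtype agent maps (source, translation) to the error spans it finds.
SubtypeAgent = Callable[[str, str], List[ErrorSpan]]


def evaluate(source: str, translation: str,
             agents: Dict[str, SubtypeAgent]) -> List[ErrorSpan]:
    """Walk the typology top-down and query the agent registered for each subtype."""
    spans: List[ErrorSpan] = []
    for category, subtypes in MQM_TYPOLOGY.items():
        for subtype in subtypes:
            agent = agents.get(subtype)
            if agent is not None:
                spans.extend(agent(source, translation))
    return spans


def mqm_penalty(spans: List[ErrorSpan]) -> float:
    """Sum the severity penalties of all detected spans."""
    return sum(SEVERITY_PENALTY[s.severity] for s in spans)


def toy_omission_agent(source: str, translation: str) -> List[ErrorSpan]:
    """Hypothetical rule standing in for an LLM: flag source numbers absent from the translation."""
    target_tokens = set(translation.split())
    missing = [tok for tok in source.split()
               if tok.isdigit() and tok not in target_tokens]
    return [ErrorSpan("Accuracy", "Omission", tok, "major") for tok in missing]


if __name__ == "__main__":
    found = evaluate("Er kaufte 3 Bücher", "He bought books",
                     {"Omission": toy_omission_agent})
    print(found, mqm_penalty(found))
```

Keeping one callable per subtype mirrors the granular, subtype-level evaluation the abstract describes, and lets a real LLM-backed agent replace the toy rule without changing the traversal.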
Anthology ID:
2025.findings-emnlp.593
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11121–11145
URL:
https://aclanthology.org/2025.findings-emnlp.593/
Cite (ACL):
Shijie Zhang, Renhao Li, Songsheng Wang, Philipp Koehn, Min Yang, and Derek F. Wong. 2025. HiMATE: A Hierarchical Multi-Agent Framework for Machine Translation Evaluation. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 11121–11145, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
HiMATE: A Hierarchical Multi-Agent Framework for Machine Translation Evaluation (Zhang et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.593.pdf
Checklist:
 2025.findings-emnlp.593.checklist.pdf