Improving NMT Models by Retrofitting Quality Estimators into Trainable Energy Loss

Gahyun Yoo, Jay Yoon Lee


Abstract
Reinforcement learning has shown great promise in aligning language models with human preferences across a variety of text generation tasks, including machine translation. For translation, rewards can easily be obtained from quality estimation (QE) models, which can score unlabeled data. Despite its usefulness, however, reinforcement learning cannot exploit gradients with respect to the QE score. We propose QE-EBM, a method that employs quality estimators as trainable loss networks which can backpropagate directly to the NMT model. We evaluate our method on several low- and high-resource target languages with English as the source language. QE-EBM outperforms strong baselines such as REINFORCE and proximal policy optimization (PPO), as well as supervised fine-tuning, for all target languages, especially low-resource ones. Most notably, for English-to-Mongolian translation, our method achieves improvements of 2.5 BLEU, 7.1 COMET-KIWI, 5.3 COMET, and 6.4 XCOMET relative to the supervised baseline.
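To make the core idea concrete, the sketch below illustrates how a differentiable QE score can be treated as an energy-style loss that backpropagates directly into the NMT model, in contrast to REINFORCE or PPO, which only consume the scalar reward. This is a minimal illustration, not the authors' implementation; all names (NMTModel, DifferentiableQE, soft_decode) are hypothetical placeholders.

```python
# Minimal PyTorch-style sketch of the idea described in the abstract:
# a differentiable quality estimator (QE) serves as a trainable energy/loss
# term whose gradients flow back into the NMT model, instead of supplying
# only a scalar RL reward. Module and method names are hypothetical.

import torch


def qe_energy_step(nmt_model, qe_model, optimizer, src_batch):
    """One training step that minimizes the negative QE score as an energy."""
    optimizer.zero_grad()

    # Produce a differentiable ("soft") hypothesis, e.g. expected target
    # embeddings under the decoder's output distribution, so gradients can
    # flow from the QE score back into the NMT parameters.
    soft_hyp = nmt_model.soft_decode(src_batch)   # hypothetical API

    # The QE model scores (source, hypothesis) pairs; higher is better.
    # Its negative mean acts as an energy to be minimized.
    qe_score = qe_model(src_batch, soft_hyp)      # differentiable scorer
    energy_loss = -qe_score.mean()

    # Unlike REINFORCE/PPO, which use the QE output only as a scalar reward,
    # this backward pass exploits the QE model's gradients w.r.t. the hypothesis.
    energy_loss.backward()
    optimizer.step()
    return energy_loss.item()
```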
Anthology ID: 2025.coling-main.545
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 8184–8196
URL: https://aclanthology.org/2025.coling-main.545/
Cite (ACL): Gahyun Yoo and Jay Yoon Lee. 2025. Improving NMT Models by Retrofitting Quality Estimators into Trainable Energy Loss. In Proceedings of the 31st International Conference on Computational Linguistics, pages 8184–8196, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Improving NMT Models by Retrofitting Quality Estimators into Trainable Energy Loss (Yoo & Lee, COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.545.pdf