MT-PATCHER: Selective and Extendable Knowledge Distillation from Large Language Models for Machine Translation

Jiahuan Li, Shanbo Cheng, Shujian Huang, Jiajun Chen


Abstract
Large Language Models (LLMs) have demonstrated strong abilities in the field of machine translation, yet they suffer from high computational cost and latency. Transferring translation knowledge from giant LLMs to medium-sized machine translation models is therefore a promising research direction. However, traditional knowledge distillation methods ignore the capabilities of the student and teacher models, repeatedly teaching student models knowledge they have already learned and failing to extend to novel contexts and knowledge. In this paper, we propose a framework called MT-Patcher, which transfers knowledge from LLMs to existing MT models in a selective, comprehensive and proactive manner. Considering the current translation ability of the student MT model, we identify and correct only its translation errors, instead of distilling whole translations from the teacher. Leveraging the strong language abilities of LLMs, we instruct LLM teachers to synthesize diverse contexts and anticipate more potential errors for the student. Experimental results on translating both specific language phenomena and general MT benchmarks demonstrate that finetuning the MT model on about 10% of the examples achieves results comparable to traditional knowledge distillation, while the synthesized potential errors and diverse contexts further improve MT performance on unseen contexts and words.
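To make the selective distillation idea in the abstract concrete, the following Python sketch shows one plausible way such a pipeline could be organized. This is an illustration only, not the authors' released implementation; the helper callables `student_translate`, `llm_identify_errors`, `llm_correct`, and `llm_synthesize_contexts` are hypothetical placeholders.

```python
# Hypothetical sketch of a selective, extendable distillation loop in the
# spirit of MT-Patcher. All helper functions passed in are illustrative
# stubs, not the authors' code.

def build_patch_data(source_sentences, student_translate, llm_identify_errors,
                     llm_correct, llm_synthesize_contexts):
    """Collect finetuning examples only where the student model errs."""
    patch_examples = []
    for src in source_sentences:
        hyp = student_translate(src)               # student's current output
        errors = llm_identify_errors(src, hyp)     # teacher LLM acts as critic
        if not errors:
            continue                               # skip already-learned knowledge
        corrected = llm_correct(src, hyp, errors)  # teacher fixes only the errors
        patch_examples.append((src, corrected))
        # Extend coverage: ask the teacher for new contexts exercising the same
        # error-prone words/phenomena, anticipating future student mistakes.
        for new_src, new_tgt in llm_synthesize_contexts(errors):
            patch_examples.append((new_src, new_tgt))
    return patch_examples
```

Because examples are collected only for sentences the student mistranslates, the resulting finetuning set is a small fraction of what full-translation distillation would produce, consistent with the roughly 10% figure reported in the abstract.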
Anthology ID:
2024.naacl-long.358
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
6445–6459
URL:
https://aclanthology.org/2024.naacl-long.358
Cite (ACL):
Jiahuan Li, Shanbo Cheng, Shujian Huang, and Jiajun Chen. 2024. MT-PATCHER: Selective and Extendable Knowledge Distillation from Large Language Models for Machine Translation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6445–6459, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
MT-PATCHER: Selective and Extendable Knowledge Distillation from Large Language Models for Machine Translation (Li et al., NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.358.pdf
Copyright:
 2024.naacl-long.358.copyright.pdf