@inproceedings{elshin-etal-2024-general,
title = "From General {LLM} to Translation: How We Dramatically Improve Translation Quality Using Human Evaluation Data for {LLM} Finetuning",
author = "Elshin, Denis and
Karpachev, Nikolay and
Gruzdev, Boris and
Golovanov, Ilya and
Ivanov, Georgy and
Antonov, Alexander and
Skachkov, Nickolay and
Latypova, Ekaterina and
Layner, Vladimir and
Enikeeva, Ekaterina and
Popov, Dmitry and
Chekashev, Anton and
Negodin, Vladislav and
Frantsuzova, Vera and
Chernyshev, Alexander and
Denisov, Kirill",
editor = "Haddow, Barry and
Kocmi, Tom and
Koehn, Philipp and
Monz, Christof",
booktitle = "Proceedings of the Ninth Conference on Machine Translation",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.wmt-1.17",
pages = "247--252",
abstract = "In this paper, we present the methodology employed by the NLP team at Yandex LLC for participating in the WMT 2024 General MT Translation track, focusing on English-to-Russian translation. Our approach involves training a YandexGPT LLM-based model for translation tasks using a multi-stage process to ensure high-quality and contextually accurate translations.Initially, we utilize a pre-trained model, trained on a large corpus of high-quality monolingual texts in various languages, crawled from various open sources, not limited to English and Russian. This extensive pre-training allows the model to capture a broad spectrum of linguistic nuances and structures. Following this, the model is fine-tuned on a substantial parallel corpus of high-quality texts collected from diverse open sources, including websites, books, and subtitles. These texts are meticulously aligned at both the sentence and paragraph levels to enhance the model{'}s contextual understanding and translation accuracy.In the subsequent stage, we employ p-tuning on an internal high-quality corpus of paragraph-aligned data. This step ensures that the model is finely adjusted to handle complex paragraph-level translations with greater fluency and coherence.Next, we apply the Contrastive Pretraining Objective (CPO) method, as described in the paper CPO, using a human-annotated translation corpus. This stage focuses on refining the model{'}s performance based on metrics evaluated at the paragraph level, emphasizing both the accuracy of the translation and the fluency of the resulting texts. The CPO method helps the model to better distinguish between subtle contextual differences, thereby improving translation quality.In the final stage, we address the importance of preserving the content structure in translations, which is crucial for the General MT test set. To achieve this, we introduce a synthetic corpus based on web pages and video subtitles, and use it during HE markup finetune training. This encourages the model to maintain the original text{'}s tag structure. This step ensures that the translated output retains the structural integrity of the source web pages, providing a seamless user experience.Our multi-stage approach, combining extensive pre-training, targeted fine-tuning, advanced p-tuning, and structure-preserving techniques, ensures that our model delivers high-quality, fluent, and structurally consistent translations suitable for practical applications and competitive benchmarks.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="elshin-etal-2024-general">
<titleInfo>
<title>From General LLM to Translation: How We Dramatically Improve Translation Quality Using Human Evaluation Data for LLM Finetuning</title>
</titleInfo>
<name type="personal">
<namePart type="given">Denis</namePart>
<namePart type="family">Elshin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nikolay</namePart>
<namePart type="family">Karpachev</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Boris</namePart>
<namePart type="family">Gruzdev</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ilya</namePart>
<namePart type="family">Golovanov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Georgy</namePart>
<namePart type="family">Ivanov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alexander</namePart>
<namePart type="family">Antonov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nickolay</namePart>
<namePart type="family">Skachkov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Latypova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Vladimir</namePart>
<namePart type="family">Layner</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Enikeeva</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Dmitry</namePart>
<namePart type="family">Popov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Anton</namePart>
<namePart type="family">Chekashev</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Vladislav</namePart>
<namePart type="family">Negodin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Vera</namePart>
<namePart type="family">Frantsuzova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alexander</namePart>
<namePart type="family">Chernyshev</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kirill</namePart>
<namePart type="family">Denisov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2024-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the Ninth Conference on Machine Translation</title>
</titleInfo>
<name type="personal">
<namePart type="given">Barry</namePart>
<namePart type="family">Haddow</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tom</namePart>
<namePart type="family">Kocmi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Philipp</namePart>
<namePart type="family">Koehn</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Christof</namePart>
<namePart type="family">Monz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Miami, Florida, USA</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>In this paper, we present the methodology employed by the NLP team at Yandex LLC for participating in the WMT 2024 General MT Translation track, focusing on English-to-Russian translation. Our approach involves training a YandexGPT LLM-based model for translation tasks using a multi-stage process to ensure high-quality and contextually accurate translations. Initially, we utilize a pre-trained model, trained on a large corpus of high-quality monolingual texts in various languages, crawled from various open sources, not limited to English and Russian. This extensive pre-training allows the model to capture a broad spectrum of linguistic nuances and structures. Following this, the model is fine-tuned on a substantial parallel corpus of high-quality texts collected from diverse open sources, including websites, books, and subtitles. These texts are meticulously aligned at both the sentence and paragraph levels to enhance the model’s contextual understanding and translation accuracy. In the subsequent stage, we employ p-tuning on an internal high-quality corpus of paragraph-aligned data. This step ensures that the model is finely adjusted to handle complex paragraph-level translations with greater fluency and coherence. Next, we apply the Contrastive Preference Optimization (CPO) method, as described in the CPO paper, using a human-annotated translation corpus. This stage focuses on refining the model’s performance based on metrics evaluated at the paragraph level, emphasizing both the accuracy of the translation and the fluency of the resulting texts. The CPO method helps the model to better distinguish between subtle contextual differences, thereby improving translation quality. In the final stage, we address the importance of preserving the content structure in translations, which is crucial for the General MT test set. To achieve this, we introduce a synthetic corpus based on web pages and video subtitles, and use it during HE markup finetune training. This encourages the model to maintain the original text’s tag structure. This step ensures that the translated output retains the structural integrity of the source web pages, providing a seamless user experience. Our multi-stage approach, combining extensive pre-training, targeted fine-tuning, advanced p-tuning, and structure-preserving techniques, ensures that our model delivers high-quality, fluent, and structurally consistent translations suitable for practical applications and competitive benchmarks.</abstract>
<identifier type="citekey">elshin-etal-2024-general</identifier>
<location>
<url>https://aclanthology.org/2024.wmt-1.17</url>
</location>
<part>
<date>2024-11</date>
<extent unit="page">
<start>247</start>
<end>252</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T From General LLM to Translation: How We Dramatically Improve Translation Quality Using Human Evaluation Data for LLM Finetuning
%A Elshin, Denis
%A Karpachev, Nikolay
%A Gruzdev, Boris
%A Golovanov, Ilya
%A Ivanov, Georgy
%A Antonov, Alexander
%A Skachkov, Nickolay
%A Latypova, Ekaterina
%A Layner, Vladimir
%A Enikeeva, Ekaterina
%A Popov, Dmitry
%A Chekashev, Anton
%A Negodin, Vladislav
%A Frantsuzova, Vera
%A Chernyshev, Alexander
%A Denisov, Kirill
%Y Haddow, Barry
%Y Kocmi, Tom
%Y Koehn, Philipp
%Y Monz, Christof
%S Proceedings of the Ninth Conference on Machine Translation
%D 2024
%8 November
%I Association for Computational Linguistics
%C Miami, Florida, USA
%F elshin-etal-2024-general
%X In this paper, we present the methodology employed by the NLP team at Yandex LLC for participating in the WMT 2024 General MT Translation track, focusing on English-to-Russian translation. Our approach involves training a YandexGPT LLM-based model for translation tasks using a multi-stage process to ensure high-quality and contextually accurate translations. Initially, we utilize a pre-trained model, trained on a large corpus of high-quality monolingual texts in various languages, crawled from various open sources, not limited to English and Russian. This extensive pre-training allows the model to capture a broad spectrum of linguistic nuances and structures. Following this, the model is fine-tuned on a substantial parallel corpus of high-quality texts collected from diverse open sources, including websites, books, and subtitles. These texts are meticulously aligned at both the sentence and paragraph levels to enhance the model’s contextual understanding and translation accuracy. In the subsequent stage, we employ p-tuning on an internal high-quality corpus of paragraph-aligned data. This step ensures that the model is finely adjusted to handle complex paragraph-level translations with greater fluency and coherence. Next, we apply the Contrastive Preference Optimization (CPO) method, as described in the CPO paper, using a human-annotated translation corpus. This stage focuses on refining the model’s performance based on metrics evaluated at the paragraph level, emphasizing both the accuracy of the translation and the fluency of the resulting texts. The CPO method helps the model to better distinguish between subtle contextual differences, thereby improving translation quality. In the final stage, we address the importance of preserving the content structure in translations, which is crucial for the General MT test set. To achieve this, we introduce a synthetic corpus based on web pages and video subtitles, and use it during HE markup finetune training. This encourages the model to maintain the original text’s tag structure. This step ensures that the translated output retains the structural integrity of the source web pages, providing a seamless user experience. Our multi-stage approach, combining extensive pre-training, targeted fine-tuning, advanced p-tuning, and structure-preserving techniques, ensures that our model delivers high-quality, fluent, and structurally consistent translations suitable for practical applications and competitive benchmarks.
%U https://aclanthology.org/2024.wmt-1.17
%P 247-252
Markdown (Informal)
[From General LLM to Translation: How We Dramatically Improve Translation Quality Using Human Evaluation Data for LLM Finetuning](https://aclanthology.org/2024.wmt-1.17) (Elshin et al., WMT 2024)
ACL
Denis Elshin, Nikolay Karpachev, Boris Gruzdev, Ilya Golovanov, Georgy Ivanov, Alexander Antonov, Nickolay Skachkov, Ekaterina Latypova, Vladimir Layner, Ekaterina Enikeeva, Dmitry Popov, Anton Chekashev, Vladislav Negodin, Vera Frantsuzova, Alexander Chernyshev, and Kirill Denisov. 2024. From General LLM to Translation: How We Dramatically Improve Translation Quality Using Human Evaluation Data for LLM Finetuning. In Proceedings of the Ninth Conference on Machine Translation, pages 247–252, Miami, Florida, USA. Association for Computational Linguistics.
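
Note on the CPO stage named in the abstract: CPO refers to Contrastive Preference Optimization (Xu et al., 2024). As a hedged sketch of that published objective (the exact variant used in this system paper may differ), CPO pairs a reference-model-free, DPO-style preference term with a negative log-likelihood term on the preferred output. Here \pi_\theta is the model being tuned, x a source text, y_w and y_l the preferred and dispreferred translations, \beta a scaling hyperparameter, and \mathcal{D} the preference dataset; in this paper's setting, the human-annotated translation corpus would supply the (y_w, y_l) pairs.

% CPO objective as defined in Xu et al. (2024):
% a DPO-style preference term without a reference model ...
\mathcal{L}_{\text{prefer}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \pi_\theta(y_w \mid x) - \beta \log \pi_\theta(y_l \mid x)\right)\right]
% ... plus a negative log-likelihood term on the preferred translation
\mathcal{L}_{\text{NLL}}(\theta) = -\,\mathbb{E}_{(x,\,y_w)\sim\mathcal{D}}\left[\log \pi_\theta(y_w \mid x)\right]
% combined loss
\mathcal{L}_{\text{CPO}}(\theta) = \mathcal{L}_{\text{prefer}}(\theta) + \mathcal{L}_{\text{NLL}}(\theta)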