Konstantin Savenkov


2024

Building on our GPT-4 LQA research in MT, this study identifies the top LLMs for an LQA pipeline with up to three models. LLMs including GPT-4, GPT-4o, GPT-4 Turbo, Google Vertex, Anthropic’s Claude 3, and Llama-3 are prompted using the MQM error typology. The models generate segment-wise outputs describing translation errors, which are scored by severity using DQF-MQM penalties. The study evaluates four language pairs (English-Spanish, English-Chinese, English-German, and English-Portuguese) using datasets from our 2024 State of MT Report across eight domains. LLM outputs are correlated with human judgments, and the models are ranked by how closely they align with human assessments of penalty score, issue presence, issue type, and severity. This research proposes an LQA pipeline with up to three models, weighted by output quality, highlighting LLMs’ potential to enhance MT review processes and improve translation quality.

2022

In this talk, we cover the 2022 annual State of Machine Translation report, prepared jointly by Intento and e2f. The report analyses the performance of 20+ commercial MT engines across 9 industries (General, Colloquial, Education, Entertainment, Financial, Healthcare, Hospitality, IT, and Legal) and 10+ key language pairs. For the first time, the report is based on a unique dataset, prepared by e2f, that covers all of the language/domain combinations above. The presentation focuses on the process of data selection and preparation, the report methodology, the principal scores to rely on when studying MT outcomes (COMET, BERTScore, PRISM, TER, and hLEPOR), and the main report outcomes (the best-performing MT engines for every language/domain combination). It includes a thorough comparison of the scores and also covers language support, prices, and other features of the MT engines.

2021

Attendees will learn how we use machine translation to deliver high MT quality for content with inline tags. We offer a new approach to inserting tags into the translated text in a way that reliably preserves their quality. The process is MT-independent, so it can be used for all languages, MT engines, and use cases, and it can achieve better MT quality at lower cost.

2020