2024
pdf
bib
abs
Guiding In-Context Learning of LLMs through Quality Estimation for Machine Translation
Javad Pourmostafa Roshan Sharami
|
Dimitar Shterionov
|
Pieter Spronck
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)
The quality of output from large language models (LLMs), particularly in machine translation (MT), is closely tied to the quality of in-context examples (ICEs) provided along with the query, i.e., the text to translate. The effectiveness of these ICEs is influenced by various factors, such as the domain of the source text, the order in which the ICEs are presented, the number of these examples, and the prompt templates used. Naturally, selecting the most impactful ICEs depends on understanding how these affect the resulting translation quality, which ultimately relies on translation references or human judgment. This paper presents a novel methodology for in-context learning (ICL) that relies on a search algorithm guided by domain-specific quality estimation (QE). Leveraging the XGLM model, our methodology estimates the resulting translation quality without the need for translation references, selecting effective ICEs for MT to maximize translation quality. Our results demonstrate significant improvements over existing ICL methods and higher translation performance compared to fine-tuning a pre-trained language model (PLM), specifically mBART-50.
2023
pdf
bib
abs
Tailoring Domain Adaptation for Machine Translation Quality Estimation
Javad Pourmostafa Roshan Sharami
|
Dimitar Shterionov
|
Frédéric Blain
|
Eva Vanmassenhove
|
Mirella De Sisto
|
Chris Emmery
|
Pieter Spronck
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
While quality estimation (QE) can play an important role in the translation process, its effectiveness relies on the availability and quality of training data. For QE in particular, high-quality labeled data is often lacking due to the high-cost and effort associated with labeling such data. Aside from the data scarcity challenge, QE models should also be generalizabile, i.e., they should be able to handle data from different domains, both generic and specific. To alleviate these two main issues — data scarcity and domain mismatch — this paper combines domain adaptation and data augmentation within a robust QE system. Our method is to first train a generic QE model and then fine-tune it on a specific domain while retaining generic knowledge. Our results show a significant improvement for all the language pairs investigated, better cross-lingual inference, and a superior performance in zero-shot learning scenarios as compared to state-of-the-art baselines.
pdf
bib
A Python Tool for Selecting Domain-Specific Data in Machine Translation
Javad Pourmostafa Roshan Sharami
|
Dimitar Shterionov
|
Pieter Spronck
Proceedings of the 1st Workshop on Open Community-Driven Machine Translation
2022
pdf
bib
abs
A Quality Estimation and Quality Evaluation Tool for the Translation Industry
Elena Murgolo
|
Javad Pourmostafa Roshan Sharami
|
Dimitar Shterionov
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
With the increase in machine translation (MT) quality over the latest years, it has now become a common practice to integrate MT in the workflow of language service providers (LSPs) and other actors in the translation industry. With MT having a direct impact on the translation workflow, it is important not only to use high-quality MT systems, but also to understand the quality dimension so that the humans involved in the translation workflow can make informed decisions. The evaluation and monitoring of MT output quality has become one of the essential aspects of language technology management in LSPs’ workflows. First, a general practice is to carry out human tests to evaluate MT output quality before deployment. Second, a quality estimate of the translated text, thus after deployment, can inform post editors or even represent post-editing effort. In the former case, based on the quality assessment of a candidate engine, an informed decision can be made whether the engine would be deployed for production or not. In the latter, a quality estimate of the translation output can guide the human post-editor or even make rough approximations of the post-editing effort. Quality of an MT engine can be assessed on document- or on sentence-level. A tool to jointly provide all these functionalities does not exist yet. The overall objective of the project presented in this paper is to develop an MT quality assessment (MTQA) tool that simplifies the quality assessment of MT engines, combining quality evaluation and quality estimation on document- and sentence- level.