Vasiliy Viskov
2024
xCOMET-lite: Bridging the Gap Between Efficiency and Quality in Learned MT Evaluation Metrics
Daniil Larionov
|
Mikhail Seleznyov
|
Vasiliy Viskov
|
Alexander Panchenko
|
Steffen Eger
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
State-of-the-art trainable machine translation evaluation metrics like xCOMET achieve high correlation with human judgment but rely on large encoders (up to 10.7B parameters), making them computationally expensive and inaccessible to researchers with limited resources. To address this issue, we investigate whether the knowledge stored in these large encoders can be compressed while maintaining quality. We employ distillation, quantization, and pruning techniques to create efficient xCOMET alternatives and introduce a novel data collection pipeline for efficient black-box distillation. Our experiments show that, using quantization, xCOMET can be compressed up to three times with no quality degradation. Additionally, through distillation, we create an 278M-sized xCOMET-lite metric, which has only 2.6% of xCOMET-XXL parameters, but retains 92.1% of its quality. Besides, it surpasses strong small-scale metrics like COMET-22 and BLEURT-20 on the WMT22 metrics challenge dataset by 6.4%, despite using 50% fewer parameters. All code, dataset, and models are available online.
2023
Team NLLG submission for Eval4NLP 2023 Shared Task: Retrieval-Augmented In-Context Learning for NLG Evaluation
Daniil Larionov
|
Vasiliy Viskov
|
George Kokush
|
Alexander Panchenko
|
Steffen Eger
Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems
In this paper, we propose a retrieval-augmented in-context learning for natural language generation (NLG) evaluation. This method allows practitioners to utilize large language models (LLMs) for various NLG evaluation tasks without any fine-tuning. We apply our approach to Eval4NLP 2023 Shared Task in translation evaluation and summarization evaluation subtasks. The findings suggest that retrieval-augmented in-context learning is a promising approach for creating LLM-based evaluation metrics for NLG. Further research directions include exploring the performance of various publicly available LLM models and identifying which LLM properties help boost the quality of the metric.
Semantically-Informed Regressive Encoder Score
Vasiliy Viskov
|
George Kokush
|
Daniil Larionov
|
Steffen Eger
|
Alexander Panchenko
Proceedings of the Eighth Conference on Machine Translation
Machine translation is natural language generation (NLG) problem of translating source text from one language to another. As every task in machine learning domain it requires to have evaluation metric. The most obvious one is human evaluation but it is expensive in case of money and time consumption. In last years with appearing of pretrained transformer architectures and large language models (LLMs) state-of-the-art results in automatic machine translation evaluation got a huge quality step in terms of correlation with expert assessment. We introduce MRE-Score, seMantically-informed Regression Encoder Score, the approach with constructing automatic machine translation evaluation system based on regression encoder and contrastive pretraining for downstream problem.