Chinonso Osuji


2025

In this paper, we present our submission to the Tenth Conference on Machine Translation (WMT25) Shared Task on Automated Translation Quality Evaluation. Our systems are built on the COMET framework and trained to predict segment-level Error Span Annotation (ESA) scores using augmented long-context data. To construct long-context training examples, we concatenate multiple in-domain sentences and compute a weighted average of their scores. We further integrate human judgment datasets (MQM, SQM, and DA) through score normalisation and train multilingual models that take the source, hypothesis, and reference translation as input. Experimental results show that models trained with long-context data achieve higher correlations with human judgments than models trained exclusively on short segments.
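The long-context construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the submission's implementation. The `Segment` class and all helper names are hypothetical. The abstract does not say how the average is weighted, so the sketch assumes each segment is weighted by the character length of its hypothesis. `min_max_normalise` is a placeholder for the unspecified normalisation that maps MQM, SQM, and DA scores onto a common scale.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    """One training example: source, hypothesis (mt), reference and a quality score."""
    src: str
    mt: str
    ref: str
    score: float


def min_max_normalise(scores: Sequence[float], low: float, high: float) -> List[float]:
    """Rescale scores from a dataset-specific range [low, high] onto [0, 1].

    Hypothetical placeholder: plain min-max rescaling is assumed here; the
    normalisation actually used for MQM, SQM and DA may differ.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    return [(s - low) / (high - low) for s in scores]


def make_long_context(segments: Sequence[Segment], sep: str = " ") -> Segment:
    """Concatenate consecutive in-domain segments into one long example.

    The combined score is a weighted average of the segment scores. Assumption:
    each weight is the character length of the segment's hypothesis (at least 1,
    so empty hypotheses do not cause a division by zero).
    """
    if not segments:
        raise ValueError("need at least one segment")
    weights = [max(len(s.mt), 1) for s in segments]
    total = sum(weights)
    score = sum(w * s.score for w, s in zip(weights, segments)) / total
    return Segment(
        src=sep.join(s.src for s in segments),
        mt=sep.join(s.mt for s in segments),
        ref=sep.join(s.ref for s in segments),
        score=score,
    )


if __name__ == "__main__":
    # Two toy sentence-level examples with ESA-style scores on a 0-100 scale.
    segs = [
        Segment("Hallo.", "Hello.", "Hello.", 90.0),
        Segment("Wie geht es dir?", "How is it going you?", "How are you?", 60.0),
    ]
    long_seg = make_long_context(segs)
    print(long_seg.mt, long_seg.score)
```

Length weighting keeps a short, well-translated sentence from dominating the label of a longer, weaker one; uniform weights or token counts would be equally plausible choices, since the weighting scheme is not specified.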