João Godinho


Quality Fit for Purpose: Building Business Critical Errors Test Suites
Mariana Cabeça | Marianna Buchicchio | Madalena Gonçalves | Christine Maroti | João Godinho | Pedro Coelho | Helena Moniz | Alon Lavie
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

This paper illustrates a new methodology based on Test Suites (Avramidis et al., 2018), with a focus on Business Critical Errors (BCEs) (Stewart et al., 2022), to evaluate the output of Machine Translation (MT) and Quality Estimation (QE) systems. We demonstrate the value of semi-automatic evaluation through scalable, BCE-focused Test Suites for monitoring the performance of both MT and QE systems across 8 language pairs (LPs) and 4 error categories. This approach allows us not only to track the impact of new features and implementations in a real business environment, but also to identify the strengths and weaknesses of models with respect to different error types, and thereby to determine what to improve next.