MS-COMET: More and Better Human Judgements Improve Metric Performance

Tom Kocmi; Hitokazu Matsushita; Christian Federmann

doi:10.18653/v1/2022.wmt-1.47

Abstract

We develop two new metrics that build on top of the COMET architecture. Our main contribution is collecting a corpus of human judgements ten times larger than the one used to train COMET and investigating how to filter out problematic judgements. We propose filtering out human judgements for which the human reference is statistically worse than the machine translation. Furthermore, we average the scores of identical segments that were evaluated multiple times. When automatic metrics are compared against source-based DA (Direct Assessment) and MQM-style human judgements, our metrics achieve state-of-the-art performance on system-level pairwise system ranking. We release both metrics for public use.
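The abstract names three procedures without spelling out their details: averaging repeated judgements of identical segments, dropping judgements where the human reference scores significantly worse than the machine translation, and measuring system-level pairwise ranking accuracy. The sketch below illustrates these ideas under stated assumptions and is not the authors' implementation. The tuple layout, the grouping key, the one-sided Mann-Whitney U test with a normal approximation, the 0.05 threshold, and every function name are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from statistics import NormalDist, mean


def average_duplicates(judgements):
    """Average the human scores of identical (source, translation) pairs.

    Assumption: `judgements` is a list of (source, translation, score) tuples.
    """
    buckets = defaultdict(list)
    for src, hyp, score in judgements:
        buckets[(src, hyp)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


def reference_is_worse(ref_scores, mt_scores, alpha=0.05):
    """Hypothetical one-sided Mann-Whitney U test (normal approximation,
    no tie correction). Returns True when the human-reference scores are
    significantly lower than the machine-translation scores.
    Assumes both score lists are non-empty.
    """
    n1, n2 = len(ref_scores), len(mt_scores)
    # U counts how often an MT score beats a reference score; ties count 1/2.
    u = sum((m > r) + 0.5 * (m == r) for m in mt_scores for r in ref_scores)
    mu = n1 * n2 / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mu) / sigma
    p_value = 1 - NormalDist().cdf(z)
    return p_value < alpha


def filter_groups(groups, alpha=0.05):
    """Keep only groups whose reference is not significantly worse than MT.

    Assumption: `groups` maps a hypothetical key (e.g. a language pair or an
    annotation batch) to a (ref_scores, mt_scores) tuple.
    """
    return {
        key: scores
        for key, scores in groups.items()
        if not reference_is_worse(*scores, alpha=alpha)
    }


def pairwise_accuracy(metric, human):
    """Fraction of system pairs that the metric orders the same way as humans.

    `metric` and `human` map system names to system-level scores. Pairs with
    a tie on either side count as disagreements in this simplified version.
    """
    pairs = list(combinations(sorted(human), 2))
    agree = sum(
        (metric[a] - metric[b]) * (human[a] - human[b]) > 0 for a, b in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    ref = [60, 55, 58, 62, 50, 57]
    mt = [80, 85, 78, 90, 82, 88]
    print(reference_is_worse(ref, mt))  # True: every MT score beats the reference
    print(pairwise_accuracy({"A": 0.9, "B": 0.8, "C": 0.7},
                            {"A": 80, "B": 70, "C": 75}))  # 2 of 3 pairs agree
```

A rank-based test is used here only because it makes no normality assumption about human scores; the paper's actual statistical test and filtering granularity may differ.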

Anthology ID: 2022.wmt-1.47
Volume: Proceedings of the Seventh Conference on Machine Translation (WMT)
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates (Hybrid)
Editors: Philipp Koehn, Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Tom Kocmi, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Marco Turchi, Marcos Zampieri
Venue: WMT
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 541–548
URL: https://aclanthology.org/2022.wmt-1.47/
DOI: 10.18653/v1/2022.wmt-1.47
Cite (ACL): Tom Kocmi, Hitokazu Matsushita, and Christian Federmann. 2022. MS-COMET: More and Better Human Judgements Improve Metric Performance. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 541–548, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal): MS-COMET: More and Better Human Judgements Improve Metric Performance (Kocmi et al., WMT 2022)
PDF: https://aclanthology.org/2022.wmt-1.47.pdf