Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data

Zhongtao Liu; Parker Riley; Daniel Deutsch; Alison Lui; Mengmeng Niu; Apurva Shah; Markus Freitag

Abstract

Collecting high-quality translations is crucial for the development and evaluation of machine translation systems. However, traditional human-only approaches are costly and slow. This study presents a comprehensive investigation of 11 approaches for acquiring translation data, including human-only, machine-only, and hybrid approaches. Our findings demonstrate that human-machine collaboration can match or even exceed the quality of human-only translations, while being more cost-efficient. Error analysis reveals the complementary strengths between human and machine contributions, highlighting the effectiveness of collaborative methods. Cost analysis further demonstrates the economic benefits of human-machine collaboration methods, with some approaches achieving top-tier quality at around 60% of the cost of traditional methods. We release a publicly available dataset containing nearly 18,000 segments of varying translation quality with corresponding human ratings to facilitate future research.

Anthology ID: 2024.wmt-1.110
Volume: Proceedings of the Ninth Conference on Machine Translation
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue: WMT
Publisher: Association for Computational Linguistics
Pages: 1095–1106
URL: https://aclanthology.org/2024.wmt-1.110
Cite (ACL): Zhongtao Liu, Parker Riley, Daniel Deutsch, Alison Lui, Mengmeng Niu, Apurva Shah, and Markus Freitag. 2024. Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data. In Proceedings of the Ninth Conference on Machine Translation, pages 1095–1106, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data (Liu et al., WMT 2024)
PDF: https://aclanthology.org/2024.wmt-1.110.pdf
