PEMT human evaluation at 100x scale with risk-driven sampling

Kirill Soloviev


Abstract
Post-editing is one of the most common use cases for Machine Translation, and human evaluation of post-edits with MQM error annotation can reveal a treasure trove of insights that help inform engine training and other quality improvement strategies. However, a manual workflow for this becomes very costly very fast at enterprise scale, and those insights never get discovered or acted upon. How can MT teams scale this process efficiently across dozens of languages and multiple translation tools where post-editing is done, while applying risk modeling to maximize their Return on Investment in costly Human Evaluation? We’ll share strategies learned from our work on automating human evaluation workflows for some of the world’s best Machine Translation teams at corporates, governments, and LSPs.
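The abstract names risk-driven sampling but does not spell out its mechanics. As a minimal illustrative sketch (not the author's actual method), the Python below assumes each post-edited segment carries a few hypothetical risk signals — edit distance between MT output and post-edit, business visibility of the content, and the language pair's historical MQM error rate — combines them into a weighted risk score, and spends a fixed annotation budget on the highest-risk segments.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    segment_id: str
    lang_pair: str
    edit_distance: float   # normalized MT-to-post-edit edit distance, 0..1 (hypothetical signal)
    visibility: float      # business visibility of the content, 0..1 (hypothetical signal)

# Hypothetical historical MQM error rates per language pair (errors per 1000 words).
HISTORICAL_ERROR_RATE = {"en-de": 4.2, "en-ja": 9.1, "en-fr": 3.0}

def risk_score(seg: Segment) -> float:
    """Combine risk signals into one score; weights are illustrative, not tuned."""
    history = HISTORICAL_ERROR_RATE.get(seg.lang_pair, 5.0) / 10.0  # rough normalization
    return 0.5 * seg.edit_distance + 0.3 * seg.visibility + 0.2 * history

def sample_for_mqm(segments: list[Segment], budget: int) -> list[Segment]:
    """Select the highest-risk segments up to the annotation budget."""
    return sorted(segments, key=risk_score, reverse=True)[:budget]

if __name__ == "__main__":
    pool = [
        Segment("s1", "en-de", edit_distance=0.05, visibility=0.2),
        Segment("s2", "en-ja", edit_distance=0.40, visibility=0.9),
        Segment("s3", "en-fr", edit_distance=0.10, visibility=0.5),
    ]
    for seg in sample_for_mqm(pool, budget=2):
        print(seg.segment_id, round(risk_score(seg), 3))
```

A real deployment would calibrate such weights against observed MQM findings rather than fixing them by hand; the point here is only that scoring plus budgeted top-k selection lets costly human evaluation concentrate where errors are most likely or most damaging.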
Anthology ID:
2022.amta-upg.1
Volume:
Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)
Month:
September
Year:
2022
Address:
Orlando, USA
Editors:
Janice Campbell, Stephen Larocca, Jay Marciano, Konstantin Savenkov, Alex Yanishevsky
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
1–11
URL:
https://aclanthology.org/2022.amta-upg.1
Cite (ACL):
Kirill Soloviev. 2022. PEMT human evaluation at 100x scale with risk-driven sampling. In Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track), pages 1–11, Orlando, USA. Association for Machine Translation in the Americas.
Cite (Informal):
PEMT human evaluation at 100x scale with risk-driven sampling (Soloviev, AMTA 2022)
Presentation:
2022.amta-upg.1.Presentation.pdf