Kirill Soloviev
2022
PEMT human evaluation at 100x scale with risk-driven sampling
Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)
Post-editing is a very common use case for Machine Translation, and human evaluation of post-edits with MQM error annotation can reveal a treasure trove of insights that help inform engine training and other quality improvement strategies. However, a manual workflow for this becomes very costly very fast at enterprise scale, so those insights often go undiscovered and unacted upon. How can MT teams scale this process efficiently across dozens of languages and the multiple translation tools where post-editing is done, while applying risk modeling to maximize their return on investment in costly human evaluation? We’ll share strategies learnt from our work on automating human evaluation workflows for some of the world’s best Machine Translation teams at corporates, governments, and LSPs.
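The abstract does not describe the risk model itself, so the following is only a minimal illustrative sketch of what risk-driven sampling for human evaluation could look like. The factors (content visibility, an automatic quality-estimation score, post-edit distance, and a per-language-pair historical error rate) and all names such as `risk_score` and `select_for_review` are hypothetical assumptions, not the authors' actual method:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    segment_id: str
    language_pair: str        # e.g. "en-de"
    visibility_weight: float  # hypothetical business-impact weight of the content
    qe_score: float           # automatic quality-estimation score in [0, 1]; higher = better
    edit_distance: float      # normalized post-edit distance in [0, 1]

def risk_score(seg: Segment, pair_error_rate: dict[str, float]) -> float:
    """Combine hypothetical risk factors into one score.

    Higher risk = human MQM review of this post-edit is more likely
    to surface actionable errors worth the annotation cost.
    """
    historical = pair_error_rate.get(seg.language_pair, 0.5)
    return (seg.visibility_weight
            * (1.0 - seg.qe_score)
            * (0.5 + seg.edit_distance)
            * (0.5 + historical))

def select_for_review(segments: list[Segment],
                      pair_error_rate: dict[str, float],
                      budget: int) -> list[Segment]:
    """Spend a fixed human-evaluation budget on the highest-risk segments."""
    ranked = sorted(segments,
                    key=lambda s: risk_score(s, pair_error_rate),
                    reverse=True)
    return ranked[:budget]
```

A deterministic top-k selection like this maximizes the chance of catching high-impact errors within a fixed budget; a production system might instead mix in some weighted random sampling so that quality estimates across languages and tools remain unbiased.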