Simulating human judgment in machine translation evaluation campaigns

Philipp Koehn


Abstract
We present a Monte Carlo model to simulate human judgments in machine translation evaluation campaigns, such as WMT or IWSLT. We use the model to compare different ranking methods and to give guidance on the number of judgments that need to be collected to obtain statistically significant distinctions between systems.
Anthology ID:
2012.iwslt-papers.5
Volume:
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers
Month:
December 6–7
Year:
2012
Address:
Hong Kong
Venue:
IWSLT
SIG:
SIGSLT
Pages:
179–184
URL:
https://aclanthology.org/2012.iwslt-papers.5
Cite (ACL):
Philipp Koehn. 2012. Simulating human judgment in machine translation evaluation campaigns. In Proceedings of the 9th International Workshop on Spoken Language Translation: Papers, pages 179–184, Hong Kong.
Cite (Informal):
Simulating human judgment in machine translation evaluation campaigns (Koehn, IWSLT 2012)
PDF:
https://aclanthology.org/2012.iwslt-papers.5.pdf