Using the TED Talks to Evaluate Spoken Post-editing of Machine Translation

Jeevanthi Liyanapathirana, Andrei Popescu-Belis


Abstract
This paper presents a method for evaluating spoken post-editing of imperfect machine translation output by a human translator. We compare two approaches to combining machine translation (MT) and automatic speech recognition (ASR): a heuristic algorithm and a machine learning method. To obtain a data set with spoken post-editing information, we use the French versions of TED talks as the source texts submitted to MT, and their spoken English counterparts as the corrections, which are submitted to an ASR system. We experiment with various levels of artificial ASR noise and also with a state-of-the-art ASR system. The results show that combining MT with ASR improves on the individual outputs of both MT and ASR in terms of BLEU scores, especially when ASR performance is low.
Anthology ID:
L16-1355
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2232–2239
URL:
https://aclanthology.org/L16-1355
Cite (ACL):
Jeevanthi Liyanapathirana and Andrei Popescu-Belis. 2016. Using the TED Talks to Evaluate Spoken Post-editing of Machine Translation. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2232–2239, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Using the TED Talks to Evaluate Spoken Post-editing of Machine Translation (Liyanapathirana & Popescu-Belis, LREC 2016)
PDF:
https://aclanthology.org/L16-1355.pdf