SubCo: A Learner Translation Corpus of Human and Machine Subtitles

José Manuel Martínez Martínez, Mihaela Vela


Abstract
In this paper, we present a freely available corpus of human and automatic translations of subtitles. The corpus comprises the original English subtitles (SRC), both human (HT) and machine (MT) translations into German, and post-edited versions (PE) of the MT output. HT and MT are annotated with errors, and HT, MT, and PE additionally carry human evaluations. Such a corpus is a valuable resource for both the human and the machine translation communities, as it enables direct comparison, in terms of both errors and evaluation, between human translations, machine translations, and post-edited machine translations.
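To make the parallel structure described above concrete, here is a minimal sketch of how one aligned subtitle segment and its error annotations might be held in memory, and how error labels could be tallied per version for an HT versus MT comparison. The record layout, the field names (`src`, `ht`, `mt`, `pe`, `errors`), the helper `error_counts`, the error label `"lexical"` and the toy sentences are all hypothetical illustrations, not SubCo's actual file format or annotation scheme.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Segment:
    """One aligned subtitle segment (hypothetical layout, not SubCo's schema)."""
    src: str                   # original English subtitle (SRC)
    ht: str                    # human translation into German (HT)
    mt: str                    # machine translation into German (MT)
    pe: Optional[str] = None   # post-edited MT output (PE), if available
    # Error labels per annotated version, e.g. {"MT": ["lexical"]};
    # the label inventory here is an assumption for illustration only.
    errors: Dict[str, List[str]] = field(default_factory=dict)


def error_counts(segments: List[Segment], version: str) -> Counter:
    """Tally error labels for one version (e.g. "HT" or "MT") over all segments."""
    counts: Counter = Counter()
    for seg in segments:
        counts.update(seg.errors.get(version, []))
    return counts


# Toy example (invented data, not taken from the corpus).
toy = [
    Segment(
        src="Where are you going?",
        ht="Wohin gehst du?",
        mt="Wo gehst du?",
        pe="Wohin gehst du?",
        errors={"MT": ["lexical"]},
    )
]
print(error_counts(toy, "MT"))  # prints Counter({'lexical': 1})
print(error_counts(toy, "HT"))  # prints Counter()
```

Keeping all versions of a segment in one record means that any comparison between HT, MT, and PE is made over exactly the same source subtitles.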
Anthology ID:
L16-1357
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2246–2254
URL:
https://aclanthology.org/L16-1357
Cite (ACL):
José Manuel Martínez Martínez and Mihaela Vela. 2016. SubCo: A Learner Translation Corpus of Human and Machine Subtitles. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2246–2254, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
SubCo: A Learner Translation Corpus of Human and Machine Subtitles (Martínez & Vela, LREC 2016)
PDF:
https://aclanthology.org/L16-1357.pdf