Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation

Wajdi Zaghouani, Nizar Habash, Ossama Obeid, Behrang Mohit, Houda Bouamor, Kemal Oflazer


Abstract
We present our guidelines and annotation procedure for creating a human-corrected, post-edited machine translation corpus for Modern Standard Arabic. Our overarching goal is to use the annotated corpus to develop automatic machine translation post-editing systems for Arabic that can help accelerate the human revision of translated texts. Creating any manually annotated corpus usually presents many challenges. To address these challenges, we created comprehensive and simplified annotation guidelines, which were used by a team of five annotators and one lead annotator. To ensure high agreement between the annotators, multiple training sessions were held and inter-annotator agreement was measured regularly to check the annotation quality. The resulting corpus of manually post-edited English-to-Arabic translations of articles is the largest to date for this language pair.
Anthology ID:
L16-1295
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1869–1876
URL:
https://aclanthology.org/L16-1295
Cite (ACL):
Wajdi Zaghouani, Nizar Habash, Ossama Obeid, Behrang Mohit, Houda Bouamor, and Kemal Oflazer. 2016. Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1869–1876, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation (Zaghouani et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1295.pdf