Word Alignment Annotation in a Japanese-Chinese Parallel Corpus

Yujie Zhang, Zhulong Wang, Kiyotaka Uchimoto, Qing Ma, Hitoshi Isahara


Abstract
Parallel corpora are critical resources for machine translation research and development since parallel corpora contain translation equivalences of various granularities. Manual annotation of word & phrase alignment is of significance to provide gold-standard for developing and evaluating both example-based machine translation model and statistical machine translation model. This paper presents the work of word & phrase alignment annotation in the NICT Japanese-Chinese parallel corpus, which is constructed at the National Institute of Information and Communications Technology (NICT). We describe the specification of word alignment annotation and the tools specially developed for the manual annotation. The manual annotation on 17,000 sentence pairs has been completed. We examined the manually annotated word alignment data and extracted translation knowledge from the word & phrase aligned corpus.
Anthology ID:
L08-1569
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/259_paper.pdf
DOI:
Bibkey:
Cite (ACL):
Yujie Zhang, Zhulong Wang, Kiyotaka Uchimoto, Qing Ma, and Hitoshi Isahara. 2008. Word Alignment Annotation in a Japanese-Chinese Parallel Corpus. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Word Alignment Annotation in a Japanese-Chinese Parallel Corpus (Zhang et al., LREC 2008)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/259_paper.pdf