Tsuyoshi Okita

Annotated Corpora for Word Alignment between Japanese and English and its Evaluation with MAP-based Word Aligner
Tsuyoshi Okita
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents two annotated corpora for word alignment between Japanese and English. We annotated on top of the IWSLT-2006 and the NTCIR-8 corpora. The IWSLT-2006 corpus is in the domain of travel conversation, while the NTCIR-8 corpus is in the patent domain. We annotated the first 500 sentence pairs from the IWSLT-2006 corpus and the first 100 sentence pairs from the NTCIR-8 corpus. After describing the annotation guidelines, we present two evaluation algorithms that use such hand-annotated corpora: one is well known to word alignment researchers, and the other is a novel algorithm intended to evaluate the MAP-based word aligner of Okita et al. (2010b).

Workshop on Monolingual Machine Translation
Tsuyoshi Okita | Artem Sokolov | Taro Watanabe
Workshop on Monolingual Machine Translation

2010

Multi-Word Expression-Sensitive Word Alignment
Tsuyoshi Okita | Alfredo Maldonado Guerra | Yvette Graham | Andy Way
Proceedings of the 4th Workshop on Cross Lingual Information Access

2009

Data Cleaning for Word Alignment
Tsuyoshi Okita
Proceedings of the ACL-IJCNLP 2009 Student Research Workshop

Low-resource machine translation using MaTrEx
Yanjun Ma | Tsuyoshi Okita | Özlem Çetinoğlu | Jinhua Du | Andy Way
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper, we describe the Machine Translation (MT) system developed at DCU that was used for our fourth participation in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT 2009). Two techniques are deployed in our system to improve translation quality in a low-resource scenario. The first technique is to use multiple segmentations in MT training and to utilise word lattices in the decoding stage. The second technique selects the optimal training data for building MT systems. In this year’s participation, we use three different prototype SMT systems, and the outputs of the systems are combined using a standard system combination method. Our system is the top system for the Chinese–English CHALLENGE task in terms of BLEU score.
