Annotating data selection for improving machine translation

Keiji Yasuda, Hideo Okuma, Masao Utiyama, Eiichiro Sumita


Abstract
To improve machine translation systems efficiently, we propose a method that selects which speech-to-speech translation field data should be annotated (manually translated). The field data come from field experiments conducted in five areas of Japan during fiscal year 2009. For the selection experiments, we used the data sets from two of these areas: the one whose test set gave the lowest baseline speech translation performance, and the one whose test set gave the highest. We compare two methods for selecting the field data to be manually translated. Both use source-side language models for data selection, but in different ways. The experimental results show that at least one of the two methods (in some cases both) yields larger improvements than random data selection.
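The abstract does not specify how either method uses the source-side language models. Purely as an illustration of language-model-based selection versus a random baseline, the sketch below assumes one plausible variant: train an add-one-smoothed bigram model on the source side of the existing training data, score each untranslated field sentence by per-token cross-entropy, and send the least familiar (highest-entropy) sentences to annotators. The bigram model, the scoring rule, and the function names `train_bigram_lm`, `cross_entropy`, `select_by_lm` and `select_random` are all hypothetical stand-ins, not the authors' method.

```python
import math
import random
from collections import Counter


def train_bigram_lm(sentences):
    """Collect history and bigram counts (hypothetical stand-in LM)."""
    histories, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        histories.update(tokens[:-1])  # every token that precedes another
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab = {tok for sent in sentences for tok in sent.split()} | {"</s>"}
    return histories, bigrams, vocab


def cross_entropy(sentence, lm):
    """Per-token cross-entropy in bits under an add-one-smoothed bigram LM."""
    histories, bigrams, vocab = lm
    v = len(vocab) + 1  # +1 reserves probability mass for unknown words
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = 0.0
    for h, w in zip(tokens[:-1], tokens[1:]):
        # Counter returns 0 for unseen keys, so unseen events get 1 / (c(h) + v)
        log_prob += math.log2((bigrams[(h, w)] + 1) / (histories[h] + v))
    return -log_prob / (len(tokens) - 1)


def select_by_lm(pool, lm, k):
    """Assumed rule: annotate the k sentences the LM finds least familiar."""
    return sorted(pool, key=lambda s: cross_entropy(s, lm), reverse=True)[:k]


def select_random(pool, k, seed=0):
    """Random-selection baseline for comparison."""
    return random.Random(seed).sample(pool, k)


if __name__ == "__main__":
    train = ["where is the station", "how much is this", "where is the hotel"]
    field_pool = ["where is the station", "i lost my passport",
                  "how much is the hotel"]
    lm = train_bigram_lm(train)
    print("LM-based:", select_by_lm(field_pool, lm, 1))
    print("random:  ", select_random(field_pool, 1))
```

Ranking by high cross-entropy favors sentences the current training data covers poorly; a different assumed rule (for example, favoring sentences that are typical of the field data) would change only the `key` passed to `sorted`.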
Anthology ID: 2011.iwslt-papers.11
Volume: Proceedings of the 8th International Workshop on Spoken Language Translation: Papers
Month: December 8–9
Year: 2011
Address: San Francisco, California
Venue: IWSLT
SIG: SIGSLT
Pages: 269–274
URL: https://aclanthology.org/2011.iwslt-papers.11
Cite (ACL): Keiji Yasuda, Hideo Okuma, Masao Utiyama, and Eiichiro Sumita. 2011. Annotating data selection for improving machine translation. In Proceedings of the 8th International Workshop on Spoken Language Translation: Papers, pages 269–274, San Francisco, California.
Cite (Informal): Annotating data selection for improving machine translation (Yasuda et al., IWSLT 2011)
PDF: https://aclanthology.org/2011.iwslt-papers.11.pdf