Toward the Evaluation of Machine Translation Using Patent Information

Atsushi Fujii, Masao Utiyama, Mikio Yamamoto, Takehito Utsuro


Abstract
To aid research and development in machine translation, we have produced a test collection for Japanese/English machine translation. To obtain a parallel corpus, we extracted patent documents for the same or related inventions published in Japan and the United States. Our test collection includes approximately 2,000,000 sentence pairs in Japanese and English, which were extracted automatically from our parallel corpus. These sentence pairs can be used to train and evaluate machine translation systems. Our test collection also includes search topics for cross-lingual patent retrieval, which can be used to evaluate the contribution of machine translation to retrieving patent documents across languages. This paper describes our test collection, methods for evaluating machine translation, and preliminary experiments.
Anthology ID:
2008.amta-papers.8
Volume:
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers
Month:
October 21-25
Year:
2008
Address:
Waikiki, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
97–106
URL:
https://aclanthology.org/2008.amta-papers.8
Cite (ACL):
Atsushi Fujii, Masao Utiyama, Mikio Yamamoto, and Takehito Utsuro. 2008. Toward the Evaluation of Machine Translation Using Patent Information. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers, pages 97–106, Waikiki, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Toward the Evaluation of Machine Translation Using Patent Information (Fujii et al., AMTA 2008)
PDF:
https://aclanthology.org/2008.amta-papers.8.pdf