Translation Using JAPIO Patent Corpora: JAPIO at WAT2016

Satoshi Kinoshita, Tadaaki Oshio, Tomoharu Mitsuhashi, Terumasa Ehara


Abstract
We participate in the scientific paper subtask (ASPEC-EJ/CJ) and the patent subtask (JPC-EJ/CJ/KJ) with phrase-based SMT systems trained on our own patent corpora. Using larger corpora than those prepared by the workshop organizer, we achieved higher BLEU scores than most participants in the EJ and CJ translations of the patent subtask. In the crowdsourcing evaluation, however, our EJ translation, which was the best in all automatic evaluations, received a very poor score. In the scientific paper subtask, our translations received lower scores than most translations produced by translation engines trained on the in-domain corpora, but higher scores than those of general-purpose RBMT systems and online services. The result of the crowdsourcing evaluation suggests that a CJ SMT system trained on a large patent corpus may be able to translate non-patent technical documents at a practical level.
Anthology ID:
W16-4612
Volume:
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
Month:
December
Year:
2016
Address:
Osaka, Japan
Venues:
WAT | WS
Publisher:
The COLING 2016 Organizing Committee
Pages:
133–138
URL:
https://aclanthology.org/W16-4612
Cite (ACL):
Satoshi Kinoshita, Tadaaki Oshio, Tomoharu Mitsuhashi, and Terumasa Ehara. 2016. Translation Using JAPIO Patent Corpora: JAPIO at WAT2016. In Proceedings of the 3rd Workshop on Asian Translation (WAT2016), pages 133–138, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Translation Using JAPIO Patent Corpora: JAPIO at WAT2016 (Kinoshita et al., 2016)
PDF:
https://aclanthology.org/W16-4612.pdf
Data
ASPEC