Akira Fujita


2016

Translation Errors and Incomprehensibility: a Case Study using Machine-Translated Second Language Proficiency Tests
Takuya Matsuzaki | Akira Fujita | Naoya Todo | Noriko H. Arai
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper reports on an experiment in which 795 human participants answered questions taken from second language proficiency tests that had been translated into their native language. The outputs of three machine translation systems and two different human translations were used as the test material. We classified the translation errors in the questions according to an error taxonomy and analyzed the participants’ responses on the basis of the type and frequency of the translation errors. Through this analysis, we identified several types of errors that most degraded the accuracy of the participants’ answers, their confidence in those answers, and their overall evaluation of the translation quality.

2015

Evaluating Machine Translation Systems with Second Language Proficiency Tests
Takuya Matsuzaki | Akira Fujita | Naoya Todo | Noriko H. Arai
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Overview of Todai Robot Project and Evaluation Framework of its NLP-based Problem Solving
Akira Fujita | Akihiro Kameda | Ai Kawazoe | Yusuke Miyao
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We introduce the organization of the Todai Robot Project and discuss its achievements. The Todai Robot Project task focuses on benchmarking NLP systems for problem solving: it encourages NLP-based systems to solve real high-school examinations. We describe the method for managing question resources and their correct answers, the answering tools, and how researchers participate in the task. We also analyse the answering accuracy of the developed systems by comparing their answers with those given by human test-takers.