Takeshi Abekawa


2016

In this paper, we discuss our ongoing efforts to construct a scientific paper browsing system that helps users to read and understand advanced technical content distributed in PDF. Since PDF is a format specifically designed for printing, layout and logical structures of documents are indistinguishably embedded in the file. It requires much effort to extract natural language text from PDF files, and reversely, display semantic annotations produced by NLP tools on the original page layout. In our browsing system, we tackle these issues caused by the gap between printable document and plain text. Our system provides ways to extract natural language sentences from PDF files together with their logical structures, and also to map arbitrary textual spans to their corresponding regions on page images. We setup a demonstration system using papers published in ACL anthology and demonstrate the enhanced search and refined recommendation functions which we plan to make widely available to NLP researchers.

2012

2011

2010

In this paper we report a way of constructing a translation corpus that contains not only source and target texts, but draft and final versions of target texts, through the translation hosting site Minna no Hon'yaku (MNH). We made MNH publicly available on April 2009. Since then, more than 1,000 users have registered and over 3,500 documents have been translated, as of February 2010, from English to Japanese and from Japanese to English. MNH provides an integrated translation-aid environment, QRedit, which enables translators to look up high-quality dictionaries and Wikipedia as well as to search Google seamlessly. As MNH keeps translation logs, a corpus consisting of source texts, draft translations in several versions, and final translations is constructed naturally through MNH. As of 7 February, 764 documents with multiple translation versions are accumulated, of which 110 are edited by more than one translators. This corpus can be used for self-learning by inexperienced translators on MNH, and potentially for improving machine translation.

2009

2008

In human translation, translators first make draft translations and then modify and edit them. In the case of experienced translators, this process involves the use of wide-ranging expert knowledge, which has mostly remained implicit so far. Describing the difference between draft and final translations, therefore, should contribute to making this knowledge explicit. If we could clarify the expert knowledge of translators, hopefully in a computationally tractable way, we would be able to contribute to the automatic notification of awkward translations to assist inexperienced translators, improving the quality of MT output, etc. Against this backdrop, we have started constructing a corpus that indicates patterns of modification between draft and final translations made by human translators. This paper reports on our progress to date.

2007

2006

2005