This paper claims that constructing a dictionary using bilingual pairs obtained from parallel corpora needs not only correct alignment of two noun phrases but also judgment of its appropriateness as an entry. It specifically addresses the latter task, which has been paid little attention. It demonstrates a method of selecting a suitable entry using Support Vector Machines, and proposes to regard as the features the common and the different parts between a current translation and a new translation. Using experiment results, this paper examines how selection performances are affected by the four ways of representing the common and the different parts: morphemes, parts of speech, semantic markers, and upper-level semantic markers. Moreover, we used n-grams of the common and the different parts of above four kinds of features. Experimental result found that representation by morphemes marked the best performance, F-measure of 0.803.
This paper reports the result of our experiment, the aim of which is to examine the efficiency of reading support systems such as a sentence-machine translation system, a word-machine translation system, and so on. Our evaluation method used in the experiment is able to handle the different reading support systems by assessing the usability of the systems, i.e., comprehension, reading speed, and effective speed. The result shows that the reading-speed procedure is able to evaluate the support systems as well as the comprehension-based procedure proposed by Ohguro (1993) and Fuji et al. (2001).
Since the headlines of English news articles have a characteristic style, different from the styles which prevail in ordinary sentences, it is difficult for MT systems to generate high quality translations for headlines. We try to solve this problem by adding to an existing system a preediting module which rewrites the headlines to ordinary expressions. Rewriting of headlines makes it possible to generate better translations which would not otherwise be generated, with little or no changes to the existing parts of the system. Focusing on the absence of a form of the verb of 'be', we have described rewriting rules for putting properly the verb 'be' into the headlines.