Kyoko Kanzaki


2020

2019

This paper describes our project on Japanese compound verbs. Japanese “Verb (adnominal form) + Verb” compounds, which are treated as single verbs, frequently appear in daily communication. They are not sufficiently registered in Japanese dictionaries or thesauri. We are now compiling a list of the synonymous expressions of compound verbs in “compound verb lexicon” built by the National Institute of Japanese Language and Linguistics. We extracted synonymous words and phrases of compound verbs from five hundred million Japanese web corpora. As a result, synonymous expressions of 1800 compound verbs were obtained automatically among 2700 in the “compound verb lexicon”. From our data, we observed that some compound verbs represent not only motion but also additional nuances such as an emotional one. In order to reflect the abundant meanings that compound verbs own, we will try to think of a link of synonymous expressions to Japanese wordnet. Concretely, in the case of synonymous phrases, we try to link adverbial expressions which are a part of phrases to the adverbial synset in Japanese wordnet.

2018

2014

2009

2008

Particularly considering the requirement of high reliability, we argue that the most appropriate architecture for a medical speech translator that can be realised using today’s technology combines unidirectional (doctor to patient) translation, medium-vocabulary controlled language coverage, interlingua-based translation, an embedded help component, and deployability on a hand-held hardware platform. We present an overview of the Open Source MedSLT prototype, which has been developed in accordance with these design principles. The system is implemented on top of the Regulus and Nuance 8.5 platforms, translates patient examination questions for all language pairs in the set {English, French, Japanese, Arabic, Catalan}, using vocabularies of about 400 to 1 100 words, and can be run in a distributed client/server environment, where the client application is hosted on a Nokia Internet Tablet device.
In this paper we describe the construction of an illustrated Japanese Wordnet. We bootstrap the Wordnet using existing multiple existing wordnets in order to deal with the ambiguity inherent in translation. We illustrate it with pictures from the Open Clip Art Library.
We outline work performed within the framework of a current EC project. The goal is to construct a language-independent information system for a specific domain (environment/ecology/biodiversity) anchored in a language-independent ontology that is linked to wordnets in seven languages. For each language, information extraction and identification of lexicalized concepts with ontological entries is carried out by text miners (“Kybots”). The mapping of language-specific lexemes to the ontology allows for crosslinguistic identification and translation of equivalent terms. The infrastructure developed within this project enables long-range knowledge sharing and transfer across many languages and cultures, addressing the need for global and uniform transition of knowledge beyond the specific domains addressed here.
After a long history of compilation of our own lexical resources, EDR Japanese/English Electronic Dictionary, and discussions with major players on development of various WordNets, Japanese National Institute of Information and Communications Technology started developing the Japanese WordNet in 2006 and will publicly release the first version, which includes both the synset in Japanese and the annotated Japanese corpus of SemCor, in June 2008. As the first step in compiling the Japanese WordNet, we added Japanese equivalents to synsets of the Princeton WordNet. Of course, we must also add some synsets which do not exist in the Princeton WordNet, and must modify synsets in the Princeton WordNet, in order to make the hierarchical structure of Princeton synsets represent thesaurus-like information found in the Japanese language, however, we will address these tasks in a future study. We then translated English sentences which are used in the SemCor annotation into Japanese and annotated them using our Japanese WordNet. This article describes the overview of our project to compile Japanese WordNet and other resources which relate to our Japanese WordNet.
We describe recent work on MedSLT, a medium-vocabulary interlingua-based medical speech translation system, focussing on issues that arise when handling languages of which the grammar engineer has little or no knowledge. We show how we can systematically create and maintain multiple forms of grammars, lexica and interlingual representations, with some versions being used by language informants, and some by grammar engineers. In particular, we describe the advantages of structuring the interlingua definition as a simple semantic grammar, which includes a human-readable surface form. We show how this allows us to rationalise the process of evaluating translations between languages lacking common speakers, and also makes it possible to create a simple generic tool for debugging to-interlingua translation rules. Examples presented focus on the concrete case of translation between Japanese and Arabic in both directions.
We describe various syntactic and semantic conditions for finding abstractnouns which refer to concepts of adjectives from a text, in an attempt to explore the creation of a thesaurus from text. Depending on usages, six kinds of syntactic patterns are shown. In the syntactic and semantic conditions an omission of an abstract noun is mainly used, but in addition, various linguistic clues are needed. We then compare our results with synsets of Japanese WordNet. From a viewpoint of Japanese WordNet, the degree of agreement of ?Attribute? between our data and Japanese WordNet was 22%. On the other hand, the total number of differences of obtained abstract nouns was 267. From a viewpoint of our data,the degree of agreement of abstract nouns between our data and Japanese WordNet was 54%.

2006

Aiming to compile a thesaurus of adjectives, we discuss how to extract abstract nouns categorizing adjectives, clarify the semantic and syntactic functions of these abstract nouns, and manually evaluate the capability to extract the “instance-category” relations. We focused on some Japanese syntactic structures and utilized possibility of omission of abstract noun to decide whether or not a semantic relation between an adjective and an abstract noun is an “instance-category” relation. For 63% of the adjectives (57 groups/90 groups) in our experiments, our extracted categories were found to be most suitable. For 22 % of the adjectives (20/90), the categories in the EDR lexicon were found to be most suitable. For 14% of the adjectives (13/90), neither our extracted categories nor those in EDR were found to be suitable, or examinees’ own categories were considered to be more suitable. From our experimental results, we found that the correspondence between a group of adjectives and their category name was more suitable in our method than in the EDR lexicon.
The EDR electronic dictionary is a machine-tractable dictionary developed for advanced computer-based processing of natural lan-guage. This dictionary comprises eleven sub-dictionaries, including a concept dictionary, word dictionaries, bilingual dictionaries, co-occurrence dictionaries, and a technical terminology dictionary. In this study, we focus on the concept dictionary and aim to revise the arrangement of concepts for improving the EDR electronic dictionary. We believe that unsuitable concepts in a class differ from other concepts in the same class from an abstract perspective. From this notion, we first try to automatically extract those concepts unsuited to the class. We then try semi-automatically to amend the concept explications used to explain the meanings to human users and rearrange them in suitable classes. In the experiment, we try to revise those concepts that are the lower-concepts of the concept “human” in the concept hierarchy and that are directly arranged under concepts with concept explications such as “person as defined by –” and “person viewed from –.” We analyze the result and evaluate our approach.

2005

In this paper, we present evidence that providing users of a speech to speech translation system for emergency diagnosis (MedSLT) with a tool that helps them to learn the coverage greatly improves their success in using the system. In MedSLT, the system uses a grammar-based recogniser that provides more predictable results to the translation component. The help module aims at addressing the lack of robustness inherent in this type of approach. It takes as input the result of a robust statistical recogniser that performs better for out-of-coverage data and produces a list of in-coverage example sentences. These examples are selected from a defined list using a heuristic that prioritises sentences maximising the number of N-grams shared with those extracted from the recognition result.

2004

2003

2002

2000

1999