Yasuyoshi Inagaki

2010

Construction of Chunk-Aligned Bilingual Lecture Corpus for Simultaneous Machine Translation
Masaki Murata | Tomohiro Ohno | Shigeki Matsubara | Yasuyoshi Inagaki
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Advances in speech and language processing have led to the development of speech translation systems. Most of these studies target spoken dialogue and adopt consecutive interpretation, which uses the sentence as the translation unit. By contrast, only a few studies address simultaneous interpreting, although language resources that promote such research, such as a large-scale analytical corpus, have recently been published. Going forward, corpora need to be made more practical for realizing a simultaneous interpreting system. In this paper, we describe the construction of a bilingual corpus that can be used for research on simultaneous lecture interpreting. A simultaneous lecture interpreting system must recognize translation units in the middle of a sentence and generate their translations at the proper time. We constructed the bilingual lecture corpus in two steps. First, we segmented the sentences in the lecture data into semantically meaningful units for simultaneous interpreting. Then, we assigned translations to these units from the viewpoint of simultaneous interpreting. In addition, we investigated the possibility of automatically detecting simultaneous interpreting timing from our corpus.

2006

A Syntactically Annotated Corpus of Japanese Spoken Monologue
Tomohiro Ohno | Shigeki Matsubara | Hideki Kashioka | Naoto Kato | Yasuyoshi Inagaki
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Recently, monologue data such as lectures and commentary by professionals have come to be regarded as valuable intellectual resources and have been attracting attention. To use such monologue data effectively and efficiently, the data must be not only accumulated but also structured. This paper describes the construction of a Japanese spoken monologue corpus in which a dependency structure is given to each utterance. Spontaneous monologue includes many very long sentences composed of two or more clauses. In these sentences, a subject or adverb may be common to multiple clauses, and the subject or adverb can then be considered to depend on multiple predicates. To give the dependency information in a realistic fashion, our research allows a bunsetsu to depend on multiple bunsetsus.
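The idea that one bunsetsu may depend on several others can be illustrated with a small data structure. The following is a minimal sketch, not the corpus's actual annotation format: the `Bunsetsu` class, its field names, and the romanized example utterance are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Bunsetsu:
    index: int   # position in the utterance
    text: str    # surface form (romanized, hypothetical example)
    # Indices of ALL bunsetsus this one depends on; a list rather than
    # a single head, so a shared subject can point at several predicates.
    heads: list = field(default_factory=list)


def dependents_of(utterance, head_index):
    """Return indices of bunsetsus that depend on the given head."""
    return [b.index for b in utterance if head_index in b.heads]


def multi_head_bunsetsus(utterance):
    """Return bunsetsus that depend on more than one head."""
    return [b for b in utterance if len(b.heads) > 1]


# Invented two-clause utterance: "kare-wa hon-o yonde neta"
# (roughly "he read a book and slept"). The subject "kare-wa" is
# shared by both predicates, so it has two heads.
utterance = [
    Bunsetsu(0, "kare-wa", heads=[2, 3]),
    Bunsetsu(1, "hon-o", heads=[2]),
    Bunsetsu(2, "yonde", heads=[3]),
    Bunsetsu(3, "neta"),  # final predicate, no head
]

for b in multi_head_bunsetsus(utterance):
    print(b.text, "depends on", [utterance[h].text for h in b.heads])
```

Storing heads as a list keeps the single-head case unchanged while letting shared subjects or adverbs be recorded without duplicating the bunsetsu.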

Simultaneous English-Japanese Spoken Language Translation Based on Incremental Dependency Parsing and Transfer
Koichiro Ryu | Shigeki Matsubara | Yasuyoshi Inagaki
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Dependency Parsing of Japanese Spoken Monologue Based on Clause Boundaries
Tomohiro Ohno | Shigeki Matsubara | Hideki Kashioka | Takehiko Maruyama | Yasuyoshi Inagaki
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Layered Speech-Act Annotation for Spoken Dialogue Corpus
Yuki Irie | Shigeki Matsubara | Nobuo Kawaguchi | Yukiko Yamaguchi | Yasuyoshi Inagaki
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes the design of speech act tags for spoken dialogue corpora and their evaluation. Compared with the tags used for conventional corpus annotation, the proposed speech intention tag is specialized enough to determine system operations. However, describing information in detail increases the number of tag types, which makes tag selection ambiguous. We have therefore designed an organization of tags that focuses on layered tagging and context-dependent tagging. Over 35,000 utterance units in the CIAIR corpus have been tagged by hand. To evaluate the reliability of the intention tags, a tagging experiment was conducted in which the taggings of several annotators were compared using the kappa value. As a result, we confirmed that reliable data could be built. This corpus with speech intention tags could be widely used, from basic research to applications of spoken dialogue. In particular, it would play an important role in the practical use of spoken dialogue corpora.
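The abstract does not say which kappa statistic was used. As a hedged illustration of how agreement between annotators can be measured, the sketch below computes Cohen's kappa for two annotators; the choice of Cohen's variant, the function name `cohen_kappa`, and the tag names are assumptions, not details from the paper.

```python
from collections import Counter


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators who tagged the same units."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(tags_a)
    # Observed agreement: fraction of units given the same tag.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Chance agreement: product of each annotator's tag proportions.
    freq_a = Counter(tags_a)
    freq_b = Counter(tags_b)
    expected = sum(freq_a[t] * freq_b[t] for t in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical tag throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical tags for six utterance units (not from the CIAIR corpus).
annotator_1 = ["request", "request", "inform", "confirm", "inform", "request"]
annotator_2 = ["request", "inform", "inform", "confirm", "inform", "request"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

With more than two annotators, a multi-rater statistic such as Fleiss' kappa would be the usual substitute.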

A Corpus Search System Utilizing Lexical Dependency Structure
Yoshihide Kato | Shigeki Matsubara | Yasuyoshi Inagaki
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper presents a corpus search system utilizing lexical dependency structure. The user's query consists of a sequence of keywords. For a given query, the system automatically generates dependency structure patterns that consist of the keywords in the query, and returns the sentences whose dependency structures match the generated patterns. The dependency structure patterns are generated by two operations, combining and interpolation, which utilize the dependency structures in the searched corpus. These operations enable the system to generate only the dependency structure patterns that occur in the corpus. The system achieves simple and intuitive corpus search, and it is linguistically sophisticated enough to utilize structural information.
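The abstract does not define the combining and interpolation operations, so the sketch below does not reproduce them. It only illustrates the general idea of filtering sentences by dependency structure for a keyword query: a sentence matches if the keywords are linked, in order, by a chain of head links, possibly with other words in between. The sentence representation (a word list plus a head-index list), the function names, and the toy corpus are all assumptions.

```python
def head_chain(heads, i):
    """Yield the indices of all ancestors of word i, nearest first."""
    while heads[i] is not None:
        i = heads[i]
        yield i


def matches(words, heads, keywords):
    """True if keywords[0] depends (directly or transitively) on
    keywords[1], which depends on keywords[2], and so on."""
    if not keywords:
        return True

    def search(pos, remaining):
        if not remaining:
            return True
        return any(
            words[anc] == remaining[0] and search(anc, remaining[1:])
            for anc in head_chain(heads, pos)
        )

    return any(
        w == keywords[0] and search(i, keywords[1:])
        for i, w in enumerate(words)
    )


def search_corpus(corpus, keywords):
    """Return the sentences (as strings) whose dependency structure matches."""
    return [" ".join(w) for w, h in corpus if matches(w, h, keywords)]


# Toy corpus: each entry is (words, heads); heads[i] is the index of the
# word that word i depends on, or None for the root.
corpus = [
    (["the", "system", "returns", "sentences"], [1, 2, None, 2]),
    (["users", "search", "the", "corpus"], [1, None, 3, 1]),
]

print(search_corpus(corpus, ["system", "returns"]))
print(search_corpus(corpus, ["corpus", "search"]))
```

A plain keyword search would also return sentences in which the keywords co-occur without being syntactically related; the head-chain check is what filters those out.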

2004

An experiment on Japanese-Uighur machine translation and its evaluation
Muhtar Mahsut | Yasuhiro Ogawa | Kazue Sugino | Katsuhiko Toyama | Yasuyoshi Inagaki
Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers

This paper describes an evaluation experiment on a Japanese-Uighur machine translation system that consists of verbal suffix processing, case suffix processing, phonetic change processing, and a Japanese-Uighur dictionary of about 20,000 words. Japanese and Uighur have many syntactic and structural similarities, including word order, the existence and shared functions of case suffixes and verbal suffixes, and morphological structure. For these reasons, we consider that Japanese can be translated into Uighur by word-by-word alignment after morphological analysis of the input sentences, without complicated syntactic analysis. From the point of view of practical usage, we chose three articles about environmental issues that appeared in Nippon Keizai Shinbun and conducted a translation experiment on them with our MT system to support our argument. We used the correctness of phrases in the output sentences as the evaluation criterion. As a result of the experiment, a precision of 84.8% was achieved.

Stochastically Evaluating the Validity of Partial Parse Trees in Incremental Parsing
Yoshihide Kato | Shigeki Matsubara | Yasuyoshi Inagaki
Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together

Japanese and Uighur are agglutinative languages, and they have many syntactic and morphological similarities. Roughly speaking, we can translate Japanese into Uighur sequentially by replacing Japanese words with the corresponding Uighur ones after morphological analysis. However, agglutinated suffixes must be translated carefully to produce a correct translation, because they play important roles in both languages. In this paper, we focus on these suffixes and propose a Japanese-Uighur machine translation method that utilizes the agglutinative features of both languages. To deal with the agglutinative features, we use derivational grammar, which makes the similarities between the two languages clearer. This makes the proposed system simple and systematic. We have implemented the machine translation system and evaluated how effectively it works.