Hozumi Tanaka

2005

Evaluation of a Japanese CFG Derived from a Syntactically Annotated Corpus with Respect to Dependency Measures
Tomoya Noro | Chimato Koike | Taiichi Hashimoto | Takenobu Tokunaga | Hozumi Tanaka
Proceedings of the Fifth Workshop on Asian Language Resources (ALR-05) and First Symposium on Asian Language Resources Network (ALRN)

eBonsai: An Integrated Environment for Annotating Treebanks
Hiroshi Ichikawa | Masaki Noguchi | Taiichi Hashimoto | Takenobu Tokunaga | Hozumi Tanaka
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

2004

A hybrid back-transliteration system for Japanese
Slaven Bilac | Hozumi Tanaka
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Evaluating the FOKS Error Model
Slaven Bilac | Timothy Baldwin | Hozumi Tanaka
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Retrieving Annotated Corpora for Corpus Annotation
Kyôsuke Yoshida | Taiichi Hashimoto | Takenobu Tokunaga | Hozumi Tanaka
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

This paper introduces eBonsai, a tool that supports humans in annotating corpora with morphosyntactic information and in retrieving syntactic structures stored in its database. Integrating annotation and retrieval lets users annotate a new instance while consulting already annotated sentences that share a similar morphosyntactic structure. We focus on the retrieval part of the system and describe a method that decomposes a large input query into smaller ones to improve retrieval efficiency. The proposed method is evaluated on the Penn Treebank corpus and shows significant improvements.
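The abstract does not say how eBonsai decomposes queries. As a rough illustration only, here is a Python sketch of one common approach: index each treebank tree by its parent-child label pairs, split a query tree into those pairs, intersect the posting lists to narrow the candidates, and run the full structural match only on the survivors. The tree encoding, the name `TreeIndex`, and the in-order child-matching semantics are all assumptions, not the paper's method.

```python
from collections import defaultdict

# Hypothetical encoding: a tree is a tuple (label, child1, child2, ...);
# a leaf is a 1-tuple such as ("NN",).


def edges(tree):
    """Yield every (parent_label, child_label) pair in the tree."""
    label, *kids = tree
    for kid in kids:
        yield (label, kid[0])
        yield from edges(kid)


def nodes(tree):
    """Yield every subtree, root first."""
    yield tree
    for kid in tree[1:]:
        yield from nodes(kid)


def matches_at(query, node):
    """True if `query` matches rooted at `node`: equal labels, and the query's
    children match an in-order subsequence of the node's children."""
    if query[0] != node[0]:
        return False
    remaining = iter(node[1:])  # shared iterator: each match consumes children
    return all(any(matches_at(q, n) for n in remaining) for q in query[1:])


class TreeIndex:
    def __init__(self, trees):
        self.trees = trees
        self.postings = defaultdict(set)  # edge -> ids of trees containing it
        for i, tree in enumerate(trees):
            for edge in edges(tree):
                self.postings[edge].add(i)

    def search(self, query):
        # 1. Decompose the query into small sub-queries: its parent-child edges.
        subqueries = set(edges(query))
        # 2. Intersect posting lists, rarest edge first, to prune candidates.
        candidates = set(range(len(self.trees)))
        for edge in sorted(subqueries, key=lambda e: len(self.postings.get(e, ()))):
            candidates &= self.postings.get(edge, set())
            if not candidates:
                break
        # 3. Run the expensive full match only on the surviving candidates.
        return sorted(
            i for i in candidates
            if any(matches_at(query, n) for n in nodes(self.trees[i]))
        )


trees = [
    ("S", ("NP", ("DT",), ("NN",)), ("VP", ("VBD",), ("NP", ("NN",)))),
    ("S", ("NP", ("NN",)), ("VP", ("VBZ",))),
    ("S", ("VP", ("VB",), ("NP", ("DT",), ("NN",)))),
]
index = TreeIndex(trees)
print(index.search(("NP", ("DT",), ("NN",))))  # [0, 2]
```

The edge filter is cheap but only necessary, not sufficient: a query with its children in the reverse order passes step 2 for the same trees and is rejected only by the full match in step 3.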

In Japanese constructions of the form [N1 no Adj N2], the adjective Adj modifies either N1 or N2. Determining the semantic dependency of the adjective in such phrases is an important task for machine translation. This paper describes a method for determining adjective dependency in such constructions using decision lists, and for inducing the decision lists from training contexts both with and without correct semantic dependencies. In evaluation, our method determines adjective dependency with a precision of about 94%. We further analyze the rules in the induced decision lists and examine which features are effective for determining the semantic dependencies of adjectives.
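The abstract does not give the features or scoring used, so the following is a generic Yarowsky-style decision list in Python, offered only as an illustration: each feature becomes a rule ranked by a smoothed log-likelihood ratio between the two attachments, and classification applies the first matching rule. The feature strings, the smoothing constant `alpha`, and the default label are hypothetical, and the sketch covers only training on contexts with known dependencies, not the paper's use of contexts without them.

```python
import math
from collections import defaultdict

# Labels: "N1" or "N2", i.e. which noun Adj modifies in [N1 no Adj N2].


def induce_decision_list(examples, alpha=0.1):
    """Build an ordered rule list from (feature_set, label) training pairs.

    Each feature predicts the label it co-occurs with more often; rules are
    ranked by the absolute smoothed log-likelihood ratio (strongest first).
    """
    counts = defaultdict(lambda: {"N1": 0, "N2": 0})
    for feats, label in examples:
        for f in feats:
            counts[f][label] += 1
    scored = []
    for f, c in counts.items():
        llr = math.log((c["N1"] + alpha) / (c["N2"] + alpha))
        scored.append((abs(llr), f, "N1" if llr > 0 else "N2"))
    scored.sort(key=lambda r: (-r[0], r[1]))  # ties broken alphabetically
    return [(f, label) for _, f, label in scored]


def classify(rules, feats, default="N2"):
    """Return the label of the first rule whose feature is present."""
    for f, label in rules:
        if f in feats:
            return label
    return default  # hypothetical fallback when no rule fires


# Hypothetical placeholder features, not real Japanese lexical data.
train = [
    ({"adj=ADJ_A", "n2=NOUN_X"}, "N2"),
    ({"adj=ADJ_A", "n2=NOUN_Y"}, "N2"),
    ({"adj=ADJ_B", "n1=NOUN_Z"}, "N1"),
]
rules = induce_decision_list(train)
print(classify(rules, {"adj=ADJ_B", "n1=NOUN_W"}))  # N1
```

Because only the single strongest matching rule decides, an induced list can be read off rule by rule, which fits the abstract's analysis of which features turn out to be effective.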

The Japanese Translation Task: Lexical and Structural Perspectives
Timothy Baldwin | Atsushi Okazaki | Takenobu Tokunaga | Hozumi Tanaka
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

2000

The Effects of Word Order and Segmentation on Translation Retrieval Performance
Timothy Baldwin | Hozumi Tanaka
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

Semi-automatic Construction of a Tree-annotated Corpus Using an Iterative Learning Statistical Language Model
Kiyoaki Shirai | Hozumi Tanaka | Takenobu Tokunaga
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Verb Alternations and Japanese : How, What and Where
Timothy Baldwin | Hozumi Tanaka
Proceedings of the 14th Pacific Asia Conference on Language, Information and Computation

1999

What should we do next for MT system development?
Hozumi Tanaka
Proceedings of Machine Translation Summit VII

The applications of unsupervised learning to Japanese grapheme-phoneme alignment
Timothy Baldwin | Hozumi Tanaka
Unsupervised Learning in Natural Language Processing

Argument status in Japanese verb sense disambiguation
Timothy Baldwin | Hozumi Tanaka
Proceedings of the 8th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

Sharing syntactic structures
Masahiro Ueki | Takenobu Tokunaga | Hozumi Tanaka
Proceedings of Machine Translation Summit VII

Bracketed corpora are a very useful resource for natural language processing, but they are hard to build efficiently, so they are often too small for practical use. Disparities in morphological information, such as word segmentation and part-of-speech tag sets, are also troublesome: an application built for one corpus often cannot be applied to another. In this paper, we sketch out a method to build a corpus that has a fixed syntactic structure but varying morphological annotation, depending on the tag set scheme used. Our system uses a two-layered grammar: one layer consists of replaceable, tag-set-dependent rules, while the other is independent of the tag set. The input sentences are bracketed according to the structural information of the corpus. The parser can work with any tag set and grammar, and by using the same input bracketing we obtain corpora that share partial syntactic structure.

Complementing WordNet with Roget’s and Corpus-based Thesauri for Information Retrieval
Rila Mandala | Takenobu Tokunaga | Hozumi Tanaka
Ninth Conference of the European Chapter of the Association for Computational Linguistics

1998

Selective Sampling for Example-based Word Sense Disambiguation
Atsushi Fujii | Kentaro Inui | Takenobu Tokunaga | Hozumi Tanaka
Computational Linguistics, Volume 24, Number 4, December 1998

A Method of Incorporating Bigram Constraints into an LR Table and Its Effectiveness in Natural Language Processing
Hiroki Imai | Hozumi Tanaka
New Methods in Language Processing and Computational Natural Language Learning

The Use of WordNet in Information Retrieval
Mandala Rila | Takenobu Tokunaga | Hozumi Tanaka
Usage of WordNet in Natural Language Processing Systems

An Empirical Evaluation on Statistical Parsing of Japanese Sentences Using Lexical Association Statistics
Kiyoaki Shirai | Kentaro Inui | Takenobu Tokunaga | Hozumi Tanaka
Proceedings of the Third Conference on Empirical Methods for Natural Language Processing

1997

Incorporating Bigram Constraints into an LR Table
Hiroki Imai | Hui Li | Hozumi Tanaka
Proceedings of the 10th Research on Computational Linguistics International Conference

Extending a thesaurus by classifying words
Takenobu Tokunaga | Atsushi Fujii | Naoyuki Sakurai | Hozumi Tanaka
Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications

MT R&D in Asia
Hozumi Tanaka
Proceedings of Machine Translation Summit VI: Papers

MT R&D in this region has shifted considerably following the many large-scale projects conducted over the past ten years. The Multi-lingual Machine Translation (MMT) project is one of the significant R&D projects that greatly increased the number of NLP researchers and research activities, as reflected in the growing number of research institutes in recent years. We learned a great deal from collaborative research across languages, and we hope it will be a solid step toward future MT R&D in this region. Although MT systems are still far from the ultimate goal of perfect translation, they are already being used in practice to support information retrieval from the Internet.

A New Formalization of Probabilistic GLR Parsing
Kentaro Inui | Virach Sornlertlamvanich | Hozumi Tanaka | Takenobu Tokunaga
Proceedings of the Fifth International Workshop on Parsing Technologies

This paper presents a new formalization of probabilistic GLR language modeling for statistical parsing. Our model inherits its essential features from Briscoe and Carroll’s generalized probabilistic LR model, which obtains context-sensitivity by assigning a probability to each LR parsing action according to its left and right context. Briscoe and Carroll’s model, however, has a drawback in that it is not formalized in any probabilistically well-founded way, which may degrade its parsing performance. Our formulation overcomes this drawback with a few significant refinements, while maintaining all the advantages of Briscoe and Carroll’s modeling.
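To make "a probability for each LR parsing action" concrete, here is a minimal Python sketch that estimates action probabilities from counts and scores an action sequence as their product. The class name `ActionModel`, the count-based estimates, and the toy data are assumptions for illustration. Following our reading of the paper (check the paper itself for the exact formulation), the sketch conditions differently depending on how a state was entered: in a state entered by a shift, the lookahead is a newly read token and is modelled jointly with the action, P(l, a | s); in a state entered after a reduce, the lookahead was already fixed by the earlier step, so only P(a | s, l) is estimated.

```python
from collections import Counter


class ActionModel:
    """Hypothetical count-based model of LR action probabilities.

    A step is (state, lookahead, action). `shift_states` lists states entered
    by shifting a terminal (plus the initial state), where the lookahead is a
    newly read token; all other states are entered via goto after a reduce,
    where the lookahead was already fixed by the preceding step.
    """

    def __init__(self, shift_states):
        self.shift_states = set(shift_states)
        self.joint = Counter()        # counts of (state, lookahead, action)
        self.by_state = Counter()     # counts of state
        self.by_state_la = Counter()  # counts of (state, lookahead)

    def train(self, sequences):
        for seq in sequences:
            for state, la, act in seq:
                self.joint[(state, la, act)] += 1
                self.by_state[state] += 1
                self.by_state_la[(state, la)] += 1

    def step_prob(self, state, la, act):
        if state in self.shift_states:
            denom = self.by_state[state]           # P(la, act | state)
        else:
            denom = self.by_state_la[(state, la)]  # P(act | state, la)
        return self.joint[(state, la, act)] / denom if denom else 0.0

    def sequence_prob(self, seq):
        """Probability of an action sequence: the product of its steps."""
        p = 1.0
        for state, la, act in seq:
            p *= self.step_prob(state, la, act)
        return p


# Made-up, truncated action sequences over a hypothetical LR table.
seq1 = [(0, "a", "s2"), (2, "+", "r2"), (1, "+", "s3")]
seq2 = [(0, "a", "s2"), (2, "$", "r2"), (1, "$", "acc")]
model = ActionModel(shift_states={0, 2})
model.train([seq1, seq2])
print(model.sequence_prob(seq1))  # 0.5
```

In state 1 (entered after a reduce) the lookahead is not counted again, so seeing `+` there costs nothing extra; counting it a second time is the kind of double-counting a well-founded formalization avoids.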

1996

Selective Sampling of Effective Example Sentence Sets for Word Sense Disambiguation
Atsushi Fujii | Kentaro Inui | Takenobu Tokunaga | Hozumi Tanaka
Fourth Workshop on Very Large Corpora

To what extent does case contribute to verb sense disambiguation?
Atsushi Fujii | Kentaro Inui | Takenobu Tokunaga | Hozumi Tanaka
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

The Automatic Extraction of Open Compounds from Text Corpora
Virach Sornlertlamvanich | Hozumi Tanaka
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics

1994

Analysis of Japanese Compound Nouns using Collocational Information
Yosiyuki Kobayasi | Takenobu Tokunaga | Hozumi Tanaka
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics

A Bayesian Approach for User Modeling in Dialogue Systems
Tomoyosi Akiba | Hozumi Tanaka
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics

1992

A Chart-based Method of ID/LP Parsing with Generalized Discrimination Networks
Surapant Meknavin | Manabu Okumura | Hozumi Tanaka
COLING 1992 Volume 1: The 14th International Conference on Computational Linguistics

1990

A New Parallel Algorithm for Generalized LR Parsing
Hiroaki Numazaki | Hozumi Tanaka
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics

1989

Parallel Generalized LR Parsing based on Logic Programming
Hozumi Tanaka | Hiroaki Numazaki
Proceedings of the First International Workshop on Parsing Technologies

A generalized LR parsing algorithm, developed by Tomita [Tomita 86], can handle context-free grammars whose LR parsing tables contain conflicts. His algorithm uses a breadth-first strategy when a conflict occurs in an LR parsing table, and a breadth-first strategy is well known to be suitable for parallel processing. This paper presents a parallel parsing system (PLR) based on generalized LR parsing. PLR is implemented in GHC [Ueda 85], a concurrent logic programming language developed by the Japanese Fifth Generation Computer project. The main features of PLR are as follows. Each entry of an LR parsing table is treated as a process that handles shift and reduce operations. If a process discovers a conflict in the LR parsing table, it activates subprocesses that carry out the shift and reduce operations. These subprocesses run in parallel and simulate the breadth-first strategy. No synchronization among subprocesses is needed during parsing. Each subprocess receives stack information from its parent process. In a simple experiment parsing a sentence, PLR ran faster than PAX [Matsumoto 87][Matsumoto 89], which had been known as the best parallel parser.
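The breadth-first, fork-on-conflict behaviour described above can be simulated sequentially. The following Python sketch is not PLR and is not written in GHC: it is a hypothetical illustration in which each "process" holds an immutable copy of its parent's stack, a table conflict forks one child per action, and every process consumes an input token before any process moves on to the next one. The toy ambiguous grammar (E → E + E | a) and its hand-built SLR table are assumptions, chosen so that state 4 has a shift/reduce conflict on `+`.

```python
from collections import deque

# Hypothetical toy grammar (not from the paper), deliberately ambiguous:
#   rule 1: E -> E + E
#   rule 2: E -> a
RULES = {1: ("E", 3), 2: ("E", 1)}  # rule id -> (lhs, length of rhs)

# Hand-built SLR table for the grammar above. State 4 on "+" holds a
# shift/reduce conflict, which is what forces the parser to fork.
ACTION = {
    (0, "a"): [("shift", 2)],
    (1, "+"): [("shift", 3)],
    (1, "$"): [("accept", None)],
    (2, "+"): [("reduce", 2)],
    (2, "$"): [("reduce", 2)],
    (3, "a"): [("shift", 2)],
    (4, "+"): [("shift", 3), ("reduce", 1)],
    (4, "$"): [("reduce", 1)],
}
GOTO = {(0, "E"): 1, (3, "E"): 4}


def parse(tokens):
    """Return the reduction sequence (rule ids) of every successful parse."""
    # Each "process" is (stack, reductions). Tuples are immutable, so a
    # child process simply receives a copy of its parent's stack.
    processes = [((0,), ())]
    accepted = []
    for tok in list(tokens) + ["$"]:
        work = deque(processes)
        shifted = []  # processes that consumed tok, ready for the next token
        while work:
            stack, reds = work.popleft()
            # One child per table action: a conflict yields several children
            # that proceed independently, with no synchronization.
            for kind, arg in ACTION.get((stack[-1], tok), []):
                if kind == "shift":
                    shifted.append((stack + (arg,), reds))
                elif kind == "reduce":
                    lhs, n = RULES[arg]
                    base = stack[:-n]
                    work.append((base + (GOTO[(base[-1], lhs)],), reds + (arg,)))
                else:  # accept
                    accepted.append(list(reds))
        # Breadth-first: every surviving process has consumed the same token.
        processes = shifted
    return accepted


print(len(parse(["a", "+", "a", "+", "a"])))  # 2
```

The two results correspond to the two bracketings of `a + a + a`. Copying whole stacks keeps the processes independent, which suits parallel execution; Tomita's sequential algorithm instead shares common stack prefixes in a graph-structured stack.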

Research and development of cooperation project on a machine translation system for Japanese and its neighboring countries
Hozumi Tanaka | Shun Ishizaki | Akira Uehara | Hiroshi Uchida
Proceedings of Machine Translation Summit II