Hozumi Tanaka


2005

eBonsai: An Integrated Environment for Annotating Treebanks
Hiroshi Ichikawa | Masaki Noguchi | Taiichi Hashimoto | Takenobu Tokunaga | Hozumi Tanaka
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

Evaluation of a Japanese CFG Derived from a Syntactically Annotated Corpus with Respect to Dependency Measures
Tomoya Noro | Chimato Koike | Taiichi Hashimoto | Takenobu Tokunaga | Hozumi Tanaka
Proceedings of the Fifth Workshop on Asian Language Resources (ALR-05) and First Symposium on Asian Language Resources Network (ALRN)

2004

Evaluating the FOKS Error Model
Slaven Bilac | Timothy Baldwin | Hozumi Tanaka
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Retrieving Annotated Corpora for Corpus Annotation
Kyôsuke Yoshida | Taiichi Hashimoto | Takenobu Tokunaga | Hozumi Tanaka
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

This paper introduces a tool, Bonsai, which supports humans in annotating corpora with morphosyntactic information and in retrieving syntactic structures stored in the database. Integrating annotation and retrieval enables users to annotate a new instance while looking back at already annotated sentences that share a similar morphosyntactic structure. We focus on the retrieval part of the system, and describe a method to decompose a large input query into smaller ones in order to gain retrieval efficiency. The proposed method is evaluated with the Penn Treebank corpus, showing significant improvements.

A hybrid back-transliteration system for Japanese
Slaven Bilac | Hozumi Tanaka
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

Feature Selection in Categorizing Procedural Expressions
Mineki Takechi | Takenobu Tokunaga | Yuji Matsumoto | Hozumi Tanaka
Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages

Paraphrasing Japanese Noun Phrases using Character-based Indexing
Takenobu Tokunaga | Hozumi Tanaka | Kenji Kimura
Proceedings of the Second International Workshop on Paraphrasing

The Interactive Navigation to the Stored Q&A data using Simple Questions
Kunio Matsui | Hozumi Tanaka
Proceedings of the 2003 EACL Workshop on Dialogue Systems: interaction, adaptation and styles of management

2002

Constructing a lexicon of action
Takenobu Tokunaga | Manabu Okumura | Suguru Saitô | Hozumi Tanaka
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Towards a Thesaurus of Predicates
Satoshi Shirai | Kazuhide Yamamoto | Francis Bond | Hozumi Tanaka
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Enhanced Japanese Electronic Dictionary Look-up
Timothy Baldwin | Slaven Bilac | Ryo Okumura | Takenobu Tokunaga | Hozumi Tanaka
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Processing Japanese Self-correction in Speech Dialog Systems
Kotaro Funakoshi | Takenobu Tokunaga | Hozumi Tanaka
COLING 2002: The 19th International Conference on Computational Linguistics

Bringing the Dictionary to the User: The FOKS System
Slaven Bilac | Timothy Baldwin | Hozumi Tanaka
COLING 2002: The 19th International Conference on Computational Linguistics

2001

Decision lists for determining adjective dependency in Japanese
Taiichi Hashimoto | Kosuke Nishidate | Kiyoaki Shirai | Takenobu Tokunaga | Hozumi Tanaka
Proceedings of Machine Translation Summit VIII

In Japanese constructions of the form [N1 no Adj N2], the adjective Adj modifies either N1 or N2. Determining the semantic dependencies of adjectives in such phrases is an important task for machine translation. This paper describes a method for determining the adjective dependency in such constructions using decision lists, and for inducing decision lists from training contexts with and without correct semantic dependencies. Based on evaluation, our method is able to determine adjective dependency with a precision of about 94%. We further analyze rules in the induced decision lists and examine effective features for determining the semantic dependencies of adjectives.
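
As a rough illustration of the decision-list idea in the abstract above, the sketch below ranks single-feature rules by a smoothed log-likelihood ratio and classifies with the first matching rule. This is a generic decision-list formulation, not the paper's actual procedure: the feature names (`n1=part`, `adj=color`, and so on), the toy training data, the smoothing constant, and the function names are all assumptions made for illustration.

```python
import math
from collections import defaultdict

LABELS = ("N1", "N2")  # which noun the adjective is taken to modify


def induce_decision_list(examples, smoothing=0.5):
    """Build a ranked rule list from (feature_set, label) training pairs."""
    counts = defaultdict(lambda: {"N1": 0, "N2": 0})
    totals = {"N1": 0, "N2": 0}
    for features, label in examples:
        totals[label] += 1
        for feature in set(features):
            counts[feature][label] += 1

    rules = []
    for feature, c in counts.items():
        n1 = c["N1"] + smoothing
        n2 = c["N2"] + smoothing
        if n1 == n2:
            continue  # the feature carries no evidence either way
        score = abs(math.log(n1 / n2))  # strength of the evidence
        rules.append((score, feature, "N1" if n1 > n2 else "N2"))

    # Strongest evidence first; ties broken by feature name for determinism.
    rules.sort(key=lambda r: (-r[0], r[1]))
    default = "N1" if totals["N1"] >= totals["N2"] else "N2"
    return rules, default


def classify(rules, default, features):
    """Return the label of the first rule whose feature is present."""
    present = set(features)
    for _score, feature, label in rules:
        if feature in present:
            return label
    return default


# Hypothetical toy data: coarse classes for N1 and for the adjective.
TRAIN = [
    ({"n1=part", "adj=color"}, "N1"),
    ({"n1=part", "adj=size"}, "N1"),
    ({"n1=person", "adj=color"}, "N2"),
    ({"n1=person", "adj=size"}, "N2"),
    ({"n1=part", "adj=color"}, "N1"),
]

rules, default = induce_decision_list(TRAIN)
prediction = classify(rules, default, {"n1=person", "adj=color"})
```

Because only the single strongest matching rule decides, the induced list can be inspected rule by rule, which is the kind of analysis the abstract mentions.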

The Japanese Translation Task: Lexical and Structural Perspectives
Timothy Baldwin | Atsushi Okazaki | Takenobu Tokunaga | Hozumi Tanaka
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

2000

Semi-automatic Construction of a Tree-annotated Corpus Using an Iterative Learning Statistical Language Model
Kiyoaki Shirai | Hozumi Tanaka | Takenobu Tokunaga
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

The Effects of Word Order and Segmentation on Translation Retrieval Performance
Timothy Baldwin | Hozumi Tanaka
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

Verb Alternations and Japanese: How, What and Where
Timothy Baldwin | Hozumi Tanaka
Proceedings of the 14th Pacific Asia Conference on Language, Information and Computation

1999

The applications of unsupervised learning to Japanese grapheme-phoneme alignment
Timothy Baldwin | Hozumi Tanaka
Unsupervised Learning in Natural Language Processing

Complementing WordNet with Roget’s and Corpus-based Thesauri for Information Retrieval
Rila Mandala | Takenobu Tokunaga | Hozumi Tanaka
Ninth Conference of the European Chapter of the Association for Computational Linguistics

What should we do next for MT system development?
Hozumi Tanaka
Proceedings of Machine Translation Summit VII

Sharing syntactic structures
Masahiro Ueki | Takenobu Tokunaga | Hozumi Tanaka
Proceedings of Machine Translation Summit VII

Bracketed corpora are a very useful resource for natural language processing, but they are hard to build efficiently, leading to quantitative insufficiency for practical use. Disparities in morphological information, such as word segmentation and part-of-speech tag sets, are also troublesome. An application specific to a particular corpus often cannot be applied to another corpus. In this paper, we sketch out a method to build a corpus that has a fixed syntactic structure but varying morphological annotation based on the different tag set schemes utilized. Our system uses a two-layered grammar, one layer of which is made up of replaceable tag-set-dependent rules while the other has no such tag set dependency. The input sentences of our system are bracketed according to the structural information of the corpus. The parser can work using any tag set and grammar, and using the same input bracketing, we obtain corpora that share partial syntactic structure.

Argument status in Japanese verb sense disambiguation
Timothy Baldwin | Hozumi Tanaka
Proceedings of the 8th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

1998

The Use of WordNet in Information Retrieval
Rila Mandala | Takenobu Tokunaga | Hozumi Tanaka
Usage of WordNet in Natural Language Processing Systems

A Method of Incorporating Bigram Constraints into an LR Table and Its Effectiveness in Natural Language Processing
Hiroki Imai | Hozumi Tanaka
New Methods in Language Processing and Computational Natural Language Learning

An Empirical Evaluation on Statistical Parsing of Japanese Sentences Using Lexical Association Statistics
Kiyoaki Shirai | Kentaro Inui | Takenobu Tokunaga | Hozumi Tanaka
Proceedings of the Third Conference on Empirical Methods for Natural Language Processing

Selective Sampling for Example-based Word Sense Disambiguation
Atsushi Fujii | Kentaro Inui | Takenobu Tokunaga | Hozumi Tanaka
Computational Linguistics, Volume 24, Number 4, December 1998

1997

A New Formalization of Probabilistic GLR Parsing
Kentaro Inui | Virach Sornlertlamvanich | Hozumi Tanaka | Takenobu Tokunaga
Proceedings of the Fifth International Workshop on Parsing Technologies

This paper presents a new formalization of probabilistic GLR language modeling for statistical parsing. Our model inherits its essential features from Briscoe and Carroll’s generalized probabilistic LR model, which obtains context-sensitivity by assigning a probability to each LR parsing action according to its left and right context. Briscoe and Carroll’s model, however, has a drawback in that it is not formalized in any probabilistically well-founded way, which may degrade its parsing performance. Our formulation overcomes this drawback with a few significant refinements, while maintaining all the advantages of Briscoe and Carroll’s modeling.

Incorporating Bigram Constraints into an LR Table
Hiroki Imai | Hui Li | Hozumi Tanaka
Proceedings of the 10th Research on Computational Linguistics International Conference

MT R&D in Asia
Hozumi Tanaka
Proceedings of Machine Translation Summit VI: Papers

There has been a big shift in MT R&D in this region following the many large-scale projects conducted over the past ten years. The Multi-lingual Machine Translation (MMT) project is one of the significant R&D projects that greatly increased the number of NLP-related researchers and research activities, as can be seen in the growing number of research institutes in recent years. We learned a lot from the collaborative research across languages, and we still hope that it will be a rigorous step toward future MT R&D in this region. Though MT systems are still far from the ultimate goal of perfect translation, it can be observed that they are actually used to support information retrieval from the Internet.

Extending a thesaurus by classifying words
Takenobu Tokunaga | Atsushi Fujii | Naoyuki Sakurai | Hozumi Tanaka
Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications

1996

Selective Sampling of Effective Example Sentence Sets for Word Sense Disambiguation
Atsushi Fujii | Kentaro Inui | Takenobu Tokunaga | Hozumi Tanaka
Fourth Workshop on Very Large Corpora

To what extent does case contribute to verb sense disambiguation?
Atsushi Fujii | Kentaro Inui | Takenobu Tokunaga | Hozumi Tanaka
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

The Automatic Extraction of Open Compounds from Text Corpora
Virach Sornlertlamvanich | Hozumi Tanaka
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics

1994

Analysis of Japanese Compound Nouns using Collocational Information
Yosiyuki Kobayasi | Takenobu Tokunaga | Hozumi Tanaka
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics

A Bayesian Approach for User Modeling in Dialogue Systems
Tomoyosi Akiba | Hozumi Tanaka
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics

1992

A Chart-based Method of ID/LP Parsing with Generalized Discrimination Networks
Surapant Meknavin | Manabu Okumura | Hozumi Tanaka
COLING 1992 Volume 1: The 14th International Conference on Computational Linguistics

1990

A New Parallel Algorithm for Generalized LR Parsing
Hiroaki Numazaki | Hozumi Tanaka
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics

1989

Research and development of cooperation project on a machine translation system for Japanese and its neighboring countries
Hozumi Tanaka | Shun Ishizaki | Akira Uehara | Hiroshi Uchida
Proceedings of Machine Translation Summit II

Parallel Generalized LR Parsing based on Logic Programming
Hozumi Tanaka | Hiroaki Numazaki
Proceedings of the First International Workshop on Parsing Technologies

A generalized LR parsing algorithm, which has been developed by Tomita [Tomita 86], can handle context-free grammars. His algorithm makes use of a breadth-first strategy when a conflict occurs in an LR parsing table. It is well known that the breadth-first strategy is suitable for parallel processing. This paper presents an algorithm for a parallel parsing system (PLR) based on generalized LR parsing. PLR is implemented in GHC [Ueda 85], a concurrent logic programming language developed by the Japanese Fifth Generation Computer project. The features of PLR are as follows: Each entry of an LR parsing table is regarded as a process which handles shift and reduce operations. If a process discovers a conflict in an LR parsing table, it activates subprocesses which conduct shift and reduce operations. These subprocesses run in parallel and simulate the breadth-first strategy. There is no need to synchronize subprocesses during parsing. Stack information is sent to each subprocess from its parent process. A simple experiment in parsing a sentence revealed that PLR runs faster than PAX [Matsumoto 87][Matsumoto 89], which has been known as the best parallel parser.

1988

LangLAB: A Natural Language Analysis System
Takenobu Tokunaga | Makoto Iwayama | Hozumi Tanaka | Tadashi Kamiwaki
Coling Budapest 1988 Volume 2: International Conference on Computational Linguistics

1986

DCKR – Knowledge Representation in Prolog and Its Application to Natural Language Processing
Hozumi Tanaka
Coling 1986 Volume 1: The 11th International Conference on Computational Linguistics

1980

Unit-to-Unit Interaction as a Basis for Semantic Interpretation of Japanese Sentences
Hozumi Tanaka
COLING 1980 Volume 1: The 8th International Conference on Computational Linguistics