Takaaki Tanaka


2018

Universal Dependencies Version 2 for Japanese
Masayuki Asahara | Hiroshi Kanayama | Takaaki Tanaka | Yusuke Miyao | Sumire Uematsu | Shinsuke Mori | Yuji Matsumoto | Mai Omura | Yugo Murawaki
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Hierarchical Word Structure-based Parsing: A Feasibility Study on UD-style Dependency Parsing in Japanese
Takaaki Tanaka | Katsuhiko Hayashi | Masaaki Nagata
Proceedings of the 15th International Conference on Parsing Technologies

In applying word-based dependency parsing such as Universal Dependencies (UD) to Japanese, the uncertainty of word segmentation becomes a problem when defining the word unit for the dependencies. We introduce the following hierarchical word structures to dependency parsing in Japanese: morphological units (short unit words, SUWs) and syntactic units (long unit words, LUWs). SUWs can be used to segment a sentence consistently, but they are too short to represent syntactic constructions. An LUW is a unit that includes functional multiwords; LUW-based analysis makes syntactic structure easier to capture and yields more precise parsing results than SUW-based analysis. This paper describes the results of a feasibility study on the capability and effectiveness of parsing methods based on hierarchical word structure (LUW chunking + parsing) in comparison to single-layer word structure (SUW parsing). We also show that joint analysis of LUW chunking and dependency parsing improves the identification of predicate-argument structures, although the overall results differ little.
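The two-layer word structure described in this abstract can be illustrated with a small sketch. It assumes a simple B/I tagging of SUWs (B starts a new LUW, I continues the current one); this encoding, the function name `chunk_suws`, and the example segmentation are illustrative assumptions, not the scheme or data used in the paper.

```python
from typing import List


def chunk_suws(suws: List[str], tags: List[str]) -> List[List[str]]:
    """Group short unit words (SUWs) into long unit words (LUWs).

    Assumed encoding (hypothetical, for illustration only):
    "B" starts a new LUW, "I" continues the current LUW.
    """
    if len(suws) != len(tags):
        raise ValueError("suws and tags must have the same length")
    luws: List[List[str]] = []
    for suw, tag in zip(suws, tags):
        if tag == "B":
            luws.append([suw])
        elif tag == "I" and luws:
            luws[-1].append(suw)
        else:
            # Unknown tag, or "I" with no LUW to continue.
            raise ValueError(f"invalid tag sequence at {suw!r}: {tag!r}")
    return luws


# Illustrative input: a compound noun and a functional multiword
# are each split into several SUWs but form one LUW.
example_suws = ["環境", "問題", "に", "つい", "て", "話す"]
example_tags = ["B", "I", "B", "I", "I", "B"]
example_luws = chunk_suws(example_suws, example_tags)
```

In a chunking + parsing pipeline of the kind the abstract compares, dependencies would then be assigned between the resulting LUWs rather than between individual SUWs.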

2016

Universal Dependencies for Japanese
Takaaki Tanaka | Yusuke Miyao | Masayuki Asahara | Sumire Uematsu | Hiroshi Kanayama | Shinsuke Mori | Yuji Matsumoto
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present an attempt to port the international syntactic annotation scheme, Universal Dependencies, to Japanese. Since Japanese syntactic structure is usually annotated on the basis of unique chunk-based dependencies, we first introduce word-based dependencies by using a word unit called the Short Unit Word, which usually corresponds to an entry in the lexicon UniDic. Porting is done by mapping the part-of-speech tagset in UniDic to the universal part-of-speech tagset, and converting a constituent-based treebank to a typed dependency tree. The conversion is not straightforward, and we discuss the problems that arose in the conversion and the current solutions. A treebank consisting of 10,000 sentences was built by converting existing resources and has been released to the public.
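The tagset mapping step mentioned in this abstract can be pictured as a lookup from hierarchical UniDic part-of-speech labels to universal part-of-speech tags. The sketch below is a simplified illustration: the table entries, the longest-prefix matching strategy, and the name `to_upos` are assumptions made here, and the abstract itself notes that the real conversion is not straightforward.

```python
# Hypothetical, heavily simplified mapping table (not the paper's actual rules).
UNIDIC_TO_UPOS = {
    "名詞-固有名詞": "PROPN",
    "名詞": "NOUN",
    "動詞": "VERB",
    "形容詞": "ADJ",
    "副詞": "ADV",
    "助詞-格助詞": "ADP",
    "助動詞": "AUX",
}


def to_upos(unidic_pos: str) -> str:
    """Map a hyphen-separated UniDic POS label to a universal POS tag.

    Tries the most specific prefix first and falls back to "X"
    (the universal tag for "other") when nothing matches.
    """
    parts = unidic_pos.split("-")
    for n in range(len(parts), 0, -1):
        key = "-".join(parts[:n])
        if key in UNIDIC_TO_UPOS:
            return UNIDIC_TO_UPOS[key]
    return "X"
```

A context-free table like this cannot capture cases where the universal tag depends on how a word is used in the sentence, which is one reason a real conversion needs more than a lookup.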

2015

Word-based Japanese typed dependency parsing with grammatical function analysis
Takaaki Tanaka | Masaaki Nagata
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2013

Constructing a Practical Constituent Parser from a Japanese Treebank with Function Labels
Takaaki Tanaka | Masaaki Nagata
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

2008

MRD-based Word Sense Disambiguation: Further Extending Lesk
Timothy Baldwin | Su Nam Kim | Francis Bond | Sanae Fujita | David Martinez | Takaaki Tanaka
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

2007

Exploiting Semantic Information for HPSG Parse Selection
Sanae Fujita | Francis Bond | Stephan Oepen | Takaaki Tanaka
ACL 2007 Workshop on Deep Linguistic Processing

Word Sense Disambiguation Incorporating Lexical and Structural Semantic Information
Takaaki Tanaka | Francis Bond | Timothy Baldwin | Sanae Fujita | Chikara Hashimoto
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

An Implemented Description of Japanese: The Lexeed Dictionary and the Hinoki Treebank
Sanae Fujita | Takaaki Tanaka | Francis Bond | Hiromi Nakaiwa
Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions

Multilingual Ontology Acquisition from Multiple MRDs
Eric Nichols | Francis Bond | Takaaki Tanaka | Sanae Fujita | Dan Flickinger
Proceedings of the 2nd Workshop on Ontology Learning and Population: Bridging the Gap between Text and Knowledge

The Hinoki Sensebank — A Large-Scale Word Sense Tagged Corpus of Japanese —
Takaaki Tanaka | Francis Bond | Sanae Fujita
Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006

2005

Integration of a Lexical Type Database with a Linguistically Interpreted Corpus
Chikara Hashimoto | Francis Bond | Takaaki Tanaka | Melanie Siegel
Proceedings of the Sixth International Workshop on Linguistically Interpreted Corpora (LINC-2005)

Extracting Representative Arguments from Dictionaries for Resolving Zero Pronouns
Shigeko Nariyama | Eric Nichols | Francis Bond | Takaaki Tanaka | Hiromi Nakaiwa
Proceedings of Machine Translation Summit X: Papers

We propose a method to alleviate the problem of referential granularity in Japanese zero pronoun resolution. We use dictionary definition sentences to extract ‘representative’ arguments of predicative definition words; e.g. ‘arrest’ is likely to take police as its subject and criminal as its object. These representative arguments are far more informative than the generic ‘person’ provided by other valency dictionaries. They are extracted automatically using both shallow parsing and deep parsing for greater quality and quantity. Initial results are highly promising, yielding more specific information about selectional preferences. An architecture for zero pronoun resolution using these representative arguments is described.

High Precision Treebanking—Blazing Useful Trees Using POS Information
Takaaki Tanaka | Francis Bond | Stephan Oepen | Sanae Fujita
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

Translation by Machine of Complex Nominals: Getting it Right
Timothy Baldwin | Takaaki Tanaka
Proceedings of the Workshop on Multiword Expressions: Integrating Processing

The Hinoki Treebank. Working Toward Text Understanding
Francis Bond | Sanae Fujita | Chikara Hashimoto | Kaname Kasahara | Shigeko Nariyama | Eric Nichols | Akira Ohtani | Takaaki Tanaka | Shigeaki Amano
Proceedings of the 5th International Workshop on Linguistically Interpreted Corpora

Acquiring an Ontology for a Fundamental Vocabulary
Francis Bond | Eric Nichols | Sanae Fujita | Takaaki Tanaka
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

Noun-Noun Compound Machine Translation: A Feasibility Study on Shallow Processing
Takaaki Tanaka | Timothy Baldwin
Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment

An Empirical Model of Multiword Expression Decomposability
Timothy Baldwin | Colin Bannard | Takaaki Tanaka | Dominic Widdows
Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment

Translation selection for Japanese-English noun-noun compounds
Takaaki Tanaka | Timothy Baldwin
Proceedings of Machine Translation Summit IX: Papers

We present a method for compositionally translating Japanese NN compounds into English, using a word-level transfer dictionary and target language monolingual corpus. The method interpolates over fully-specified and partial translation data, based on corpus evidence. In evaluation, we demonstrate that interpolation over the two data types is superior to using either one, and show that our method performs at an F-score of 0.68 over translation-aligned inputs and 0.66 over a random sample of 500 NN compounds.
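The idea of interpolating over two kinds of corpus evidence can be sketched as a weighted combination of two scores per candidate translation. This is a generic linear interpolation written for illustration; the weight `lam`, the function names, and the scores below are assumptions, not the formula or data from the paper.

```python
from typing import Dict, Tuple


def interpolate(full_score: float, partial_score: float, lam: float = 0.5) -> float:
    """Linearly combine evidence for the whole candidate translation
    (full_score) with evidence for its parts (partial_score)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    return lam * full_score + (1.0 - lam) * partial_score


def best_translation(candidates: Dict[str, Tuple[float, float]], lam: float = 0.5) -> str:
    """Return the candidate with the highest interpolated score.

    `candidates` maps each candidate string to (full_score, partial_score).
    """
    if not candidates:
        raise ValueError("no candidates given")
    return max(candidates, key=lambda c: interpolate(*candidates[c], lam))


# Made-up scores, for illustration only.
example_candidates = {
    "relation improvement": (0.0, 0.2),
    "improvement in relations": (0.3, 0.1),
}
example_best = best_translation(example_candidates)
```

Setting `lam` to 1.0 or 0.0 reduces the score to a single data type, which corresponds to the comparison the abstract reports against using either data type alone.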

2002

Measuring the Similarity between Compound Nouns in Different Languages Using Non-Parallel Corpora
Takaaki Tanaka
COLING 2002: The 19th International Conference on Computational Linguistics

1999

Extraction of translation equivalents from non-parallel corpora
Takaaki Tanaka | Yoshihiro Matsuo
Proceedings of the 8th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages