Suguru Matsuyoshi

2018

Annotating Modality Expressions and Event Factuality for a Japanese Chess Commentary Corpus
Suguru Matsuyoshi | Hirotaka Kameko | Yugo Murawaki | Shinsuke Mori
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2014

Annotating the Focus of Negation in Japanese Text
Suguru Matsuyoshi | Ryo Otsuki | Fumiyo Fukumoto
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper proposes an annotation scheme for the focus of negation in Japanese text. Negation has a scope, and a focus within that scope. The scope of negation is the part of the sentence that is negated; the focus is the part of the scope that is most prominently or explicitly negated. In natural language processing, correct interpretation of negated statements requires precise detection of the focus of negation. As a foundation for developing a negation focus detector for Japanese, we have annotated text data from “Rakuten Travel: User review data” and the newspaper subcorpus of the “Balanced Corpus of Contemporary Written Japanese” with the labels proposed in our annotation scheme. We report 1,327 negation cues and their foci in the corpora, and present a classification of these foci based on syntactic and semantic types. We also propose a system for detecting the focus of negation in Japanese using 16 heuristic rules and report its performance.

The Effect of Temporal-based Term Selection for Text Classification
Fumiyo Fukumoto | Shougo Ushiyama | Yoshimi Suzuki | Suguru Matsuyoshi
Proceedings of the Australasian Language Technology Association Workshop 2014

2013

Text Classification from Positive and Unlabeled Data using Misclassified Data Correction
Fumiyo Fukumoto | Yoshimi Suzuki | Suguru Matsuyoshi
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Detecting Japanese Compound Functional Expressions using Canonical/Derivational Relation
Takafumi Suzuki | Yusuke Abe | Itsuki Toyota | Takehito Utsuro | Suguru Matsuyoshi | Masatoshi Tsuchiya
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The Japanese language has various types of functional expressions. To organize Japanese functional expressions with various surface forms, a hierarchically organized lexicon of Japanese functional expressions was compiled. This paper proposes a framework for identifying more than 16,000 functional expressions in Japanese texts by utilizing the hierarchical organization of the lexicon. In our framework, these functional expressions are roughly divided into canonical and derived functional expressions. Each derived functional expression is identified by referring to the most similar occurrence of its canonical expression. Contextual occurrence information of the much smaller set of canonical expressions is expanded to cover all forms of the derived expressions, and is used when identifying those derived expressions. We also empirically show that the proposed method can correctly identify more than 80% of the functional / content usages with fewer than 38,000 training instances of manually identified canonical expressions.

Exploiting Discourse Relations between Sentences for Text Clustering
Nik Adilah Hanin Binti Zahri | Fumiyo Fukumoto | Suguru Matsuyoshi
Proceedings of the Workshop on Advances in Discourse Analysis and its Computational Aspects

2011

2010

Annotating Event Mentions in Text with Modality, Focus, and Source Information
Suguru Matsuyoshi | Megumi Eguchi | Chitose Sao | Koji Murakami | Kentaro Inui | Yuji Matsumoto
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Many natural language processing tasks, including information extraction, question answering and recognizing textual entailment, require analysis of the polarity, focus of polarity, tense, aspect, mood and source of the event mentions in a text, in addition to predicate-argument structure analysis. We refer to modality, polarity and other associated information as extended modality. In this paper, we propose a new annotation scheme for representing the extended modality of event mentions in a sentence. Our extended modality consists of the following seven components: Source, Time, Conditional, Primary modality type, Actuality, Evaluation and Focus. We reviewed the literature on extended modality in Linguistics and Natural Language Processing (NLP) and defined appropriate labels for each component. In the proposed annotation scheme, the extended modality information of an event mention is summarized at the core predicate of the event mention for immediate use in NLP applications. We also report on the current progress of our manual annotation of a Japanese corpus of about 50,000 event mentions, showing a reasonably high ratio of inter-annotator agreement.

Utilizing Semantic Equivalence Classes of Japanese Functional Expressions in Translation Rule Acquisition from Parallel Patent Sentences
Taiji Nagasaka | Ran Shimanouchi | Akiko Sakamoto | Takafumi Suzuki | Yohei Morishita | Takehito Utsuro | Suguru Matsuyoshi
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In the “Sandglass” MT architecture, we identify the class of monosemous Japanese functional expressions and utilize it in the task of translating Japanese functional expressions into English. We employ the semantic equivalence classes of a recently compiled large-scale hierarchical lexicon of Japanese functional expressions. We then study whether functional expressions within a class can be translated into a single canonical English expression. Based on the results of identifying monosemous semantic equivalence classes, this paper studies how to extract rules for translating functional expressions in Japanese patent documents into English. In this study, we use about 1.8M Japanese-English parallel sentences automatically extracted from Japanese-English patent families, which are distributed through the Patent Translation Task at the NTCIR-7 Workshop. Then Moses, a toolkit for phrase-based statistical machine translation (SMT), is applied, and Japanese-English translation pairs are obtained in the form of a phrase translation table. Finally, we extract translation pairs of Japanese functional expressions from the phrase translation table. Through this study, we found that most of the semantic equivalence classes judged as monosemous based on manual translation into English have only one translation rule even in the patent domain.

2009

Identifying and Utilizing the Class of Monosemous Japanese Functional Expressions in Machine Translation
Akiko Sakamoto | Taiji Nagasaka | Takehito Utsuro | Suguru Matsuyoshi
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

2008

Automatic Paraphrasing of Japanese Functional Expressions Using a Hierarchically Organized Dictionary
Suguru Matsuyoshi | Satoshi Sato
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

Automatic Assessment of Japanese Text Readability Based on a Textbook Corpus
Satoshi Sato | Suguru Matsuyoshi | Yohsuke Kondoh
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes a method for measuring the readability of Japanese texts based on a newly compiled textbook corpus. The textbook corpus consists of 1,478 sample passages extracted from 127 textbooks for elementary school, junior high school, high school, and university; it is divided into thirteen grade levels and its total size is about a million characters. For a given text passage, the method determines the grade level to which the passage is most similar by using character-unigram models constructed from the textbook corpus. Because this method requires neither sentence-boundary nor word-boundary analysis, it is applicable to texts that include incomplete sentences and irregular text fragments. The performance of this method, measured by the correlation coefficient, is high (R > 0.9); even when the length of a text passage is limited to 25 characters, the correlation coefficient remains high (R = 0.83).