Satoshi Sato

Also published as: S. Sato


2024

Automatic Decomposition of Text Editing Examples into Primitive Edit Operations: Toward Analytic Evaluation of Editing Systems
Daichi Yamaguchi | Rei Miyata | Atsushi Fujita | Tomoyuki Kajiwara | Satoshi Sato
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents our work on a task of automatic decomposition of text editing examples into primitive edit operations. Toward a detailed analysis of the behavior of text editing systems, identification of fine-grained edit operations performed by the systems is essential. Given a pair of source and edited sentences, the goal of our task is to generate a non-redundant sequence of primitive edit operations, i.e., the semantically minimal edit operations preserving grammaticality, that iteratively converts the source sentence to the edited sentence. First, we formalize this task, explaining its significant features and specifying the constraints that primitive edit operations should satisfy. Then, we propose a method to automate this task, which consists of two steps: generation of an edit operation lattice and selection of an optimal path. To obtain a wide range of edit operation candidates in the first step, we combine a phrase aligner and a large language model. Experimental results show that our method perfectly decomposes 44% and 64% of editing examples in the text simplification and machine translation post-editing datasets, respectively. Detailed analyses also provide insights into the difficulties of this task, suggesting directions for improvement.

2023

Gauging the Gap Between Human and Machine Text Simplification Through Analytical Evaluation of Simplification Strategies and Errors
Daichi Yamaguchi | Rei Miyata | Sayuka Shimada | Satoshi Sato
Findings of the Association for Computational Linguistics: EACL 2023

This study presents an analytical evaluation of neural text simplification (TS) systems. Because recent TS models are trained in an end-to-end fashion, it is difficult to grasp their abilities to perform particular simplification operations. For the advancement of TS research and development, we should understand in detail what current TS systems can and cannot perform in comparison with human performance. To that end, we first developed an analytical evaluation framework consisting of fine-grained taxonomies of simplification strategies (at both the surface and content levels) and errors. Using this framework, we annotated TS instances produced by professional human editors and multiple neural TS systems and compared the results. Our analyses concretely and quantitatively revealed a wide gap between humans and systems, specifically indicating that systems tend to perform deletions and local substitutions while excessively omitting important information, and that the systems can hardly perform information addition operations. Based on our analyses, we also provide detailed directions to address these limitations.

2020

BERT-Based Simplification of Japanese Sentence-Ending Predicates in Descriptive Text
Taichi Kato | Rei Miyata | Satoshi Sato
Proceedings of the 13th International Conference on Natural Language Generation

Japanese sentence-ending predicates intricately combine content words and functional elements, such as aspect, modality, and honorifics; this can often hinder the understanding of language learners and children. Conventional lexical simplification methods, which replace difficult target words with simpler synonyms acquired from lexical resources in a word-by-word manner, are not always suitable for the simplification of such Japanese predicates. Given this situation, we propose a BERT-based simplification method, the core feature of which is the high ability to substitute the whole predicates with simple ones while maintaining their core meanings in the context by utilizing pre-trained masked language models. Experimental results showed that our proposed methods consistently outperformed the conventional thesaurus-based method by a wide margin. Furthermore, we investigated in detail the effectiveness of the average token embedding and dropout, and the remaining errors of our BERT-based methods.

2017

Coreference Resolution on Math Problem Text in Japanese
Takumi Ito | Takuya Matsuzaki | Satoshi Sato
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

This paper describes a coreference resolution system for math problem text. Case frame dictionaries and a math taxonomy are utilized for supplying domain knowledge. The system deals with various anaphoric phenomena beyond well-studied entity coreferences.

2016

A Challenge to the Third Hoshi Shinichi Award
Satoshi Sato
Proceedings of the INLG 2016 Workshop on Computational Creativity in Natural Language Generation

2015

User Adaptive Restoration for Incorrectly-Segmented Utterances in Spoken Dialogue Systems
Kazunori Komatani | Naoki Hotta | Satoshi Sato | Mikio Nakano
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2014

Text Readability and Word Distribution in Japanese
Satoshi Sato
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper reports the relation between text readability and word distribution in the Japanese language. There was no similar study in the past due to three major obstacles: (1) the unclear definition of a Japanese “word”, (2) no balanced corpus, and (3) no readability measure. Compilation of the Balanced Corpus of Contemporary Written Japanese (BCCWJ) and development of a readability predictor remove these three obstacles and enable this study. First, we counted the frequency of each word in each text in the corpus. Then we calculated the frequency rank of words both in the whole corpus and in each of three readability bands. Three major findings are: (1) the proportion of high-frequency words to tokens in Japanese is lower than that in English; (2) the type-coverage curve of words in the difficult band has an unexpected shape; (3) the size of the intersection between high-frequency words in the easy band and those in the difficult band is unexpectedly small.

2013

Generating More Specific Questions for Acquiring Attributes of Unknown Concepts from Users
Tsugumi Otsuka | Kazunori Komatani | Satoshi Sato | Mikio Nakano
Proceedings of the SIGDIAL 2013 Conference

2012

Dictionary Look-up with Katakana Variant Recognition
Satoshi Sato
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The Japanese language has a rich variety and quantity of word variants. Since the 1980s, it has been recognized that this richness is an obstacle to natural language processing. A complete solution, however, has not yet been presented. This paper proposes a method to recognize Katakana variants (a major type of word variant in Japanese) in the process of dictionary look-up. For a given set of variant generation rules, the method executes variant generation and entry retrieval simultaneously and efficiently. We have developed a seven-layered rule set (216 rules in total) according to the specification manual of UniDic-2.1.0 and other sources. An experiment shows that the spelling-variant generator with 102 rules in the first five layers is almost perfect. Another experiment shows that the form-variant generator with all 216 rules is powerful and that 77.7% of multiple spellings of Katakana loanwords are unnecessary (i.e., can be removed). This result means that the proposed method can drastically reduce the number of variants that we have to register into a dictionary in advance.

2010

Standardizing Complex Functional Expressions in Japanese Predicates: Applying Theoretically-Based Paraphrasing Rules
Tomoko Izumi | Kenji Imamura | Genichiro Kikui | Satoshi Sato
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications

A Person-Name Filter for Automatic Compilation of Bilingual Person-Name Lexicons
Satoshi Sato | Sayoko Kaide
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper proposes a simple and fast person-name filter, which plays an important role in automatic compilation of a large bilingual person-name lexicon. This filter is based on pn_score, which is the sum of two component scores: the score of the first name and that of the last name. Each score is calculated from two term sets: one is a dense set in which most of the members are person names; the other is a baseline set that contains fewer person names. The pn_score takes one of five values, {+2, +1, 0, -1, -2}, which correspond to strong positive, positive, undecidable, negative, and strong negative, respectively. This pn_score can be easily extended to a bilingual pn_score that takes one of nine values, by summing the scores of the two languages. Experimental results show that our method works well for monolingual person names in English and Japanese; the F-scores are 0.929 and 0.939, respectively. The performance of the bilingual person-name filter is better; the F-score is 0.955.
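
The scoring scheme in this abstract can be illustrated with a minimal sketch. The abstract does not say how a component score is derived from the dense and baseline sets, so the membership rule below (+1 if a term appears only in the dense set, -1 if only in the baseline set, 0 otherwise) is an assumption made for illustration, and the function names and toy term sets are hypothetical.

```python
def component_score(term, dense, baseline):
    """Score one name part as +1, 0, or -1.

    ASSUMPTION: the membership rule here is illustrative only;
    the abstract does not specify how the two sets are compared.
    """
    in_dense = term in dense
    in_baseline = term in baseline
    if in_dense and not in_baseline:
        return 1
    if in_baseline and not in_dense:
        return -1
    return 0


def pn_score(first, last, sets):
    """Sum of first-name and last-name scores: one of {+2, +1, 0, -1, -2}."""
    return (component_score(first, sets["dense_first"], sets["base_first"])
            + component_score(last, sets["dense_last"], sets["base_last"]))


def bilingual_pn_score(pair_a, sets_a, pair_b, sets_b):
    """Sum of two monolingual scores: one of nine values, -4 to +4."""
    return pn_score(*pair_a, sets_a) + pn_score(*pair_b, sets_b)


# Hypothetical toy term sets, not data from the paper.
EN_SETS = {
    "dense_first": {"satoshi", "john"},
    "base_first": {"new", "john"},
    "dense_last": {"sato", "smith"},
    "base_last": {"york"},
}

print(pn_score("satoshi", "sato", EN_SETS))  # both parts only in dense sets
print(pn_score("new", "york", EN_SETS))      # both parts only in baseline sets
print(pn_score("john", "smith", EN_SETS))    # first name is ambiguous
```

Under this assumed rule, a candidate would be accepted or rejected by thresholding the summed score, with 0 left as undecidable.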

2008

Computing Paraphrasability of Syntactic Variants Using Web Snippets
Atsushi Fujita | Satoshi Sato
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

Automatic Paraphrasing of Japanese Functional Expressions Using a Hierarchically Organized Dictionary
Suguru Matsuyoshi | Satoshi Sato
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

Automatic Assessment of Japanese Text Readability Based on a Textbook Corpus
Satoshi Sato | Suguru Matsuyoshi | Yohsuke Kondoh
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes a method of readability measurement of Japanese texts based on a newly compiled textbook corpus. The textbook corpus consists of 1,478 sample passages extracted from 127 textbooks of elementary school, junior high school, high school, and university; it is divided into thirteen grade levels and the total size is about a million characters. For a given text passage, the readability measurement method determines the grade level to which the passage is the most similar by using character-unigram models, which are constructed from the textbook corpus. Because this method does not require sentence-boundary analysis or word-boundary analysis, it is applicable to texts that include incomplete sentences and non-regular text fragments. The performance of this method, which is measured by the correlation coefficient, is considerably high (R > 0.9); even when the length of a text passage is limited to 25 characters, the correlation coefficient is still high (R = 0.83).
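
The idea of assigning a passage to the most similar grade level with character-unigram models can be sketched as follows. This is a minimal illustration, not the paper's implementation: using log-likelihood as the similarity measure and add-one smoothing are assumptions, and the two-grade toy corpus and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: grade level -> sample text (not the textbook corpus).
TOY_CORPUS = {
    1: "ねこがいる。いぬがいる。",
    13: "確率的言語模型による可読性推定",
}


def build_models(corpus):
    """Build one character-unigram count table per grade level."""
    counts = {grade: Counter(text) for grade, text in corpus.items()}
    vocab = set().union(*counts.values())
    # +1 reserves probability mass for characters unseen in every grade.
    return counts, len(vocab) + 1


def log_likelihood(passage, char_counts, vocab_size):
    """Log-probability of the passage under one grade's unigram model.

    ASSUMPTION: add-one smoothing; the abstract does not name a scheme.
    """
    total = sum(char_counts.values())
    return sum(
        math.log((char_counts[ch] + 1) / (total + vocab_size))
        for ch in passage
    )


def predict_grade(passage, models):
    """Return the grade whose model gives the passage the highest likelihood."""
    counts, vocab_size = models
    return max(counts, key=lambda g: log_likelihood(passage, counts[g], vocab_size))


models = build_models(TOY_CORPUS)
print(predict_grade("いぬとねこ", models))
print(predict_grade("言語模型", models))
```

Because the models operate on single characters, no sentence or word segmentation is needed, which matches the abstract's point about handling incomplete sentences and text fragments.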

A Probabilistic Model for Measuring Grammaticality and Similarity of Automatically Generated Paraphrases of Predicate Phrases
Atsushi Fujita | Satoshi Sato
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

Learning Dependency Relations of Japanese Compound Functional Expressions
Takehito Utsuro | Takao Shime | Masatoshi Tsuchiya | Suguru Matsuyoshi | Satoshi Sato
Proceedings of the Workshop on A Broader Perspective on Multiword Expressions

A Compositional Approach toward Dynamic Phrasal Thesaurus
Atsushi Fujita | Shuhei Kato | Naoki Kato | Satoshi Sato
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

2006

Japanese Idiom Recognition: Drawing a Line between Literal and Idiomatic Meanings
Chikara Hashimoto | Satoshi Sato | Takehito Utsuro
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Compiling French-Japanese Terminologies from the Web
Xavier Robitaille | Yasuhiro Sasaki | Masatsugu Tonoike | Satoshi Sato | Takehito Utsuro
11th Conference of the European Chapter of the Association for Computational Linguistics

Adjective-to-Verb Paraphrasing in Japanese Based on Lexical Constraints of Verbs
Atsushi Fujita | Naruaki Masuno | Satoshi Sato | Takehito Utsuro
Proceedings of the Fourth International Natural Language Generation Conference

A comparative study on compositional translation estimation using a domain/topic-specific corpus collected from the Web
Masatsugu Tonoike | Mitsuhiro Kida | Toshihiro Takagi | Yasuhiro Sasaki | Takehito Utsuro | S. Sato
Proceedings of the 2nd International Workshop on Web as Corpus

Chunking Japanese Compound Functional Expressions by Machine Learning
Masatoshi Tsuchiya | Takao Shime | Toshihiro Takagi | Takehito Utsuro | Kiyotaka Uchimoto | Suguru Matsuyoshi | Satoshi Sato | Seiichi Nakagawa
Proceedings of the Workshop on Multi-word-expressions in a multilingual context

2005

Effect of Domain-Specific Corpus in Compositional Translation Estimation for Technical Terms
Masatsugu Tonoike | Mitsuhiro Kida | Toshihiro Takagi | Yasuhiro Sasaki | Takehito Utsuro | Satoshi Sato
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

2004

Answer validation by keyword association
Masatsugu Tonoike | Takehito Utsuro | Satoshi Sato
Proceedings of the 3rd workshop on RObust Methods in Analysis of Natural Language Data (ROMAND 2004)

Integrating Cross-Lingually Relevant News Articles and Monolingual Web Documents in Bilingual Lexicon Acquisition
Takehito Utsuro | Kohei Hino | Mitsuhiro Kida | Seiichi Nakagawa | Satoshi Sato
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

Automatic Collection of Related Terms from the Web
Satoshi Sato | Yasuhiro Sasaki
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics

Automatic Detection of Grammar Elements that Decrease Readability
Masatoshi Tsuchiya | Satoshi Sato
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics

2002

Verb Paraphrase based on Case Frame Alignment
Nobuhiro Kaji | Daisuke Kawahara | Sadao Kurohashi | Satoshi Sato
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

2001

Finding translation correspondences from parallel parsed corpus for example-based translation
Eiji Aramaki | Sadao Kurohashi | Satoshi Sato | Hideo Watanabe
Proceedings of Machine Translation Summit VIII

This paper describes a system for finding phrasal translation correspondences from a parallel parsed corpus, that is, a collection of paired English and Japanese sentences. First, the system finds phrasal correspondences by consulting a Japanese-English translation dictionary. Then, the system finds correspondences among the remaining phrases by using the dependency structures of the sentences and the balance of all correspondences. The method is based on the assumption that, in a parallel corpus, most fragments in a source sentence have corresponding fragments in the target sentence.

1993

Example-Based Translation of Technical Terms
Satoshi Sato
Proceedings of the Fifth Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

1992

A Method of Automatic Hypertext Construction from an Encyclopedic Dictionary of a Specific Field
Sadao Kurohashi | Makoto Nagao | Satoshi Sato | Masahiko Murakami
Third Conference on Applied Natural Language Processing

CTM: An Example-Based Translation Aid System
Satoshi Sato
COLING 1992 Volume 4: The 14th International Conference on Computational Linguistics

1990

Toward Memory-based Translation
Satoshi Sato | Makoto Nagao
COLING 1990 Volume 3: Papers presented to the 13th International Conference on Computational Linguistics