Satoru Uchida


2023

pdf bib
Distractor Generation for Fill-in-the-Blank Exercises by Question Type
Nana Yoshimi | Tomoyuki Kajiwara | Satoru Uchida | Yuki Arase | Takashi Ninomiya
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

This study addresses the automatic generation of distractors for English fill-in-the-blank exercises in the entrance examinations for Japanese universities. While previous studies applied the same method to all questions, actual entrance examinations have multiple question types that reflect the purpose of the questions. Therefore, we define three types of questions (grammar, function word, and context) and propose a method to generate distractors according to the characteristics of each question type. Experimental results on 500 actual questions show the effectiveness of the proposed method for both automatic and manual evaluation.

2022

pdf bib
Controllable Text Simplification with Deep Reinforcement Learning
Daiki Yanamoto | Tomoki Ikawa | Tomoyuki Kajiwara | Takashi Ninomiya | Satoru Uchida | Yuki Arase
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

We propose a method for controlling the difficulty of a sentence based on deep reinforcement learning. Although existing models are trained based on the word-level difficulty, the sentence-level difficulty has not been taken into account in the loss function. Our proposed method generates sentences of appropriate difficulty for the target audience through reinforcement learning using a reward calculated based on the difference between the difficulty of the output sentence and the target difficulty. Experimental results of English text simplification show that the proposed method achieves a higher performance than existing approaches. Compared to previous studies, the proposed method can generate sentences whose grade-levels are closer to those of human references estimated using a fine-tuned pre-trained model.

pdf bib
CEFR-Based Sentence Difficulty Annotation and Assessment
Yuki Arase | Satoru Uchida | Tomoyuki Kajiwara
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Controllable text simplification is a crucial assistive technique for language learning and teaching. One of the primary factors hindering its advancement is the lack of a corpus annotated with sentence difficulty levels based on language ability descriptions. To address this problem, we created the CEFR-based Sentence Profile (CEFR-SP) corpus, containing 17k English sentences annotated with the levels based on the Common European Framework of Reference for Languages assigned by English-education professionals. In addition, we propose a sentence-level assessment model to handle unbalanced level distribution because the most basic and highly proficient sentences are naturally scarce. In the experiments in this study, our method achieved a macro-F1 score of 84.5% in the level assessment, thus outperforming strong baselines employed in readability assessment.

2019

pdf bib
Contextualized context2vec
Kazuki Ashihara | Tomoyuki Kajiwara | Yuki Arase | Satoru Uchida
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

Lexical substitution ranks substitution candidates from the viewpoint of paraphrasability for a target word in a given sentence. There are two major approaches for lexical substitution: (1) generating contextualized word embeddings by assigning multiple embeddings to one word and (2) generating context embeddings using the sentence. Herein we propose a method that combines these two approaches to contextualize word embeddings for lexical substitution. Experiments demonstrate that our method outperforms the current state-of-the-art method. We also create CEFR-LP, a new evaluation dataset for the lexical substitution task. It has a wider coverage of substitution candidates than previous datasets and assigns English proficiency levels to all target words and substitution candidates.

2018

pdf bib
CEFR-based Lexical Simplification Dataset
Satoru Uchida | Shohei Takada | Yuki Arase
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Contextualized Word Representations for Multi-Sense Embedding
Kazuki Ashihara | Tomoyuki Kajiwara | Yuki Arase | Satoru Uchida
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation