Satoru Uchida


Controllable Text Simplification with Deep Reinforcement Learning
Daiki Yanamoto | Tomoki Ikawa | Tomoyuki Kajiwara | Takashi Ninomiya | Satoru Uchida | Yuki Arase
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

We propose a method for controlling the difficulty of a sentence based on deep reinforcement learning. Although existing models are trained on word-level difficulty, sentence-level difficulty has not been incorporated into their loss functions. Our proposed method generates sentences of appropriate difficulty for the target audience through reinforcement learning, using a reward calculated from the difference between the difficulty of the output sentence and the target difficulty. Experimental results on English text simplification show that the proposed method achieves higher performance than existing approaches. Compared with previous studies, the proposed method generates sentences whose grade levels, as estimated by a fine-tuned pre-trained model, are closer to those of human references.
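The reward described in this abstract is driven by the gap between the output sentence's difficulty and the target difficulty. The sketch below assumes the reward is simply the negative absolute difference between an estimated grade level and the target grade; the paper's exact formulation is not given here. `difficulty_reward`, `estimate_grade`, and `toy_grade_estimator` are hypothetical names, and the toy estimator (average word length) only stands in for a real difficulty model such as a fine-tuned pre-trained classifier.

```python
from typing import Callable


def difficulty_reward(output_sentence: str,
                      target_grade: float,
                      estimate_grade: Callable[[str], float]) -> float:
    """Hypothetical reward: 0 when the estimated grade matches the target,
    increasingly negative as the output drifts away from it."""
    predicted = estimate_grade(output_sentence)
    return -abs(predicted - target_grade)


def toy_grade_estimator(sentence: str) -> float:
    """Placeholder difficulty estimator (assumption, not the paper's model):
    average word length in characters as a crude proxy for difficulty."""
    words = sentence.split()
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


# Usage: a sentence whose estimated grade equals the target earns the maximum reward.
reward = difficulty_reward("the cat sat", 3.0, toy_grade_estimator)
```

In a policy-gradient setup, a scalar like this would weight the log-likelihood of each sampled output, pushing the generator toward sentences near the target difficulty.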

CEFR-Based Sentence Difficulty Annotation and Assessment
Yuki Arase | Satoru Uchida | Tomoyuki Kajiwara
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Controllable text simplification is a crucial assistive technique for language learning and teaching. One of the primary factors hindering its advancement is the lack of a corpus annotated with sentence difficulty levels based on language ability descriptions. To address this problem, we created the CEFR-based Sentence Profile (CEFR-SP) corpus, containing 17k English sentences annotated with levels based on the Common European Framework of Reference for Languages (CEFR), assigned by English-education professionals. In addition, we propose a sentence-level assessment model that handles the unbalanced level distribution, since sentences at the most basic and the most advanced levels are naturally scarce. In our experiments, the method achieved a macro-F1 score of 84.5% in level assessment, outperforming strong baselines used in readability assessment.
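Macro-F1, the metric reported above, averages the per-level F1 scores with equal weight, so rare levels count as much as frequent ones. That is why it suits an unbalanced level distribution. A minimal sketch of the standard metric follows; the level labels in the usage line are illustrative only and are not drawn from CEFR-SP.

```python
from collections import Counter
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    f1_scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0


# Illustrative: always predicting the majority level gives 75% accuracy
# but a macro-F1 of only 3/7, because the rare level scores F1 = 0.
score = macro_f1(["A1", "A1", "A1", "C2"], ["A1", "A1", "A1", "A1"])
```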


Contextualized context2vec
Kazuki Ashihara | Tomoyuki Kajiwara | Yuki Arase | Satoru Uchida
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

Lexical substitution ranks substitution candidates for a target word in a given sentence according to their paraphrasability. There are two major approaches to lexical substitution: (1) generating contextualized word embeddings by assigning multiple embeddings to one word and (2) generating context embeddings from the sentence. Herein, we propose a method that combines these two approaches to contextualize word embeddings for lexical substitution. Experiments demonstrate that our method outperforms the current state-of-the-art method. We also create CEFR-LP, a new evaluation dataset for the lexical substitution task. It covers a wider range of substitution candidates than previous datasets and assigns English proficiency levels to all target words and substitution candidates.


Contextualized Word Representations for Multi-Sense Embedding
Kazuki Ashihara | Tomoyuki Kajiwara | Yuki Arase | Satoru Uchida
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

CEFR-based Lexical Simplification Dataset
Satoru Uchida | Shohei Takada | Yuki Arase
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)