Koki Hatagaki


2022

Parallel Corpus Filtering for Japanese Text Simplification
Koki Hatagaki | Tomoyuki Kajiwara | Takashi Ninomiya
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

We propose a parallel corpus filtering method for Japanese text simplification. The parallel corpus for this task contains some redundant wording. In this study, we first identify the types of noisy sentence pairs in the Japanese text simplification corpus and the size of each type. We then propose a filtering method to remove each type of noisy sentence pair. Experimental results show that filtering the training parallel corpus with the proposed method improves simplification performance.