JADES: New Text Simplification Dataset in Japanese Targeted at Non-Native Speakers
Akio Hayakawa | Tomoyuki Kajiwara | Hiroki Ouchi | Taro Watanabe
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
The user-dependency of Text Simplification makes its evaluation obscure. A targeted evaluation dataset clarifies the purpose of simplification, though its specification is hard to define. We built JADES (JApanese Dataset for the Evaluation of Simplification), a text simplification dataset targeted at non-native Japanese speakers, according to public vocabulary and grammar profiles. JADES comprises 3,907 complex-simple sentence pairs annotated by an expert. Analysis of JADES shows that wide and multiple rewriting operations were applied through simplification. Furthermore, we analyzed outputs on JADES from several benchmark systems and automatic and manual scores of them. Results of these analyses highlight differences between English and Japanese in operations and evaluations.