JADES: New Text Simplification Dataset in Japanese Targeted at Non-Native Speakers

Akio Hayakawa, Tomoyuki Kajiwara, Hiroki Ouchi, Taro Watanabe


Abstract
Because text simplification depends on its intended users, its evaluation is often ill-defined. A targeted evaluation dataset clarifies the purpose of simplification, though its specification is hard to define. We built JADES (JApanese Dataset for the Evaluation of Simplification), a text simplification dataset targeted at non-native Japanese speakers, according to public vocabulary and grammar profiles. JADES comprises 3,907 complex-simple sentence pairs annotated by an expert. Analysis of JADES shows that a wide range of rewriting operations, often more than one at a time, were applied during simplification. Furthermore, we analyzed the outputs of several benchmark systems on JADES, together with their automatic and manual evaluation scores. The results of these analyses highlight differences between English and Japanese in both simplification operations and evaluation.
Anthology ID:
2022.tsar-1.17
Volume:
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Virtual)
Editors:
Sanja Štajner, Horacio Saggion, Daniel Ferrés, Matthew Shardlow, Kim Cheng Sheang, Kai North, Marcos Zampieri, Wei Xu
Venue:
TSAR
Publisher:
Association for Computational Linguistics
Pages:
179–187
URL:
https://aclanthology.org/2022.tsar-1.17
DOI:
10.18653/v1/2022.tsar-1.17
Cite (ACL):
Akio Hayakawa, Tomoyuki Kajiwara, Hiroki Ouchi, and Taro Watanabe. 2022. JADES: New Text Simplification Dataset in Japanese Targeted at Non-Native Speakers. In Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), pages 179–187, Abu Dhabi, United Arab Emirates (Virtual). Association for Computational Linguistics.
Cite (Informal):
JADES: New Text Simplification Dataset in Japanese Targeted at Non-Native Speakers (Hayakawa et al., TSAR 2022)
PDF:
https://aclanthology.org/2022.tsar-1.17.pdf
Video:
https://aclanthology.org/2022.tsar-1.17.mp4