JADE: Corpus for Japanese Definition Modelling

Han Huang, Tomoyuki Kajiwara, Yuki Arase


Abstract
This study created and released JADE, a corpus for Japanese definition modelling, a technique that automatically generates a definition for a given target word or phrase. Definition modelling is crucial for practical applications that assist language learning and education, as well as for those that support reading documents in unfamiliar domains. Although corpora for developing definition modelling techniques have been actively created, they are mostly limited to English. JADE was built by following a previous study that mines an online encyclopedia. It provides about 630k sets, each consisting of a target, its definition, and a usage example serving as context, covering about 41k unique targets; this is large enough to train neural models. The targets include both words and phrases and cover diverse domains and topics. A pre-trained sequence-to-sequence model and a state-of-the-art definition modelling method were also benchmarked on JADE to support future development of the technique for Japanese. The JADE corpus has been released and is available online.
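As a rough illustration of the data shape the abstract describes, each entry can be thought of as a (target, definition, context) triple. The sketch below is an assumption, not the released format of JADE: the class name `DefinitionEntry`, its field names, the helper `group_by_target`, and the sample strings are all hypothetical, and the actual file layout should be checked against the released corpus.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DefinitionEntry:
    """One hypothetical corpus entry; the field names are assumptions."""
    target: str      # word or phrase to be defined
    definition: str  # definition text for the target
    context: str     # usage example that contains the target


def group_by_target(entries):
    """Collect entries under their target string."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.target].append(entry)
    return dict(grouped)


# Made-up sample entries, written in English for readability.
entries = [
    DefinitionEntry("corpus", "a collection of texts",
                    "The corpus was released online."),
    DefinitionEntry("corpus", "a collection of texts",
                    "They trained a model on the corpus."),
    DefinitionEntry("definition modelling",
                    "generating a definition for a target",
                    "Definition modelling assists language learners."),
]

grouped = group_by_target(entries)
# Average number of entries per unique target in this toy sample.
average_contexts = len(entries) / len(grouped)
```

Applying the same ratio to the approximate figures in the abstract, about 630k sets over about 41k unique targets works out to roughly 15 sets per target on average.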
Anthology ID:
2022.lrec-1.743
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6884–6888
URL:
https://aclanthology.org/2022.lrec-1.743
Cite (ACL):
Han Huang, Tomoyuki Kajiwara, and Yuki Arase. 2022. JADE: Corpus for Japanese Definition Modelling. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6884–6888, Marseille, France. European Language Resources Association.
Cite (Informal):
JADE: Corpus for Japanese Definition Modelling (Huang et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.743.pdf