Developing Dataset of Japanese Slot Filling Quizzes Designed for Evaluation of Machine Reading Comprehension

Takuto Watarai, Masatoshi Tsuchiya


Abstract
This paper describes a dataset of Japanese slot filling quizzes that we are developing for the evaluation of machine reading comprehension. The dataset consists of quizzes automatically generated from Aozora Bunko, and each quiz is defined as a 4-tuple: a context passage, a query holding a slot, an answer character, and a set of possible answer characters. The query is generated from the original sentence, which appears immediately after the context passage in the target book, by replacing the answer character with the slot. The set of possible answer characters consists of the answer character and the other characters who appear in the context passage. Because the context passage and the query share the same context, a machine that precisely understands the context should be able to select the correct answer from the set of possible answer characters. The unique point of our approach is that we focus on the characters of target books as slots when generating queries from original sentences, because characters play important roles in narrative texts and precisely understanding their relationships is necessary for reading comprehension. To extract characters from target books, manually created dictionaries of characters are employed, because some characters appear as common nouns rather than as named entities.
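The 4-tuple described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Quiz`, `make_quiz`, and the `[SLOT]` placeholder are hypothetical, the character dictionary is assumed to be a plain list of names, and character mentions are found by naive substring matching, which a real pipeline for Japanese text would likely replace with proper tokenization. The example text is invented and is not taken from Aozora Bunko.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence

SLOT = "[SLOT]"  # hypothetical placeholder token; the paper's actual marker is not given here


@dataclass(frozen=True)
class Quiz:
    """One quiz as a 4-tuple: context, query with a slot, answer, candidates."""
    context: str
    query: str
    answer: str
    candidates: FrozenSet[str]


def make_quiz(context: str, next_sentence: str,
              characters: Sequence[str]) -> Optional[Quiz]:
    """Build a quiz from a context passage and the sentence that follows it.

    `characters` stands in for a manually created character dictionary.
    Returns None when no usable quiz can be formed.
    """
    # Characters from the dictionary that are mentioned in the next sentence.
    mentioned = [c for c in characters if c in next_sentence]
    if not mentioned:
        return None
    # Assumption: use the earliest mentioned character as the answer.
    answer = min(mentioned, key=next_sentence.index)
    # Replace the first occurrence of the answer with the slot.
    query = next_sentence.replace(answer, SLOT, 1)
    # Candidates: the answer plus the characters appearing in the context.
    candidates = frozenset(c for c in characters if c in context) | {answer}
    if len(candidates) < 2:
        return None  # a single candidate would make the quiz trivial
    return Quiz(context, query, answer, candidates)


# Invented toy example (not from the dataset).
quiz = make_quiz(
    context="Alice met Bob at the gate. Bob handed her a letter.",
    next_sentence="Alice opened it at once.",
    characters=["Alice", "Bob", "Carol"],
)
print(quiz)
```

In this sketch, "Carol" is excluded from the candidate set because she does not appear in the context passage, which mirrors the abstract's definition of the set of possible answer characters.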
Anthology ID:
2020.lrec-1.852
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6895–6901
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.852
Cite (ACL):
Takuto Watarai and Masatoshi Tsuchiya. 2020. Developing Dataset of Japanese Slot Filling Quizzes Designed for Evaluation of Machine Reading Comprehension. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6895–6901, Marseille, France. European Language Resources Association.
Cite (Informal):
Developing Dataset of Japanese Slot Filling Quizzes Designed for Evaluation of Machine Reading Comprehension (Watarai & Tsuchiya, LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.852.pdf