A Consolidated Dataset for Knowledge-based Question Generation using Predicate Mapping of Linked Data

Johanna Melly, Gabriel Luthier, Andrei Popescu-Belis


Abstract
In this paper, we present the ForwardQuestions data set, made of human-generated questions related to knowledge triples. This data set results from the conversion and merger of the existing SimpleDBPediaQA and SimpleQuestionsWikidata data sets, including the mapping of predicates from DBPedia to Wikidata, and the selection of ‘forward’ questions as opposed to ‘backward’ ones. The new data set can be used to generate novel questions given an unseen Wikidata triple, by replacing the subjects of existing questions with the new one and then selecting the best candidate questions using semantic and syntactic criteria. Evaluation results indicate that the question generation method using ForwardQuestions improves the quality of questions by about 20% with respect to a baseline not using ranking criteria.
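The generation step described in the abstract (reuse existing questions for a predicate, substitute the new subject, then rank the candidates) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `QuestionRecord` layout, the example records, and especially `toy_score`, which simply prefers shorter questions and stands in for the semantic and syntactic criteria that the paper uses but the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionRecord:
    """Hypothetical record layout: one human-written question per triple."""
    subject: str    # subject label of the original triple
    predicate: str  # Wikidata property ID, e.g. "P19" (place of birth)
    question: str   # question text that mentions the subject label


def candidate_questions(records, new_subject, predicate):
    """Swap in new_subject for the original subject in every question
    that shares the predicate and literally contains its subject label."""
    candidates = []
    for rec in records:
        if rec.predicate != predicate or rec.subject not in rec.question:
            continue
        candidates.append(rec.question.replace(rec.subject, new_subject))
    return candidates


def toy_score(question):
    """Placeholder ranking (assumption): fewer words scores higher.
    This is NOT the ranking described in the paper."""
    return -len(question.split())


def generate(records, new_subject, predicate, score=toy_score):
    """Return the highest-scoring candidate, or None if no question matches."""
    candidates = candidate_questions(records, new_subject, predicate)
    return max(candidates, key=score) if candidates else None


# Made-up example records, not taken from the data set.
records = [
    QuestionRecord("Marie Curie", "P19", "Where was Marie Curie born?"),
    QuestionRecord("Alan Turing", "P19", "What is the birthplace of Alan Turing?"),
    QuestionRecord("Dune", "P50", "Who wrote Dune?"),
]

best = generate(records, "Ada Lovelace", "P19")
print(best)
```

Passing a different `score` function to `generate` is where real ranking criteria would plug in; the substitution step itself stays unchanged.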
Anthology ID:
2020.isa-1.7
Volume:
Proceedings of the 16th Joint ACL-ISO Workshop on Interoperable Semantic Annotation
Month:
May
Year:
2020
Address:
Marseille
Editor:
Harry Bunt
Venue:
ISA
Publisher:
European Language Resources Association
Pages:
59–66
Language:
English
URL:
https://aclanthology.org/2020.isa-1.7
Cite (ACL):
Johanna Melly, Gabriel Luthier, and Andrei Popescu-Belis. 2020. A Consolidated Dataset for Knowledge-based Question Generation using Predicate Mapping of Linked Data. In Proceedings of the 16th Joint ACL-ISO Workshop on Interoperable Semantic Annotation, pages 59–66, Marseille. European Language Resources Association.
Cite (Informal):
A Consolidated Dataset for Knowledge-based Question Generation using Predicate Mapping of Linked Data (Melly et al., ISA 2020)
PDF:
https://aclanthology.org/2020.isa-1.7.pdf