Improving the Precision of Natural Textual Entailment Problem Datasets

Jean-Philippe Bernardy, Stergios Chatzikyriakidis


Abstract
In this paper, we propose a method to modify natural textual entailment problem datasets so that they better reflect a more precise notion of entailment. We apply this method to a subset of the Recognizing Textual Entailment datasets. We thus obtain a new corpus of entailment problems with the following three characteristics: (1) it is precise (it does not leave out implicit hypotheses); (2) it is based on “real-world” texts (i.e., most of the premises were written for purposes other than testing textual entailment); and (3) it contains 150 problems. Broadly, our method is to make any missing hypotheses explicit, using a crowd of experts. We discuss the relevance of our method for improving existing NLI datasets so that they are better suited to precise reasoning, and we argue that this corpus can serve as a first step towards wide-coverage testing of precise natural-language inference systems.
Anthology ID: 2020.lrec-1.844
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Venue: LREC
Publisher: European Language Resources Association
Pages: 6835–6840
Language: English
URL: https://aclanthology.org/2020.lrec-1.844
Cite (ACL): Jean-Philippe Bernardy and Stergios Chatzikyriakidis. 2020. Improving the Precision of Natural Textual Entailment Problem Datasets. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6835–6840, Marseille, France. European Language Resources Association.
Cite (Informal): Improving the Precision of Natural Textual Entailment Problem Datasets (Bernardy & Chatzikyriakidis, LREC 2020)
PDF: https://aclanthology.org/2020.lrec-1.844.pdf