Explaining Simple Natural Language Inference

Aikaterini-Lida Kalouli, Annebeth Buis, Livy Real, Martha Palmer, Valeria de Paiva


Abstract
The vast amount of research introducing new corpora and techniques for semi-automatically annotating them shows the important role that datasets play in today’s research, especially in the machine learning community. This rapid development raises concerns about the quality of the datasets created and, consequently, of the models trained on them, as recently discussed with respect to the Natural Language Inference (NLI) task. In this work, we conduct an annotation experiment based on a small subset of the SICK corpus. The experiment reveals several problems in the annotation guidelines and various challenges of the NLI task itself. Our quantitative evaluation of the experiment allows us to attribute our empirical observations to specific linguistic phenomena and leads us to recommendations for future annotation tasks, for NLI and possibly for other tasks.
Anthology ID:
W19-4016
Volume:
Proceedings of the 13th Linguistic Annotation Workshop
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Annemarie Friedrich, Deniz Zeyrek, Jet Hoek
Venue:
LAW
SIG:
SIGANN
Publisher:
Association for Computational Linguistics
Pages:
132–143
URL:
https://aclanthology.org/W19-4016
DOI:
10.18653/v1/W19-4016
Cite (ACL):
Aikaterini-Lida Kalouli, Annebeth Buis, Livy Real, Martha Palmer, and Valeria de Paiva. 2019. Explaining Simple Natural Language Inference. In Proceedings of the 13th Linguistic Annotation Workshop, pages 132–143, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Explaining Simple Natural Language Inference (Kalouli et al., LAW 2019)
PDF:
https://aclanthology.org/W19-4016.pdf
Code:
kkalouli/SICK-processing
Data:
MultiNLI, SICK, SNLI