Simple but Challenging: Natural Language Inference Models Fail on Simple Sentences

Cheng Luo; Wei Liu; Jieyu Lin; Jiajie Zou; Ming Xiang; Nai Ding

doi:10.18653/v1/2022.findings-emnlp.252

Abstract

Natural language inference (NLI) is the task of inferring the relationship between a premise and a hypothesis (e.g., entailment, neutral, or contradiction), and transformer-based models perform well on current NLI datasets such as MNLI and SNLI. Nevertheless, given the linguistic complexity of these large-scale datasets, it remains controversial whether the models truly infer the relationship between sentences or simply guess the answer via shallow heuristics. Here, we introduce a controlled evaluation set called Simple Pair to test the basic sentence inference ability of NLI models using syntactically simple sentences. Three popular transformer-based models, i.e., BERT, RoBERTa, and DeBERTa, are evaluated. We find that these models, when fine-tuned on MNLI or SNLI, perform very poorly on Simple Pair (< 35.4% accuracy). Further analyses reveal event coreference and compositional binding problems in these models. To improve model performance, we augment the training set, i.e., MNLI or SNLI, with a few examples constructed based on Simple Pair (1% of the size of the original SNLI/MNLI training sets). Models fine-tuned on the augmented training set maintain high performance on MNLI/SNLI and perform very well on Simple Pair (~100% accuracy). Furthermore, the gains of models trained on the augmented sets transfer to more complex examples constructed from MNLI and SNLI sentences. Taken together, the current work shows that (1) models achieving high accuracy on mainstream large-scale datasets still lack the capacity to draw accurate inferences on simple sentences, and (2) augmenting mainstream datasets with a small number of targeted simple sentences can effectively improve model performance.
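The augmentation and evaluation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `augment` and `accuracy`, the `fraction` and `seed` parameters, and the random sampling strategy are all assumptions made for the example, and the data shown is placeholder text rather than real MNLI, SNLI, or Simple Pair items.

```python
import random
from typing import List, Sequence, Tuple

# An NLI example: (premise, hypothesis, label), where label is one of
# "entailment", "neutral", or "contradiction".
Example = Tuple[str, str, str]


def augment(train: List[Example], extra: List[Example],
            fraction: float = 0.01, seed: int = 0) -> List[Example]:
    """Mix a small sample of `extra` into `train`.

    The sample size is `fraction` of the original training-set size
    (hypothetical default: 1%), capped at the number of extra examples.
    """
    rng = random.Random(seed)
    k = min(len(extra), round(len(train) * fraction))
    mixed = train + rng.sample(extra, k)
    rng.shuffle(mixed)  # avoid appending all new examples at the end
    return mixed


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of predicted labels that match the gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Placeholder data standing in for a large training set and a simple-sentence set.
train_set = [(f"premise {i}", f"hypothesis {i}", "neutral") for i in range(1000)]
simple_set = [(f"simple premise {i}", f"simple hypothesis {i}", "entailment")
              for i in range(50)]
augmented = augment(train_set, simple_set)
```

With these placeholder sizes, `augmented` holds the 1000 original examples plus 10 sampled simple ones; the fine-tuning step itself is outside the scope of this sketch.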

Anthology ID: 2022.findings-emnlp.252
Volume: Findings of the Association for Computational Linguistics: EMNLP 2022
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3449–3462
URL: https://aclanthology.org/2022.findings-emnlp.252/
DOI: 10.18653/v1/2022.findings-emnlp.252
Cite (ACL): Cheng Luo, Wei Liu, Jieyu Lin, Jiajie Zou, Ming Xiang, and Nai Ding. 2022. Simple but Challenging: Natural Language Inference Models Fail on Simple Sentences. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 3449–3462, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Simple but Challenging: Natural Language Inference Models Fail on Simple Sentences (Luo et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-emnlp.252.pdf
Software: 2022.findings-emnlp.252.software.zip
Dataset: 2022.findings-emnlp.252.dataset.zip
