Investigating Reasons for Disagreement in Natural Language Inference

Nan-Jiang Jiang, Marie-Catherine de Marneffe
Abstract
We investigate how disagreement in natural language inference (NLI) annotation arises. We develop a taxonomy of disagreement sources with 10 categories spanning 3 high-level classes. We find that some disagreements are due to uncertainty in the sentence meaning, and others to annotator biases and task artifacts, leading to different interpretations of the label distribution. We explore two modeling approaches for detecting items with potential disagreement: a 4-way classification with a “Complicated” label in addition to the three standard NLI labels, and a multilabel classification approach. We find that the multilabel classification is more expressive and gives better recall of the possible interpretations in the data.
Anthology ID:
2022.tacl-1.78
Volume:
Transactions of the Association for Computational Linguistics, Volume 10
Year:
2022
Address:
Cambridge, MA
Editors:
Brian Roark, Ani Nenkova
Venue:
TACL
Publisher:
MIT Press
Pages:
1357–1374
URL:
https://aclanthology.org/2022.tacl-1.78
DOI:
10.1162/tacl_a_00523
Cite (ACL):
Nan-Jiang Jiang and Marie-Catherine de Marneffe. 2022. Investigating Reasons for Disagreement in Natural Language Inference. Transactions of the Association for Computational Linguistics, 10:1357–1374.
Cite (Informal):
Investigating Reasons for Disagreement in Natural Language Inference (Jiang & de Marneffe, TACL 2022)
PDF:
https://aclanthology.org/2022.tacl-1.78.pdf