Evaluating Paraphrastic Robustness in Textual Entailment Models

Dhruv Verma, Yash Kumar Lal, Shreyashee Sinha, Benjamin Van Durme, Adam Poliak
Abstract
We present PaRTE, a collection of 1,126 pairs of Recognizing Textual Entailment (RTE) examples to evaluate whether models are robust to paraphrasing. We posit that if RTE models understand language, their predictions should be consistent across inputs that share the same meaning. We use the evaluation set to determine if RTE models’ predictions change when examples are paraphrased. In our experiments, contemporary models change their predictions on 8-16% of paraphrased examples, indicating that there is still room for improvement.
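The abstract's evaluation boils down to comparing a model's label on each original RTE example with its label on the paraphrased version and reporting the fraction that flip. A minimal sketch of that check, with all names illustrative rather than taken from the paper's released code:

```python
# Sketch of the consistency check described in the abstract: a model is
# paraphrastically robust on an example if its predicted label is unchanged
# when the premise/hypothesis pair is paraphrased. Function and variable
# names here are hypothetical, not from the paper's codebase.

def prediction_change_rate(original_preds, paraphrased_preds):
    """Fraction of examples whose predicted label changes under paraphrase."""
    assert len(original_preds) == len(paraphrased_preds)
    changed = sum(o != p for o, p in zip(original_preds, paraphrased_preds))
    return changed / len(original_preds)

# Toy example: 2 of 8 predictions flip, giving a 25% change rate.
orig = ["entailment", "not_entailment"] * 4
para = ["entailment", "entailment", "not_entailment", "not_entailment"] + orig[4:]
print(prediction_change_rate(orig, para))  # 0.25
```

On this metric, the 8-16% figures reported in the abstract would correspond to change rates of 0.08-0.16 over the 1,126 PaRTE pairs.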
Anthology ID:
2023.acl-short.76
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
880–892
URL:
https://aclanthology.org/2023.acl-short.76
DOI:
10.18653/v1/2023.acl-short.76
Cite (ACL):
Dhruv Verma, Yash Kumar Lal, Shreyashee Sinha, Benjamin Van Durme, and Adam Poliak. 2023. Evaluating Paraphrastic Robustness in Textual Entailment Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 880–892, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Evaluating Paraphrastic Robustness in Textual Entailment Models (Verma et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-short.76.pdf
Video:
https://aclanthology.org/2023.acl-short.76.mp4