WinoLogic: A Zero-Shot Logic-based Diagnostic Dataset for Winograd Schema Challenge

Weinan He, Canming Huang, Yongmei Liu, Xiaodan Zhu


Abstract
The recent success of neural language models (NLMs) on the Winograd Schema Challenge (WSC) has called for further investigation of the commonsense reasoning ability of these models. Previous diagnostic datasets rely on crowd-sourcing, which fails to provide the coherent commonsense knowledge crucial for solving WSC problems. To better evaluate NLMs, we propose a logic-based framework that focuses on high-quality commonsense knowledge. Specifically, we identify and collect formal knowledge formulas verified by theorem provers and translate these formulas into natural language sentences. Based on these true knowledge sentences, we generate adversarial false ones, and we build a new dataset named WinoLogic from both kinds of sentences. Given a problem in WinoLogic, NLMs need to decide, in a zero-shot setting, whether the plausible knowledge sentences could correctly solve the corresponding WSC problems. We also ask human annotators to validate WinoLogic to ensure it is human-agreeable. Experiments show that NLMs still struggle to comprehend commonsense knowledge as humans do, indicating that their reasoning ability could have been overestimated.
Anthology ID:
2021.emnlp-main.307
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
3779–3789
URL:
https://aclanthology.org/2021.emnlp-main.307
DOI:
10.18653/v1/2021.emnlp-main.307
Cite (ACL):
Weinan He, Canming Huang, Yongmei Liu, and Xiaodan Zhu. 2021. WinoLogic: A Zero-Shot Logic-based Diagnostic Dataset for Winograd Schema Challenge. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3779–3789, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
WinoLogic: A Zero-Shot Logic-based Diagnostic Dataset for Winograd Schema Challenge (He et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.307.pdf
Software:
2021.emnlp-main.307.Software.zip
Video:
https://aclanthology.org/2021.emnlp-main.307.mp4
Data:
GLUE, MultiNLI, QNLI, WSC, WinoGrande, WinoWhy