Zhanhao Xiao


2023

pdf bib
Exploring the Capacity of Pretrained Language Models for Reasoning about Actions and Change
Weinan He | Canming Huang | Zhanhao Xiao | Yongmei Liu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Reasoning about actions and change (RAC) is essential to understand and interact with the ever-changing environment. Previous AI research has shown the importance of fundamental and indispensable knowledge of actions, i.e., preconditions and effects. However, traditional methods rely on logical formalization which hinders practical applications. With recent transformer-based language models (LMs), reasoning over text is desirable and seemingly feasible, leading to the question of whether LMs can effectively and efficiently learn to solve RAC problems. We propose four essential RAC tasks as a comprehensive textual benchmark and generate problems in a way that minimizes the influence of other linguistic requirements (e.g., grounding) to focus on RAC. The resulting benchmark, TRAC, encompassing problems of various complexities, facilitates a more granular evaluation of LMs, precisely targeting the structural generalization ability much needed for RAC. Experiments with three high-performing transformers indicate that additional efforts are needed to tackle challenges raised by TRAC.

2022

pdf bib
LogicNMR: Probing the Non-monotonic Reasoning Ability of Pre-trained Language Models
Yeliang Xiu | Zhanhao Xiao | Yongmei Liu
Findings of the Association for Computational Linguistics: EMNLP 2022

The logical reasoning capabilities of pre-trained language models have recently received much attention. As one of the vital reasoning paradigms, non-monotonic reasoning refers to the fact that conclusions may be invalidated with new information. Existing work has constructed a non-monotonic inference dataset 𝛿-NLI and explored the performance of language models on it. However, the 𝛿-NLI dataset is entangled with commonsense reasoning. In this paper, we explore the pure non-monotonic reasoning ability of pre-trained language models. We build a non-monotonic reasoning benchmark, named LogicNMR, with explicit default rules and iterative updates. In the experimental part, the performance of popular language models on LogicNMR is explored from the perspectives of accuracy, generalization, proof-based traceability and robustness. The experimental results show that even though the fine-tuned language models achieve an accuracy of more than 94.4% on LogicNMR, they perform unsatisfactorily, with a significant drop, in generalization and proof-based traceability.