Multi-modal Action Chain Abductive Reasoning

Mengze Li; Tianbao Wang; Jiahe Xu; Kairong Han; Shengyu Zhang; Zhou Zhao; Jiaxu Miao; Wenqiao Zhang; Shiliang Pu; Fei Wu

doi:10.18653/v1/2023.acl-long.254

Abstract

Abductive reasoning has long been considered a core human ability: it enables us to infer the most plausible explanation of incompletely known phenomena in daily life. However, this critical reasoning capability has rarely been investigated for contemporary AI systems under such limited observations. To support research in this direction, this paper sheds new light on abductive reasoning by studying a new vision-language task, Multi-modal Action chain abductive Reasoning (MAR), together with a large-scale abductive reasoning dataset. Given an incomplete set of language-described events, MAR aims to imagine the most plausible event by spatio-temporal grounding in past video and then to infer the hypothesis of the subsequent action chain that best explains the language premise. To solve this task, we propose a strong baseline model that realizes MAR from two perspectives: (i) we first introduce a transformer that learns to encode the observation and imagine the plausible event, with explicitly interpretable event grounding in the video based on its commonsense knowledge recognition ability; (ii) to complete the hypothesis of a follow-up action chain, we design a novel symbolic module that strictly derives the progressive action chain layer by layer. Extensive experiments on our newly created MAR dataset show that the proposed model significantly outperforms existing video-language models.

Anthology ID: 2023.acl-long.254
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 4617–4628
URL: https://aclanthology.org/2023.acl-long.254/
DOI: 10.18653/v1/2023.acl-long.254
Cite (ACL): Mengze Li, Tianbao Wang, Jiahe Xu, Kairong Han, Shengyu Zhang, Zhou Zhao, Jiaxu Miao, Wenqiao Zhang, Shiliang Pu, and Fei Wu. 2023. Multi-modal Action Chain Abductive Reasoning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4617–4628, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Multi-modal Action Chain Abductive Reasoning (Li et al., ACL 2023)
PDF: https://aclanthology.org/2023.acl-long.254.pdf
Video: https://aclanthology.org/2023.acl-long.254.mp4
