@inproceedings{mathur-etal-2024-knowledge,
title = "Knowledge-Aware Reasoning over Multimodal Semi-structured Tables",
author = "Mathur, Suyash and
Bafna, Jainit and
Kartik, Kunal and
Khandelwal, Harshita and
Shrivastava, Manish and
Gupta, Vivek and
Bansal, Mohit and
Roth, Dan",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-emnlp.822",
pages = "14054--14073",
abstract = "Existing datasets for tabular question answering typically focus exclusively on text within cells. However, real-world data is inherently multimodal, often blending images such as symbols, faces, icons, patterns, and charts with textual content in tables. With the evolution of AI models capable of multimodal reasoning, it is pertinent to assess their efficacy in handling such structured data. This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data. We explore their ability to reason on tables that integrate both images and text, introducing MMTabQA, a new dataset designed for this purpose. Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs, understanding visual context, and comparing visual content across images. These findings establish our dataset as a robust benchmark for advancing AI{'}s comprehension and capabilities in analyzing multimodal structured data.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="mathur-etal-2024-knowledge">
<titleInfo>
<title>Knowledge-Aware Reasoning over Multimodal Semi-structured Tables</title>
</titleInfo>
<name type="personal">
<namePart type="given">Suyash</namePart>
<namePart type="family">Mathur</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jainit</namePart>
<namePart type="family">Bafna</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kunal</namePart>
<namePart type="family">Kartik</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Harshita</namePart>
<namePart type="family">Khandelwal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Manish</namePart>
<namePart type="family">Shrivastava</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Vivek</namePart>
<namePart type="family">Gupta</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohit</namePart>
<namePart type="family">Bansal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Dan</namePart>
<namePart type="family">Roth</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2024-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: EMNLP 2024</title>
</titleInfo>
<name type="personal">
<namePart type="given">Yaser</namePart>
<namePart type="family">Al-Onaizan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohit</namePart>
<namePart type="family">Bansal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yun-Nung</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Miami, Florida, USA</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Existing datasets for tabular question answering typically focus exclusively on text within cells. However, real-world data is inherently multimodal, often blending images such as symbols, faces, icons, patterns, and charts with textual content in tables. With the evolution of AI models capable of multimodal reasoning, it is pertinent to assess their efficacy in handling such structured data. This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data. We explore their ability to reason on tables that integrate both images and text, introducing MMTabQA, a new dataset designed for this purpose. Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs, understanding visual context, and comparing visual content across images. These findings establish our dataset as a robust benchmark for advancing AI’s comprehension and capabilities in analyzing multimodal structured data.</abstract>
<identifier type="citekey">mathur-etal-2024-knowledge</identifier>
<location>
<url>https://aclanthology.org/2024.findings-emnlp.822</url>
</location>
<part>
<date>2024-11</date>
<extent unit="page">
<start>14054</start>
<end>14073</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Knowledge-Aware Reasoning over Multimodal Semi-structured Tables
%A Mathur, Suyash
%A Bafna, Jainit
%A Kartik, Kunal
%A Khandelwal, Harshita
%A Shrivastava, Manish
%A Gupta, Vivek
%A Bansal, Mohit
%A Roth, Dan
%Y Al-Onaizan, Yaser
%Y Bansal, Mohit
%Y Chen, Yun-Nung
%S Findings of the Association for Computational Linguistics: EMNLP 2024
%D 2024
%8 November
%I Association for Computational Linguistics
%C Miami, Florida, USA
%F mathur-etal-2024-knowledge
%X Existing datasets for tabular question answering typically focus exclusively on text within cells. However, real-world data is inherently multimodal, often blending images such as symbols, faces, icons, patterns, and charts with textual content in tables. With the evolution of AI models capable of multimodal reasoning, it is pertinent to assess their efficacy in handling such structured data. This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data. We explore their ability to reason on tables that integrate both images and text, introducing MMTabQA, a new dataset designed for this purpose. Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs, understanding visual context, and comparing visual content across images. These findings establish our dataset as a robust benchmark for advancing AI’s comprehension and capabilities in analyzing multimodal structured data.
%U https://aclanthology.org/2024.findings-emnlp.822
%P 14054-14073
Markdown (Informal)
[Knowledge-Aware Reasoning over Multimodal Semi-structured Tables](https://aclanthology.org/2024.findings-emnlp.822) (Mathur et al., Findings 2024)
ACL
- Suyash Mathur, Jainit Bafna, Kunal Kartik, Harshita Khandelwal, Manish Shrivastava, Vivek Gupta, Mohit Bansal, and Dan Roth. 2024. Knowledge-Aware Reasoning over Multimodal Semi-structured Tables. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 14054–14073, Miami, Florida, USA. Association for Computational Linguistics.