2024
Knowledge-Aware Reasoning over Multimodal Semi-structured Tables
Suyash Vardhan Mathur, Jainit Sushil Bafna, Kunal Kartik, Harshita Khandelwal, Manish Shrivastava, Vivek Gupta, Mohit Bansal, Dan Roth
Findings of the Association for Computational Linguistics: EMNLP 2024
Existing datasets for tabular question answering typically focus exclusively on text within cells. However, real-world data is inherently multimodal, often blending images such as symbols, faces, icons, patterns, and charts with textual content in tables. With the evolution of AI models capable of multimodal reasoning, it is pertinent to assess their efficacy in handling such structured data. This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data. We explore their ability to reason on tables that integrate both images and text, introducing MMTabQA, a new dataset designed for this purpose. Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs, understanding visual context, and comparing visual content across images. These findings establish our dataset as a robust benchmark for advancing AI’s comprehension and capabilities in analyzing multimodal structured data.
LastResort at SemEval-2024 Task 3: Exploring Multimodal Emotion Cause Pair Extraction as Sequence Labelling Task
Suyash Vardhan Mathur, Akshett Jindal, Hardik Mittal, Manish Shrivastava
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Conversation is the most natural form of human communication, and each utterance can convey a range of emotions. While significant work has been done on detecting emotions in text, relatively little has addressed finding the causes of those emotions, especially in multimodal settings. SemEval 2024 introduces the task of Multimodal Emotion Cause Analysis in Conversations, which aims to extract the emotions reflected in individual utterances of a conversation spanning multiple modalities (textual, audio, and visual), along with the utterances that caused each emotion. In this paper, we propose models that tackle this task as an utterance labeling and a sequence labeling problem and perform a comparative study of these models: baselines using different encoders, a BiLSTM for adding conversational context, and finally a CRF layer to better model the inter-dependencies between adjacent utterances. Our architecture ranked 8th on the official leaderboard for the task, achieving an F1-score of 0.1759.
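Below is a minimal sketch (not the authors' released code) of the sequence-labelling view described in the abstract: pre-computed per-utterance embeddings are contextualized by a BiLSTM and decoded with a CRF over adjacent utterance tags. The embedding dimension, tag set size, and the pytorch-crf dependency are assumptions for illustration.

# Minimal BiLSTM-CRF sketch for utterance-level sequence labelling.
# Assumes utterance embeddings from any encoder; dims and tag set are
# illustrative, not the paper's actual configuration.
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class UtteranceBiLSTMCRF(nn.Module):
    def __init__(self, utt_dim=768, hidden=256, num_tags=3):
        super().__init__()
        # project encoder output, then add conversational context
        self.proj = nn.Linear(utt_dim, hidden)
        self.bilstm = nn.LSTM(hidden, hidden // 2, batch_first=True,
                              bidirectional=True)
        self.emit = nn.Linear(hidden, num_tags)
        # CRF models dependencies between adjacent utterance tags
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, utt_embs, tags=None, mask=None):
        # utt_embs: (batch, num_utterances, utt_dim)
        h, _ = self.bilstm(torch.tanh(self.proj(utt_embs)))
        emissions = self.emit(h)
        if tags is not None:
            # negative log-likelihood under the CRF as the training loss
            return -self.crf(emissions, tags, mask=mask)
        return self.crf.decode(emissions, mask=mask)

# toy usage: a batch of 2 conversations, 5 utterances each
model = UtteranceBiLSTMCRF()
embs = torch.randn(2, 5, 768)
tags = torch.randint(0, 3, (2, 5))
mask = torch.ones(2, 5, dtype=torch.bool)
loss = model(embs, tags, mask)     # training loss
pred = model(embs, mask=mask)      # Viterbi-decoded tag sequences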
DaVinci at SemEval-2024 Task 9: Few-shot prompting GPT-3.5 for Unconventional Reasoning
Suyash Vardhan Mathur, Akshett Jindal, Manish Shrivastava
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
While significant work has been done in NLP on vertical thinking, which is primarily logical, little has been done on lateral thinking, which involves approaching problems from an unconventional perspective that defies existing conceptions and notions. In this direction, SemEval 2024 introduces the BRAINTEASER task, comprising two question types, Sentence Puzzle and Word Puzzle, that defy conventional common-sense reasoning and constraints. In this paper, we tackle both question types using few-shot prompting on GPT-3.5 and gain insights into how the two types of questions differ in nature. Our prompting strategy placed us 26th on the Sentence Puzzle leaderboard and 15th on the Word Puzzle task.
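As an illustration of the approach, here is a minimal sketch of few-shot prompting GPT-3.5 through the OpenAI chat API. The system instruction and the in-context exemplar are hypothetical stand-ins, not the prompts used in the paper.

# Minimal few-shot prompting sketch for a BRAINTEASER-style Sentence
# Puzzle; the exemplar and wording below are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FEW_SHOT = [
    # hypothetical in-context example: (question, options, answer letter)
    ("A man shaves several times a day, yet still has a beard. How?",
     "A) He is a barber. B) He is bald. C) He uses a blunt razor.",
     "A"),
]

def build_messages(question, options):
    messages = [{"role": "system",
                 "content": "Answer lateral-thinking puzzles. "
                            "Reply with the letter of the correct option."}]
    # each exemplar becomes a user/assistant turn pair
    for q, opts, ans in FEW_SHOT:
        messages.append({"role": "user", "content": f"{q}\n{opts}"})
        messages.append({"role": "assistant", "content": ans})
    messages.append({"role": "user", "content": f"{question}\n{options}"})
    return messages

def answer(question, options):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # deterministic choice for evaluation
        messages=build_messages(question, options),
    )
    return resp.choices[0].message.content.strip()

# usage:
# print(answer("What has keys but can't open locks?",
#              "A) A piano. B) A map. C) A door."))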