MEEL: Multi-Modal Event Evolution Learning

Zhengwei Tao, Zhi Jin, Junqiang Huang, Xiancai Chen, Xiaoying Bai, Yifan Zhang, Chongyang Tao


Abstract
Multi-modal Event Reasoning (MMER) aims to endow machines with the ability to comprehend intricate event relations across diverse data modalities. MMER is fundamental and underlies a wide range of applications. Despite extensive instruction fine-tuning, current multi-modal large language models still fall short in this ability. This gap arises because existing models fail to capture the underlying principles that govern event evolution in various scenarios. In this paper, we introduce Multi-Modal Event Evolution Learning (MEEL), which enables the model to grasp the event evolution mechanism and thereby yields advanced MMER ability. Specifically, we begin by designing event diversification to gather seed events from a rich spectrum of scenarios. Subsequently, we employ ChatGPT to generate evolving graphs for these seed events. We propose an instruction encapsulation process that formulates the evolving graphs into instruction-tuning data, aligning the model's comprehension of event reasoning with that of humans. Finally, we observe that models trained in this way still struggle to fully comprehend event evolution. To address this, we propose the guiding discrimination strategy, in which models are trained to discriminate improper evolution directions. We collect and curate a benchmark, M-EV2, for MMER. Extensive experiments on M-EV2 validate the effectiveness of our approach, showing competitive performance among open-source multi-modal LLMs.
Anthology ID:
2024.findings-acl.528
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
8912–8925
URL:
https://aclanthology.org/2024.findings-acl.528
Cite (ACL):
Zhengwei Tao, Zhi Jin, Junqiang Huang, Xiancai Chen, Xiaoying Bai, Yifan Zhang, and Chongyang Tao. 2024. MEEL: Multi-Modal Event Evolution Learning. In Findings of the Association for Computational Linguistics ACL 2024, pages 8912–8925, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
MEEL: Multi-Modal Event Evolution Learning (Tao et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.528.pdf