MM-ChatAlign: A Novel Multimodal Reasoning Framework based on Large Language Models for Entity Alignment

Xuhui Jiang, Yinghan Shen, Zhichao Shi, Chengjin Xu, Wei Li, Huang Zihe, Jian Guo, Yuanzhuo Wang


Abstract
Multimodal entity alignment (MMEA) integrates multi-source and cross-modal knowledge graphs, a crucial yet challenging task for data-centric applications. Traditional MMEA methods derive visual embeddings of entities and combine them with data from other modalities, aligning entities by comparing embedding similarity. However, these methods are hampered by a limited comprehension of visual attributes and by deficiencies in capturing and bridging the semantics of multimodal data. To address these challenges, we propose MM-ChatAlign, a novel framework that utilizes the visual reasoning abilities of multimodal large language models (MLLMs) for MMEA. The framework features an embedding-based candidate collection module that adapts to various knowledge representation strategies, effectively filtering out irrelevant reasoning candidates. Additionally, a reasoning and rethinking module, powered by MLLMs, enhances alignment by efficiently utilizing multimodal information. Extensive experiments on four MMEA datasets demonstrate MM-ChatAlign's superiority and underscore the significant potential of MLLMs in MMEA tasks. The source code is available at https://github.com/jxh4945777/MMEA/.
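
The embedding-based candidate collection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine` and `collect_candidates`, the use of cosine similarity, the top-k cutoff, and the toy vectors are all assumptions made for illustration. The idea shown is only that a source entity's embedding is compared against target-graph embeddings, and the k most similar targets are kept as candidates for a later reasoning stage.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def collect_candidates(
    source_vec: Sequence[float],
    target_vecs: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k target entities most similar to the source embedding.

    Hypothetical helper: the paper's module may score and filter differently.
    """
    scored = [(name, cosine(source_vec, vec)) for name, vec in target_vecs.items()]
    # Highest similarity first; everything past the top k is filtered out.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (made up) for three entities in a target knowledge graph.
targets = {
    "Paris_(fr)": [0.9, 0.1, 0.0],
    "Paris_Hilton_(fr)": [0.1, 0.9, 0.2],
    "Lyon_(fr)": [0.7, 0.0, 0.6],
}
candidates = collect_candidates([1.0, 0.0, 0.0], targets, k=2)
for name, score in candidates:
    print(f"{name}: {score:.3f}")
```

In a pipeline of this shape, only the shortlisted candidates would be passed to the MLLM-based reasoning step, which keeps the number of expensive model calls small.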
Anthology ID:
2024.findings-emnlp.148
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2637–2654
URL:
https://aclanthology.org/2024.findings-emnlp.148
Cite (ACL):
Xuhui Jiang, Yinghan Shen, Zhichao Shi, Chengjin Xu, Wei Li, Huang Zihe, Jian Guo, and Yuanzhuo Wang. 2024. MM-ChatAlign: A Novel Multimodal Reasoning Framework based on Large Language Models for Entity Alignment. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 2637–2654, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
MM-ChatAlign: A Novel Multimodal Reasoning Framework based on Large Language Models for Entity Alignment (Jiang et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.148.pdf
Software:
 2024.findings-emnlp.148.software.zip
Data:
 2024.findings-emnlp.148.data.zip