Differentiated Vision: Unveiling Entity-Specific Visual Modality Requirements for Multimodal Knowledge Graph

Minghang Liu, Yinghan Shen, Zihe Huang, Yuanzhuo Wang, Xuhui Jiang, Huawei Shen
Abstract
Multimodal Knowledge Graphs (MMKGs) enhance knowledge representations by integrating structural and multimodal information of entities. Recently, MMKGs have proven effective in tasks such as information retrieval, knowledge discovery, and question answering. Current methods typically utilize pre-trained visual encoders to extract features from images associated with each entity, emphasizing complex cross-modal interactions. However, these approaches often overlook the varying relevance of visual information across entities. Specifically, not all entities benefit from visual data, and not all associated images are pertinent, with irrelevant images introducing noise and potentially degrading model performance. To address these issues, we propose the Differentiated Vision for Multimodal Knowledge Graphs (DVMKG) model. DVMKG evaluates the necessity of the visual modality for each entity based on its intrinsic attributes and assesses image quality through representativeness and diversity. Leveraging these metrics, DVMKG dynamically adjusts the influence of visual data during feature integration, tailoring it to the specific needs of different entity types. Extensive experiments on multiple benchmark datasets confirm the effectiveness of DVMKG, demonstrating significant improvements over existing methods.
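
To make the mechanism concrete, here is a minimal sketch of what an entity-specific visual gate of the kind the abstract describes could look like. Everything below is an illustrative assumption, not the authors' actual DVMKG implementation (see the paper PDF for that): the class name DifferentiatedVisionGate, the cosine-based representativeness and diversity scores, their equal weighting, and the additive fusion are all hypothetical stand-ins.

# Hypothetical sketch of the differentiated-vision idea from the abstract:
# gate each entity's visual features by (a) a learned necessity score from
# the entity's own embedding and (b) an image-quality score combining
# representativeness and diversity. Names and formulas are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentiatedVisionGate(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Predicts how much this entity benefits from visual input (necessity).
        self.necessity = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def quality(self, img_feats: torch.Tensor, ent_emb: torch.Tensor) -> torch.Tensor:
        # img_feats: (k, dim) features of the k images attached to one entity.
        # Representativeness: mean cosine similarity between images and entity.
        rep = F.cosine_similarity(img_feats, ent_emb.unsqueeze(0), dim=-1).mean()
        # Diversity: 1 minus mean pairwise cosine similarity among the images.
        sims = F.cosine_similarity(img_feats.unsqueeze(1), img_feats.unsqueeze(0), dim=-1)
        k = img_feats.size(0)
        off_diag = (sims.sum() - sims.diagonal().sum()) / max(k * (k - 1), 1)
        div = 1.0 - off_diag
        return 0.5 * (rep + div)  # illustrative equal weighting

    def forward(self, ent_emb: torch.Tensor, img_feats: torch.Tensor) -> torch.Tensor:
        # Per-entity weight = necessity * image quality; scale pooled visuals by it.
        w = self.necessity(ent_emb) * self.quality(img_feats, ent_emb)
        vis = img_feats.mean(dim=0)   # pooled visual feature
        return ent_emb + w * vis      # visual influence scaled per entity

# Example: one entity with 3 images in a 64-dim space.
gate = DifferentiatedVisionGate(64)
fused = gate(torch.randn(64), torch.randn(3, 64))

In DVMKG proper, the necessity and quality signals are presumably learned jointly with the knowledge-graph objective; the sketch only fixes the shape of the computation.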
Anthology ID: 2025.findings-emnlp.1097
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 20170–20183
URL: https://aclanthology.org/2025.findings-emnlp.1097/
Cite (ACL): Minghang Liu, Yinghan Shen, Zihe Huang, Yuanzhuo Wang, Xuhui Jiang, and Huawei Shen. 2025. Differentiated Vision: Unveiling Entity-Specific Visual Modality Requirements for Multimodal Knowledge Graph. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 20170–20183, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Differentiated Vision: Unveiling Entity-Specific Visual Modality Requirements for Multimodal Knowledge Graph (Liu et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.1097.pdf
Checklist: 2025.findings-emnlp.1097.checklist.pdf