MELOV: Multimodal Entity Linking with Optimized Visual Features in Latent Space

Xuhui Sui, Ying Zhang, Yu Zhao, Kehui Song, Baohang Zhou, Xiaojie Yuan


Abstract
Multimodal entity linking (MEL), which aligns ambiguous mentions within multimodal contexts to referent entities in multimodal knowledge bases, is essential for many natural language processing applications. Previous MEL methods mainly focus on designing complex multimodal interaction mechanisms that better capture coherence evidence between mentions and entities by mining complementary information. However, in real-world social media scenarios, the visual modality is often of low quality, of low value, or of low relevance to the mention. Integrating such information directly can backfire and weaken the consistency between mentions and their corresponding entities. In this paper, we propose MELOV, a novel framework for optimizing visual features in latent space, which combines inter-modality and intra-modality optimizations to address these challenges. For the inter-modality optimization, we use a variational autoencoder to mine information shared across modalities and generate text-based visual features. For the intra-modality optimization, we consider the relationships between mentions and build a graph convolutional network that aggregates the visual features of semantically similar neighbors. Extensive experiments on three benchmark datasets demonstrate the superiority of the proposed framework.
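The intra-modality step described in the abstract (a graph convolutional network that aggregates the visual features of semantically similar mentions) can be illustrated with a minimal sketch. This is not the authors' implementation: building the graph as a symmetric k-nearest-neighbour graph over cosine similarity of mention text embeddings, using a single symmetric-normalized GCN layer, and the helper names `knn_adjacency` and `gcn_layer` are all assumptions made for illustration. The inter-modality variational autoencoder step is not sketched here.

```python
import math

Matrix = list[list[float]]

# Hypothetical helper: cosine similarity between two embedding vectors.
def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Assumption (not from the paper): each mention is linked to its k most similar
# mentions by text embedding, and every edge is made symmetric.
def knn_adjacency(text_emb: Matrix, k: int) -> Matrix:
    n = len(text_emb)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(
            ((cosine(text_emb[i], text_emb[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for _, j in sims[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj

# One standard GCN layer (Kipf & Welling style): ReLU(D^-1/2 (A + I) D^-1/2 X W).
def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    n, d_in, d_out = len(adj), len(weight), len(weight[0])
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Aggregate each mention's visual features with its neighbours' (self-loop included).
    agg = [
        [sum(a_hat[i][j] / math.sqrt(deg[i] * deg[j]) * feats[j][c] for j in range(n))
         for c in range(d_in)]
        for i in range(n)
    ]
    # Linear projection followed by ReLU.
    return [
        [max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in))) for o in range(d_out)]
        for i in range(n)
    ]

# Made-up toy data: 3 mentions with 2-d text embeddings and 1-d visual features.
text_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
visual = [[1.0], [0.0], [1.0]]  # mention 1 stands in for an uninformative image
adj = knn_adjacency(text_emb, k=1)
out = gcn_layer(adj, visual, weight=[[1.0]])
print(adj)  # [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(out)
```

In this toy run, mention 1 has an all-zero visual vector, standing in for a low-value image, yet after aggregation it receives a nonzero feature (about 0.82) from its two text-similar neighbours. That is the intuition behind neighbour aggregation; the actual graph construction and layer design in MELOV may differ.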
Anthology ID: 2024.findings-acl.46
Volume: Findings of the Association for Computational Linguistics: ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand and virtual meeting
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 816–826
URL: https://aclanthology.org/2024.findings-acl.46
Cite (ACL): Xuhui Sui, Ying Zhang, Yu Zhao, Kehui Song, Baohang Zhou, and Xiaojie Yuan. 2024. MELOV: Multimodal Entity Linking with Optimized Visual Features in Latent Space. In Findings of the Association for Computational Linguistics: ACL 2024, pages 816–826, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal): MELOV: Multimodal Entity Linking with Optimized Visual Features in Latent Space (Sui et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-acl.46.pdf