Enhancing multi-modal Relation Extraction with Reinforcement Learning Guided Graph Diffusion Framework

Rui Yang, Rajiv Gupta


Abstract
With the massive growth of multi-modal information such as text, images, and other data, how to analyze and align these data has become very important. In our work, we introduce a new framework based on Reinforcement Learning Guided Graph Diffusion that addresses the complexity of multi-modal graphs and enhances interpretability, making the alignment of multi-modal information easier to understand. Our approach leverages pre-trained models to encode multi-modal data into scene graphs and combines them into a cross-modal graph (CMG). We design a reinforcement learning agent that filters nodes and modifies edges based on its observation of the graph state, dynamically adjusting the graph structure to provide coarse-grained refinement. We then iteratively optimize edge weights and node selection to achieve fine-grained adjustment. We conduct extensive experiments on multi-modal relation extraction datasets and show that our model significantly outperforms existing multi-modal methods such as MEGA and MKGFormer. An ablation study demonstrates the importance of each key component: performance drops significantly when any of them is removed. Our method uses reinforcement learning to better uncover latent relevance across modalities, and because its adjustments operate on the graph structure, the method is more interpretable.
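The abstract describes a two-stage pipeline: merge a text scene graph and an image scene graph into a cross-modal graph, prune it coarsely, then refine edge weights iteratively. The Python sketch below is a toy illustration under our own assumptions, not the authors' implementation. The node embeddings are hand-written vectors standing in for pre-trained encoder outputs; `build_cmg`, `coarse_filter` and `diffuse_and_reweight` are hypothetical names; a fixed keep/drop rule replaces the learned reinforcement learning agent; and the fine-grained stage uses a personalized-PageRank-style feature diffusion followed by cosine reweighting, which the abstract does not specify.

```python
import math

# Hypothetical toy embeddings standing in for pre-trained encoder outputs.
TEXT_NODES = {"man": [1.0, 0.1, 0.0], "horse": [0.1, 1.0, 0.0], "rides": [0.5, 0.5, 0.1]}
IMAGE_NODES = {"person": [0.9, 0.2, 0.1], "animal": [0.2, 0.9, 0.1], "sky": [0.0, 0.1, 1.0]}
TEXT_EDGES = [("man", "rides"), ("rides", "horse")]
IMAGE_EDGES = [("person", "animal"), ("animal", "sky")]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [0, 1] so rounding never pushes a weight outside that range.
    return min(1.0, max(0.0, dot / norm)) if norm else 0.0


def modality(node):
    return node.split(":", 1)[0]  # "t" for text nodes, "i" for image nodes


def build_cmg(tau=0.5):
    """Merge the two scene graphs into one weighted cross-modal graph."""
    feats = {f"t:{k}": v for k, v in TEXT_NODES.items()}
    feats.update({f"i:{k}": v for k, v in IMAGE_NODES.items()})
    weights = {frozenset((f"t:{a}", f"t:{b}")): 1.0 for a, b in TEXT_EDGES}
    weights.update({frozenset((f"i:{a}", f"i:{b}")): 1.0 for a, b in IMAGE_EDGES})
    # Cross-modal edges: link text and image nodes whose similarity exceeds tau.
    for t, tv in TEXT_NODES.items():
        for i, iv in IMAGE_NODES.items():
            s = cosine(tv, iv)
            if s > tau:
                weights[frozenset((f"t:{t}", f"i:{i}"))] = s
    return feats, weights


def coarse_filter(feats, weights):
    """Fixed rule standing in for the learned RL agent: drop every node that
    has no cross-modal edge, together with all edges touching it."""
    keep = {n for n in feats
            if any(n in e and len({modality(x) for x in e}) == 2 for e in weights)}
    feats = {n: v for n, v in feats.items() if n in keep}
    weights = {e: w for e, w in weights.items() if e <= keep}
    return feats, weights


def diffuse_and_reweight(feats, weights, alpha=0.2, steps=3, rounds=2):
    """Fine-grained step: h <- (1 - alpha) * weighted neighbour mean + alpha * h0,
    then reset each edge weight to the cosine of its diffused endpoints."""
    h0 = {n: list(v) for n, v in feats.items()}
    h = {n: list(v) for n, v in feats.items()}
    for _ in range(rounds):
        for _ in range(steps):
            new_h = {}
            for n in h:
                nbrs = [(m, w) for e, w in weights.items() if n in e
                        for m in e if m != n]
                total = sum(w for _, w in nbrs)
                if total == 0:  # isolated node: keep its features unchanged
                    new_h[n] = h[n]
                    continue
                agg = [sum(w * h[m][d] for m, w in nbrs) / total
                       for d in range(len(h[n]))]
                new_h[n] = [(1 - alpha) * a + alpha * b for a, b in zip(agg, h0[n])]
            h = new_h
        weights = {e: cosine(*(h[x] for x in e)) for e in weights}
    return h, weights


if __name__ == "__main__":
    feats, weights = coarse_filter(*build_cmg())
    _, weights = diffuse_and_reweight(feats, weights)
    for edge, w in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(sorted(edge), round(w, 3))
```

In this toy graph the image node `sky` has no cross-modal support, so the coarse rule removes it before diffusion; in the framework the abstract describes, that keep/drop decision would come from the trained agent instead.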
Anthology ID: 2025.coling-main.65
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 978–988
URL: https://aclanthology.org/2025.coling-main.65/
Cite (ACL): Rui Yang and Rajiv Gupta. 2025. Enhancing multi-modal Relation Extraction with Reinforcement Learning Guided Graph Diffusion Framework. In Proceedings of the 31st International Conference on Computational Linguistics, pages 978–988, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Enhancing multi-modal Relation Extraction with Reinforcement Learning Guided Graph Diffusion Framework (Yang & Gupta, COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.65.pdf