Good Visual Guidance Make A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction

Xiang Chen; Ningyu Zhang; Lei Li; Yunzhi Yao; Shumin Deng; Chuanqi Tan; Fei Huang; Luo Si; Huajun Chen

doi:10.18653/v1/2022.findings-naacl.121

Abstract

Multimodal named entity recognition and relation extraction (MNER and MRE) are a fundamental and crucial branch of information extraction. However, existing approaches for MNER and MRE usually suffer from error sensitivity when irrelevant object images are incorporated into texts. To address this issue, we propose a novel Hierarchical Visual Prefix fusion NeTwork (HVPNeT) for visual-enhanced entity and relation extraction, aiming to achieve more effective and robust performance. Specifically, we regard the visual representation as a pluggable visual prefix that guides the textual representation toward error-insensitive prediction decisions. We further propose a dynamic gated aggregation strategy to obtain hierarchical multi-scale visual features as the visual prefix for fusion. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance.
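
The two ideas named in the abstract, gated aggregation of multi-scale visual features and a visual prefix placed in front of the text in attention, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`gated_aggregate`, `prefix_attention`), the softmax gate, the single attention head, and the tiny dimensions are all assumptions chosen for readability, and the learned projections a real model would need are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_aggregate(level_features, gate_logits):
    """Mix one feature vector per visual scale using softmax gate weights.

    Hypothetical stand-in for a 'dynamic gated aggregation' step: in a real
    model the gate logits would be computed from the features themselves.
    """
    weights = softmax(gate_logits)
    dim = len(level_features[0])
    return [
        sum(w * feat[i] for w, feat in zip(weights, level_features))
        for i in range(dim)
    ]


def prefix_attention(queries, text_keys, text_values, prefix_keys, prefix_values):
    """Single-head scaled dot-product attention with a prepended prefix.

    Text queries attend over [prefix ; text] keys and values, so the visual
    prefix can influence every token without changing the sequence length.
    """
    keys = prefix_keys + text_keys        # prefix comes first
    values = prefix_values + text_values
    scale = math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
        )
    return outputs


# Toy usage: two visual scales are mixed into one prefix vector, which is
# then used as an extra key in front of a one-token text sequence.
visual_prefix = gated_aggregate([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
attended = prefix_attention(
    queries=[[1.0, 0.0]],
    text_keys=[[1.0, 0.0]],
    text_values=[[2.0, 2.0]],
    prefix_keys=[visual_prefix],
    prefix_values=[[4.0, 4.0]],
)
```

Because the prefix only adds keys and values, passing empty prefix lists reduces the sketch to ordinary text-only attention, which is one reading of what "pluggable" means here.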

Anthology ID: 2022.findings-naacl.121
Volume: Findings of the Association for Computational Linguistics: NAACL 2022
Month: July
Year: 2022
Address: Seattle, United States
Editors: Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1607–1618
URL: https://aclanthology.org/2022.findings-naacl.121/
DOI: 10.18653/v1/2022.findings-naacl.121
Cite (ACL): Xiang Chen, Ningyu Zhang, Lei Li, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, and Huajun Chen. 2022. Good Visual Guidance Make A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 1607–1618, Seattle, United States. Association for Computational Linguistics.
Cite (Informal): Good Visual Guidance Make A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction (Chen et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-naacl.121.pdf
Software: 2022.findings-naacl.121.software.zip
Video: https://aclanthology.org/2022.findings-naacl.121.mp4
