P-MNER: Cross Modal Correction Fusion Network with Prompt Learning for Multimodal Named Entity Recognition

Wang Zhuang, Zhang Yijia, An Kang, Zhou Xiaoying, Lu Mingyu, Lin Hongfei


Abstract
“Multimodal Named Entity Recognition (MNER) is a challenging task in social media due to the combination of text and image features. Previous MNER work has focused on predicting entity information after fusing visual and text features. However, pre-trained language models have already acquired vast amounts of knowledge during their pre-training process. To leverage this knowledge, we propose a prompt network for MNER tasks (P-MNER). To minimize the noise generated by irrelevant areas in the image, we design a visual feature extraction model (FRR) based on FasterRCNN and ResNet, which uses fine-grained visual features to assist MNER tasks. Moreover, we introduce a text correction fusion module (TCFM) into the model to address visual bias during modal fusion. We employ the idea of a residual network to modify the fused features using the original text features. Our experiments on two benchmark datasets demonstrate that our proposed model outperforms existing MNER methods. P-MNER’s ability to leverage pre-training knowledge from language models, incorporate fine-grained visual features, and correct for visual bias makes it a promising approach for multimodal named entity recognition in social media posts.”
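The abstract says the text correction fusion module uses the idea of a residual network to modify the fused features with the original text features. The sketch below illustrates only that general idea, a skip connection that adds the text features back onto the fused ones. It is not the authors' implementation: the function name `residual_correct`, the weighting parameter `alpha`, and the plain element-wise sum are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]  # one feature vector per token


def residual_correct(text_feats: Matrix, fused_feats: Matrix, alpha: float = 1.0) -> Matrix:
    """Add the original text features back onto the fused features.

    Hypothetical helper: `alpha` scales the text (skip) branch and is an
    assumption, not a parameter described in the abstract.
    """
    if len(text_feats) != len(fused_feats):
        raise ValueError("text and fused features must cover the same tokens")
    corrected: Matrix = []
    for text_vec, fused_vec in zip(text_feats, fused_feats):
        if len(text_vec) != len(fused_vec):
            raise ValueError("feature dimensions must match")
        # Residual (skip) connection: fused + alpha * text, element-wise.
        corrected.append([f + alpha * t for t, f in zip(text_vec, fused_vec)])
    return corrected


# Usage: one token with a 2-dimensional feature vector.
example = residual_correct([[1.0, 2.0]], [[0.5, -1.0]])
```

With `alpha = 0.0` the function returns the fused features unchanged, which makes the contribution of the text branch easy to isolate in this toy setting.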
Anthology ID:
2023.ccl-1.59
Volume:
Proceedings of the 22nd Chinese National Conference on Computational Linguistics
Month:
August
Year:
2023
Address:
Harbin, China
Editors:
Maosong Sun, Bing Qin, Xipeng Qiu, Jing Jiang, Xianpei Han
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
689–700
Language:
English
URL:
https://aclanthology.org/2023.ccl-1.59
Cite (ACL):
Wang Zhuang, Zhang Yijia, An Kang, Zhou Xiaoying, Lu Mingyu, and Lin Hongfei. 2023. P-MNER: Cross Modal Correction Fusion Network with Prompt Learning for Multimodal Named Entity Recognition. In Proceedings of the 22nd Chinese National Conference on Computational Linguistics, pages 689–700, Harbin, China. Chinese Information Processing Society of China.
Cite (Informal):
P-MNER: Cross Modal Correction Fusion Network with Prompt Learning for Multimodal Named Entity Recognition (Zhuang et al., CCL 2023)
PDF:
https://aclanthology.org/2023.ccl-1.59.pdf