An Kang


2023

P-MNER: Cross Modal Correction Fusion Network with Prompt Learning for Multimodal Named Entity Recognition
Wang Zhuang | Zhang Yijia | An Kang | Zhou Xiaoying | Lu Mingyu | Lin Hongfei
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

“Multimodal Named Entity Recognition (MNER) is a challenging task in social media due to the combination of text and image features. Previous MNER work has focused on predicting entity information after fusing visual and text features. However, pre-training language models have already acquired vast amounts of knowledge during their pre-training process. To leverage this knowledge, we propose a prompt network for MNER tasks (P-MNER). To minimize the noise generated by irrelevant areas in the image, we design a visual feature extraction model (FRR) based on FasterRCNN and ResNet, which uses fine-grained visual features to assist MNER tasks. Moreover, we introduce a text correction fusion module (TCFM) into the model to address visual bias during modal fusion. We employ the idea of a residual network to modify the fused features using the original text features. Our experiments on two benchmark datasets demonstrate that our proposed model outperforms existing MNER methods. P-MNER’s ability to leverage pre-training knowledge from language models, incorporate fine-grained visual features, and correct for visual bias, makes it a promising approach for multimodal named entity recognition in social media posts.”
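
The abstract's residual-style correction (modifying the fused features using the original text features) can be illustrated with a minimal sketch. This is not the paper's TCFM, whose details the abstract does not give. It shows only the generic skip-connection idea, under the assumption that the fused and text features share the same shape. The function name `residual_text_correction` and the weight `alpha` are hypothetical.

```python
from typing import List

Vector = List[float]


def residual_text_correction(
    fused: List[Vector], text: List[Vector], alpha: float = 1.0
) -> List[Vector]:
    """Add the original text features back onto the fused features.

    Hypothetical illustration of a residual (skip) connection; `alpha`
    is an assumed weight, not a parameter taken from the paper.
    """
    if len(fused) != len(text):
        raise ValueError("fused and text must cover the same number of tokens")
    corrected = []
    for fused_tok, text_tok in zip(fused, text):
        if len(fused_tok) != len(text_tok):
            raise ValueError("feature dimensions must match per token")
        # Residual step: fused feature plus the weighted original text feature.
        corrected.append([f + alpha * t for f, t in zip(fused_tok, text_tok)])
    return corrected


# Toy features: two tokens, two dimensions each.
text_features = [[1.0, 0.0], [0.0, 1.0]]
fused_features = [[0.5, 0.5], [0.25, 0.75]]
corrected = residual_text_correction(fused_features, text_features)
```

Because the text features are added back unchanged, the corrected representation stays anchored to the textual signal even if the fusion step over-weights the image, which is the intuition behind using a residual connection against visual bias.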