ESPVR: Entity Spans Position Visual Regions for Multimodal Named Entity Recognition

Xiujiao Li, Guanglu Sun, Xinyu Liu


Abstract
Multimodal Named Entity Recognition (MNER) uses visual information to improve the performance of text-only Named Entity Recognition (NER). However, existing methods for acquiring local visual information have two limitations: (1) attention-based methods select text-related regions from among the visual regions produced by convolutional architectures (e.g., ResNet), but the attention is distracted by the entire image rather than being fully focused on the regions most relevant to the text; (2) object detection-based methods (e.g., Mask R-CNN) detect visual object regions related to the text, but object detectors recognize only a limited range of categories, and the detected regions may not correspond to the entities in the text. In short, neither kind of method is designed to extract the visual regions most relevant to the entities in the text, so the regions they obtain may be redundant or insufficient for those entities. In this paper, we propose an Entity Spans Position Visual Regions (ESPVR) module to obtain the most relevant visual regions corresponding to the entities in the text. Experiments show that our proposed approach achieves state-of-the-art results on Twitter-2017 and competitive results on Twitter-2015.
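Limitation (1) can be illustrated with a toy calculation. The sketch below is not the ESPVR module or any baseline from the paper; it only shows how softmax attention over every grid region of an image spreads weight onto background regions. The 7x7 grid size, the region indices, and the scores are all made-up assumptions, and restricting attention to a pre-selected subset is used purely as a contrast.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 7x7 grid of region scores (49 regions), standing in for the
# text-to-region similarity over a convolutional feature map.
# Four regions are "relevant" (score 2.0); the other 45 are background (0.0).
# All of these numbers are illustrative assumptions, not data from the paper.
NUM_REGIONS = 49
RELEVANT = {16, 17, 23, 24}
scores = [2.0 if i in RELEVANT else 0.0 for i in range(NUM_REGIONS)]

# (a) Attention over the whole image: every region receives a positive
# weight, so the many background regions absorb much of the total mass.
weights_all = softmax(scores)
mass_on_relevant = sum(weights_all[i] for i in RELEVANT)

# (b) Attention restricted to a pre-selected set of regions: all of the
# mass stays on those regions by construction.
restricted = softmax([scores[i] for i in sorted(RELEVANT)])

print(f"mass on relevant regions, full image: {mass_on_relevant:.3f}")
print(f"mass on relevant regions, restricted: {sum(restricted):.3f}")
```

Under these made-up scores, only about 40% of the attention mass lands on the four relevant regions when attention ranges over the full image, even though each of them scores well above the background.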
Anthology ID:
2023.findings-emnlp.522
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7785–7794
URL:
https://aclanthology.org/2023.findings-emnlp.522
DOI:
10.18653/v1/2023.findings-emnlp.522
Cite (ACL):
Xiujiao Li, Guanglu Sun, and Xinyu Liu. 2023. ESPVR: Entity Spans Position Visual Regions for Multimodal Named Entity Recognition. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 7785–7794, Singapore. Association for Computational Linguistics.
Cite (Informal):
ESPVR: Entity Spans Position Visual Regions for Multimodal Named Entity Recognition (Li et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.522.pdf