Haolong Yan


2022

pdf bib
Entity-level Interaction via Heterogeneous Graph for Multimodal Named Entity Recognition
Gang Zhao | Guanting Dong | Yidong Shi | Haolong Yan | Weiran Xu | Si Li
Findings of the Association for Computational Linguistics: EMNLP 2022

Multimodal Named Entity Recognition (MNER) faces two specific challenges: 1) How to capture useful entity-related visual information. 2) How to alleviate the interference of visual noise. Previous works have gained progress by improving interacting mechanisms or seeking for better visual features. However, existing methods neglect the integrity of entity semantics and conduct cross-modal interaction at token-level, which cuts apart the semantics of entities and makes non-entity tokens easily interfered with by irrelevant visual noise. Thus in this paper, we propose an end-to-end heterogeneous Graph-based Entity-level Interacting model (GEI) for MNER. GEI first utilizes a span detection subtask to obtain entity representations, which serve as the bridge between two modalities. Then, the heterogeneous graph interacting network interacts entity with object nodes to capture entity-related visual information, and fuses it into only entity-associated tokens to rid non-entity tokens of the visual noise. Experiments on two widely used datasets demonstrate the effectiveness of our method. Our code will be available at https://github.com/GangZhao98/GEI.