A Span-based Multimodal Variational Autoencoder for Semi-supervised Multimodal Named Entity Recognition

Baohang Zhou; Ying Zhang; Kehui Song; Wenya Guo; Guoqing Zhao; Hongbin Wang (王洪彬); Xiaojie Yuan

doi:10.18653/v1/2022.emnlp-main.422

Abstract

Multimodal named entity recognition (MNER) on social media is a challenging task that aims to extract named entities from free text and to use the accompanying images to classify them into user-defined types. However, annotating named entities on social media demands a large amount of human effort. Existing semi-supervised named entity recognition methods focus on the text modality and are used to reduce labeling costs in traditional NER. These methods are not efficient for semi-supervised MNER, because the MNER task combines text information with image information and must account for mismatches between the posted text and image. To fuse text and image features effectively for MNER in the semi-supervised setting, we propose a novel span-based multimodal variational autoencoder (SMVAE) model. The proposed method uses modality-specific VAEs to model text and image latent features, and uses a product-of-experts to obtain multimodal features. In our approach, the implicit relations between labels and multimodal features are modeled by the multimodal VAE, so the useful information in unlabeled data can be exploited in the semi-supervised setting. Experimental results on two benchmark datasets demonstrate that our approach not only outperforms baselines in the supervised setting, but also improves MNER performance with less labeled data than existing semi-supervised methods.
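
To illustrate the product-of-experts fusion mentioned in the abstract, here is a minimal sketch in plain Python. It assumes each modality-specific VAE outputs a diagonal Gaussian posterior (a mean and a log-variance per latent dimension) and fuses them with the standard closed form for a product of Gaussians: precisions add, and the fused mean is the precision-weighted average. The abstract does not give the exact formulation used in SMVAE, so the function name `product_of_experts`, the optional standard-normal prior expert, and the toy inputs are all assumptions made for illustration.

```python
import math

def product_of_experts(mus, logvars, include_prior=True):
    """Fuse diagonal Gaussian experts into a single diagonal Gaussian.

    mus, logvars: one list of floats per modality, all of equal length.
    Returns (fused_mu, fused_logvar) as lists of floats.
    This is an illustrative sketch, not the paper's implementation.
    """
    dim = len(mus[0])
    if include_prior:
        # Assumed prior expert N(0, I): mean 0 and log-variance 0.
        mus = [[0.0] * dim] + list(mus)
        logvars = [[0.0] * dim] + list(logvars)

    fused_mu, fused_logvar = [], []
    for d in range(dim):
        # Precision is the inverse variance: exp(-logvar).
        precisions = [math.exp(-lv[d]) for lv in logvars]
        total = sum(precisions)
        # Fused mean is the precision-weighted average of expert means.
        mean = sum(m[d] * p for m, p in zip(mus, precisions)) / total
        fused_mu.append(mean)
        # Fused variance is 1 / total precision.
        fused_logvar.append(-math.log(total))
    return fused_mu, fused_logvar

# Hypothetical text and image posteriors with unit variance.
text_mu, text_logvar = [1.0, -0.5], [0.0, 0.0]
image_mu, image_logvar = [3.0, 0.5], [0.0, 0.0]

mu, logvar = product_of_experts(
    [text_mu, image_mu], [text_logvar, image_logvar], include_prior=False
)
```

One property of this fusion rule is that an expert with high variance (low precision) contributes little to the fused mean, which is one plausible way for a model to down-weight an image that does not match the posted text.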

Anthology ID: 2022.emnlp-main.422
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 6293–6302
URL: https://aclanthology.org/2022.emnlp-main.422/
DOI: 10.18653/v1/2022.emnlp-main.422
Cite (ACL): Baohang Zhou, Ying Zhang, Kehui Song, Wenya Guo, Guoqing Zhao, Hongbin Wang, and Xiaojie Yuan. 2022. A Span-based Multimodal Variational Autoencoder for Semi-supervised Multimodal Named Entity Recognition. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6293–6302, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): A Span-based Multimodal Variational Autoencoder for Semi-supervised Multimodal Named Entity Recognition (Zhou et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-main.422.pdf
