Jie Wei


2025

Enhancing Multimodal Named Entity Recognition through Adaptive Mixup Image Augmentation
Bo Xu | Haiqi Jiang | Jie Wei | Hongyu Jing | Ming Du | Hui Song | Hongya Wang | Yanghua Xiao
Proceedings of the 31st International Conference on Computational Linguistics

Multimodal named entity recognition (MNER) extends traditional named entity recognition (NER) by integrating visual and textual information. However, current methods still face significant challenges due to the text-image mismatch problem. Recent advancements in text-to-image synthesis provide promising solutions, as synthesized images can introduce additional visual context to enhance MNER model performance. To fully leverage the benefits of both original and synthesized images, we propose an adaptive mixup image augmentation method. This method generates augmented images by determining the mixing ratio based on the matching score between the text and image, utilizing a triplet loss-based Gaussian Mixture Model (TL-GMM). Our approach is highly adaptable and can be seamlessly integrated into existing MNER models. Extensive experiments demonstrate consistent performance improvements, and detailed ablation studies and case studies confirm the effectiveness of our method.
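
The mixing step described in the abstract can be illustrated with a minimal sketch. It assumes the text-image matching score is already available as a number in [0, 1] and uses it directly as the weight on the original image; the paper derives its ratio with TL-GMM, which is not reproduced here. The function name `adaptive_mixup`, its parameters, and the flat-list image representation are hypothetical choices for illustration only.

```python
from typing import List, Sequence


def adaptive_mixup(
    original: Sequence[float],
    synthesized: Sequence[float],
    match_score: float,
) -> List[float]:
    """Blend two images pixel-wise, weighting the original by match_score.

    Assumption: match_score is used directly as the mixing ratio. The
    paper obtains its ratio via TL-GMM, which this sketch does not model.
    Images are flat sequences of pixel intensities of equal length.
    """
    if len(original) != len(synthesized):
        raise ValueError("images must have the same number of pixels")

    # Clamp the score so it is always a valid convex-combination weight.
    lam = min(1.0, max(0.0, match_score))

    # Standard mixup: lam * original + (1 - lam) * synthesized.
    return [lam * o + (1.0 - lam) * s for o, s in zip(original, synthesized)]
```

Under this assumption, a well-matched text-image pair (score near 1) keeps the augmented image close to the original, while a poorly matched pair (score near 0) leans on the synthesized image.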