PV2TEA: Patching Visual Modality to Textual-Established Information Extraction

Hejie Cui; Rongmei Lin; Nasser Zalmout; Chenwei Zhang; Jingbo Shang; Carl Yang; Xian Li

doi:10.18653/v1/2023.findings-acl.127

Abstract

Information extraction, e.g., attribute value extraction, has been extensively studied and formulated based on text alone. However, many attributes, such as color, shape, and pattern, can benefit from image-based extraction. The visual modality has long been underutilized, mainly because multimodal annotation is difficult to obtain. In this paper, we aim to patch the visual modality to the textual-established attribute information extractor. The cross-modality integration faces several unique challenges: (C1) images and textual descriptions are loosely paired, both within and across samples; (C2) images usually contain rich backgrounds that can mislead the prediction; (C3) weakly supervised labels from textual-established extractors are biased for multimodal training. We present PV2TEA, an encoder-decoder architecture equipped with three bias reduction schemes: (S1) augmented label-smoothed contrast to improve cross-modality alignment for loosely paired images and text; (S2) attention pruning that adaptively distinguishes the visual foreground; (S3) two-level neighborhood regularization that mitigates the textual bias of the labels via reliability estimation. Empirical results on real-world e-commerce datasets demonstrate up to an 11.74% absolute (20.97% relative) F1 increase over unimodal baselines.
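The abstract names scheme (S1), augmented label-smoothed contrast, without giving its formula. As a rough illustration of the label-smoothing idea only, the sketch below computes a generic symmetric InfoNCE-style image-text contrastive loss whose one-hot targets are softened, so a loosely paired image and text are not pushed toward a hard one-to-one match. It is not PV2TEA's implementation: the function names, the default smoothing of 0.1, the temperature of 0.07, spreading the smoothing mass evenly over the other in-batch samples, and the omission of the "augmented" component are all assumptions made for illustration.

```python
# Hypothetical sketch: a generic label-smoothed contrastive loss,
# NOT the PV2TEA authors' code. Names and defaults are assumptions.
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def _directional_loss(queries: List[Vector], keys: List[Vector],
                      smoothing: float, temperature: float) -> float:
    """Mean cross-entropy between softmax(similarity / temperature) and
    label-smoothed targets, where query i is paired with key i."""
    n = len(queries)
    total = 0.0
    for i in range(n):
        logits = [cosine_similarity(queries[i], k) / temperature for k in keys]
        # Numerically stable log-softmax via the log-sum-exp trick.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        # Smoothed target: 1 - smoothing on the paired sample, with the
        # remaining mass spread evenly over the other n - 1 samples.
        for j, x in enumerate(logits):
            target = 1.0 - smoothing if j == i else smoothing / (n - 1)
            total -= target * (x - log_z)
    return total / n


def label_smoothed_contrastive_loss(image_emb: List[Vector],
                                    text_emb: List[Vector],
                                    smoothing: float = 0.1,
                                    temperature: float = 0.07) -> float:
    """Average of the image-to-text and text-to-image directional losses."""
    if len(image_emb) != len(text_emb) or len(image_emb) < 2:
        raise ValueError("need two equally sized batches of at least 2 items")
    i2t = _directional_loss(image_emb, text_emb, smoothing, temperature)
    t2i = _directional_loss(text_emb, image_emb, smoothing, temperature)
    return 0.5 * (i2t + t2i)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned = label_smoothed_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
    swapped = label_smoothed_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
    # Correctly paired embeddings give the lower loss.
    print(aligned < swapped)
```

Because the targets are never exactly one-hot, the loss cannot be driven to zero by over-committing to a single pairing; this is one plausible way to tolerate the noisy intra-sample and inter-sample correspondence described in (C1), though the paper's actual formulation may differ.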

Anthology ID: 2023.findings-acl.127
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2026–2041
URL: https://aclanthology.org/2023.findings-acl.127/
DOI: 10.18653/v1/2023.findings-acl.127
Cite (ACL): Hejie Cui, Rongmei Lin, Nasser Zalmout, Chenwei Zhang, Jingbo Shang, Carl Yang, and Xian Li. 2023. PV2TEA: Patching Visual Modality to Textual-Established Information Extraction. In Findings of the Association for Computational Linguistics: ACL 2023, pages 2026–2041, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): PV2TEA: Patching Visual Modality to Textual-Established Information Extraction (Cui et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.127.pdf