Visually-augmented pretrained language models for NLP tasks without images

Hangyu Guo; Kun Zhou; Wayne Xin Zhao; Qinyu Zhang; Ji-Rong Wen

doi:10.18653/v1/2023.acl-long.833

Abstract

Although pre-trained language models (PLMs) have shown impressive performance through text-only self-supervised training, they have been found to lack visual semantics or commonsense. Existing solutions often rely on explicit images for visual knowledge augmentation (requiring time-consuming retrieval or generation), and they also apply the augmentation to the whole input text, without considering whether it is actually needed for specific inputs or tasks. To address these issues, we propose a novel **V**isually-**A**ugmented fine-tuning approach that can be generally applied to various PLMs or NLP tasks, **W**ithout using any retrieved or generated **I**mages, namely **VAWI**. Experimental results show that our approach can consistently improve the performance of BERT, RoBERTa, BART, and T5 at different scales, and outperform several competitive baselines on ten tasks. Our code and data are publicly available at https://github.com/RUCAIBox/VAWI.

Anthology ID: 2023.acl-long.833
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 14912–14929
URL: https://aclanthology.org/2023.acl-long.833/
DOI: 10.18653/v1/2023.acl-long.833
Cite (ACL): Hangyu Guo, Kun Zhou, Wayne Xin Zhao, Qinyu Zhang, and Ji-Rong Wen. 2023. Visually-augmented pretrained language models for NLP tasks without images. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14912–14929, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Visually-augmented pretrained language models for NLP tasks without images (Guo et al., ACL 2023)
PDF: https://aclanthology.org/2023.acl-long.833.pdf
Video: https://aclanthology.org/2023.acl-long.833.mp4
