@inproceedings{hsu-etal-2023-visually,
    title = "Visually-Enhanced Phrase Understanding",
    author = "Hsu, Tsu-Yuan and
      Li, Chen-An and
      Huang, Chao-Wei and
      Chen, Yun-Nung",
    editor = "Rogers, Anna and
      Boyd-Graber, Jordan and
      Okazaki, Naoaki",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.363",
    doi = "10.18653/v1/2023.findings-acl.363",
    pages = "5879--5888",
    abstract = "Large-scale vision-language pre-training has exhibited strong performance in various visual and textual understanding tasks. Recently, the textual encoders of multi-modal pre-trained models have been shown to generate high-quality textual representations, which often outperform models that are purely text-based, such as BERT. In this study, our objective is to utilize both textual and visual encoders of multi-modal pre-trained models to enhance language understanding tasks. We achieve this by generating an image associated with a textual prompt, thus enriching the representation of a phrase for downstream tasks. Results from experiments conducted on four benchmark datasets demonstrate that our proposed method, which leverages visually-enhanced text representations, significantly improves performance in the entity clustering task.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="hsu-etal-2023-visually">
<titleInfo>
<title>Visually-Enhanced Phrase Understanding</title>
</titleInfo>
<name type="personal">
<namePart type="given">Tsu-Yuan</namePart>
<namePart type="family">Hsu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Chen-An</namePart>
<namePart type="family">Li</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Chao-Wei</namePart>
<namePart type="family">Huang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yun-Nung</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2023-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: ACL 2023</title>
</titleInfo>
<name type="personal">
<namePart type="given">Anna</namePart>
<namePart type="family">Rogers</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jordan</namePart>
<namePart type="family">Boyd-Graber</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Naoaki</namePart>
<namePart type="family">Okazaki</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Toronto, Canada</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Large-scale vision-language pre-training has exhibited strong performance in various visual and textual understanding tasks. Recently, the textual encoders of multi-modal pre-trained models have been shown to generate high-quality textual representations, which often outperform models that are purely text-based, such as BERT. In this study, our objective is to utilize both textual and visual encoders of multi-modal pre-trained models to enhance language understanding tasks. We achieve this by generating an image associated with a textual prompt, thus enriching the representation of a phrase for downstream tasks. Results from experiments conducted on four benchmark datasets demonstrate that our proposed method, which leverages visually-enhanced text representations, significantly improves performance in the entity clustering task.</abstract>
<identifier type="citekey">hsu-etal-2023-visually</identifier>
<identifier type="doi">10.18653/v1/2023.findings-acl.363</identifier>
<location>
<url>https://aclanthology.org/2023.findings-acl.363</url>
</location>
<part>
<date>2023-07</date>
<extent unit="page">
<start>5879</start>
<end>5888</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Visually-Enhanced Phrase Understanding
%A Hsu, Tsu-Yuan
%A Li, Chen-An
%A Huang, Chao-Wei
%A Chen, Yun-Nung
%Y Rogers, Anna
%Y Boyd-Graber, Jordan
%Y Okazaki, Naoaki
%S Findings of the Association for Computational Linguistics: ACL 2023
%D 2023
%8 July
%I Association for Computational Linguistics
%C Toronto, Canada
%F hsu-etal-2023-visually
%X Large-scale vision-language pre-training has exhibited strong performance in various visual and textual understanding tasks. Recently, the textual encoders of multi-modal pre-trained models have been shown to generate high-quality textual representations, which often outperform models that are purely text-based, such as BERT. In this study, our objective is to utilize both textual and visual encoders of multi-modal pre-trained models to enhance language understanding tasks. We achieve this by generating an image associated with a textual prompt, thus enriching the representation of a phrase for downstream tasks. Results from experiments conducted on four benchmark datasets demonstrate that our proposed method, which leverages visually-enhanced text representations, significantly improves performance in the entity clustering task.
%R 10.18653/v1/2023.findings-acl.363
%U https://aclanthology.org/2023.findings-acl.363
%U https://doi.org/10.18653/v1/2023.findings-acl.363
%P 5879-5888
Markdown (Informal)
[Visually-Enhanced Phrase Understanding](https://aclanthology.org/2023.findings-acl.363) (Hsu et al., Findings 2023)
ACL
Tsu-Yuan Hsu, Chen-An Li, Chao-Wei Huang, and Yun-Nung Chen. 2023. Visually-Enhanced Phrase Understanding. In Findings of the Association for Computational Linguistics: ACL 2023, pages 5879–5888, Toronto, Canada. Association for Computational Linguistics.
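
As a rough illustration of the approach summarized in the abstract (not the authors' released code), the sketch below generates an image from a phrase prompt, encodes both the prompt and the generated image with a multi-modal pre-trained model, and fuses the two representations before clustering entities. The CLIP checkpoint, the Stable Diffusion generator, the "a photo of ..." prompt template, the averaging fusion, and the KMeans clustering are all illustrative assumptions; consult the paper for the actual models, fusion strategy, and evaluation protocol.

```python
import torch
from transformers import CLIPModel, CLIPProcessor
from diffusers import StableDiffusionPipeline
from sklearn.cluster import KMeans

device = "cuda" if torch.cuda.is_available() else "cpu"

# Multi-modal pre-trained model supplying both the textual and visual encoders (assumed: CLIP).
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
# Text-to-image generator used to produce an image for each textual prompt (assumed: Stable Diffusion).
generator = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

@torch.no_grad()
def visually_enhanced_embedding(phrase: str) -> torch.Tensor:
    """Embed a phrase using both its text features and features of a generated image."""
    prompt = f"a photo of {phrase}"  # hypothetical prompt template, not taken from the paper
    # Textual encoder: embed the prompt.
    text_inputs = processor(text=[prompt], return_tensors="pt", padding=True).to(device)
    text_feat = clip.get_text_features(**text_inputs)
    # Generate an image for the prompt, then embed it with the visual encoder.
    image = generator(prompt, num_inference_steps=25).images[0]
    image_inputs = processor(images=image, return_tensors="pt").to(device)
    image_feat = clip.get_image_features(**image_inputs)
    # Fuse the two modalities; plain averaging of L2-normalized features is an assumption.
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    return ((text_feat + image_feat) / 2).squeeze(0)

# Toy entity clustering on the enriched representations (one of the evaluated task types).
phrases = ["golden retriever", "siamese cat", "grand piano", "violin"]
embeddings = torch.stack([visually_enhanced_embedding(p) for p in phrases]).cpu().numpy()
print(dict(zip(phrases, KMeans(n_clusters=2, n_init=10).fit_predict(embeddings))))
```

The averaging step is only one plausible way to combine the textual and visual features; the reported gains on entity clustering come from the paper's own fusion and evaluation setup, not from this sketch.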