BibTeX
@inproceedings{kao-chen-2024-visualizing,
title = "Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models",
author = "Kao, Chang-Sheng and
Chen, Yun-Nung",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.700/",
doi = "10.18653/v1/2024.findings-acl.700",
pages = "11777--11788",
abstract = "For dialogue systems, the utilization of multimodal dialogue responses, as opposed to relying solely on text-only responses, offers the capability to describe different concepts through various modalities. This enhances the effectiveness of communication and elevates the overall conversational experience. However, current methods for dialogue-to-image retrieval are constrained by the capabilities of pre-trained vision-language models (VLMs). They struggle to accurately extract key information from conversations and are unable to handle long-turn conversations. In this paper, we leverage the reasoning capabilities of large language models (LLMs) to predict the potential features that may be present in the images to be shared, based on the dialogue context. This approach allows us to obtain succinct and precise descriptors, thereby improving the performance of text-image retrieval. Experimental results show that our method significantly outperforms previous approaches in terms of Recall@k."
}

MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="kao-chen-2024-visualizing">
    <titleInfo>
      <title>Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Chang-Sheng</namePart>
      <namePart type="family">Kao</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Yun-Nung</namePart>
      <namePart type="family">Chen</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2024-08</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Findings of the Association for Computational Linguistics: ACL 2024</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Lun-Wei</namePart>
        <namePart type="family">Ku</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Andre</namePart>
        <namePart type="family">Martins</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Vivek</namePart>
        <namePart type="family">Srikumar</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Bangkok, Thailand</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>For dialogue systems, the utilization of multimodal dialogue responses, as opposed to relying solely on text-only responses, offers the capability to describe different concepts through various modalities. This enhances the effectiveness of communication and elevates the overall conversational experience. However, current methods for dialogue-to-image retrieval are constrained by the capabilities of pre-trained vision-language models (VLMs). They struggle to accurately extract key information from conversations and are unable to handle long-turn conversations. In this paper, we leverage the reasoning capabilities of large language models (LLMs) to predict the potential features that may be present in the images to be shared, based on the dialogue context. This approach allows us to obtain succinct and precise descriptors, thereby improving the performance of text-image retrieval. Experimental results show that our method significantly outperforms previous approaches in terms of Recall@k.</abstract>
    <identifier type="citekey">kao-chen-2024-visualizing</identifier>
    <identifier type="doi">10.18653/v1/2024.findings-acl.700</identifier>
    <location>
      <url>https://aclanthology.org/2024.findings-acl.700/</url>
    </location>
    <part>
      <date>2024-08</date>
      <extent unit="page">
        <start>11777</start>
        <end>11788</end>
      </extent>
    </part>
  </mods>
</modsCollection>

Endnote
%0 Conference Proceedings
%T Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models
%A Kao, Chang-Sheng
%A Chen, Yun-Nung
%Y Ku, Lun-Wei
%Y Martins, Andre
%Y Srikumar, Vivek
%S Findings of the Association for Computational Linguistics: ACL 2024
%D 2024
%8 August
%I Association for Computational Linguistics
%C Bangkok, Thailand
%F kao-chen-2024-visualizing
%X For dialogue systems, the utilization of multimodal dialogue responses, as opposed to relying solely on text-only responses, offers the capability to describe different concepts through various modalities. This enhances the effectiveness of communication and elevates the overall conversational experience. However, current methods for dialogue-to-image retrieval are constrained by the capabilities of pre-trained vision-language models (VLMs). They struggle to accurately extract key information from conversations and are unable to handle long-turn conversations. In this paper, we leverage the reasoning capabilities of large language models (LLMs) to predict the potential features that may be present in the images to be shared, based on the dialogue context. This approach allows us to obtain succinct and precise descriptors, thereby improving the performance of text-image retrieval. Experimental results show that our method significantly outperforms previous approaches in terms of Recall@k.
%R 10.18653/v1/2024.findings-acl.700
%U https://aclanthology.org/2024.findings-acl.700/
%U https://doi.org/10.18653/v1/2024.findings-acl.700
%P 11777-11788

Markdown (Informal)
[Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models](https://aclanthology.org/2024.findings-acl.700/) (Kao & Chen, Findings 2024)
ACL
Chang-Sheng Kao and Yun-Nung Chen. 2024. Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2024, pages 11777–11788, Bangkok, Thailand. Association for Computational Linguistics.