SDA: Semantic Discrepancy Alignment for Text-conditioned Image Retrieval

Yuchen Yang, Yu Wang, Yanfeng Wang


Abstract
In text-conditioned image retrieval, models use a composed query, consisting of a reference image and a modification text, to retrieve the corresponding target images. Despite its significance, the task faces two challenges: datasets are small because labeling is costly, and the attributes described in modification texts are complex. As a result, models often learn a generalized representation of the query and miss the semantic correlations between image and text attributes. In this paper, we introduce a general boosting framework that addresses these issues through semantic discrepancy alignment. The framework first uses ChatGPT to augment the text data by modifying the attributes in the original modification text. The augmented text is combined with the original reference image to form an augmented composed query, and GPT-4 is then used to generate the corresponding images for that query. We achieve cross-modal semantic discrepancy alignment by formulating distance consistency and neighbor consistency between the image and text domains. This allows attributes in the text domain to be transferred more effectively to the image domain, improving retrieval performance. Extensive experiments on three prominent datasets validate the effectiveness of our approach, which achieves state-of-the-art results on most evaluation metrics compared with various baseline methods.
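The abstract names the two alignment constraints but does not give their form. The sketch below is one hypothetical reading, not the authors' formulation. It assumes that each sample in a batch has an original and an augmented embedding in both domains (text side: composed-query embeddings; image side: target-image embeddings), treats their difference as the "semantic discrepancy", measures distance consistency as the mean squared gap between the two domains' pairwise-distance matrices, and measures neighbor consistency as a KL divergence between softmax neighbor distributions. All function names, the `temperature` parameter, and the Euclidean/KL choices are illustrative assumptions.

```python
import math
from typing import List, Sequence

# Hypothetical sketch only: the abstract does not specify the paper's exact losses.
Matrix = List[List[float]]


def discrepancies(original: Sequence[Sequence[float]],
                  augmented: Sequence[Sequence[float]]) -> Matrix:
    """Per-sample semantic discrepancy: augmented embedding minus original embedding."""
    return [[a - o for o, a in zip(orig_vec, aug_vec)]
            for orig_vec, aug_vec in zip(original, augmented)]


def pairwise_distances(vectors: Sequence[Sequence[float]]) -> Matrix:
    """Euclidean distance between every pair of vectors in the batch."""
    return [[math.dist(u, v) for v in vectors] for u in vectors]


def distance_consistency(text_disc: Sequence[Sequence[float]],
                         image_disc: Sequence[Sequence[float]]) -> float:
    """Mean squared gap between text-side and image-side pairwise-distance matrices."""
    dt = pairwise_distances(text_disc)
    di = pairwise_distances(image_disc)
    n = len(dt)
    return sum((dt[i][j] - di[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


def _neighbor_distribution(row: Sequence[float], self_index: int,
                           temperature: float) -> List[float]:
    """Softmax over negative distances to the other samples (self excluded)."""
    logits = [-d / temperature for j, d in enumerate(row) if j != self_index]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def neighbor_consistency(text_disc: Sequence[Sequence[float]],
                         image_disc: Sequence[Sequence[float]],
                         temperature: float = 1.0) -> float:
    """Mean KL(P_text || P_image) over per-sample neighbor distributions.

    Requires a batch of at least two samples so that each sample has neighbors.
    """
    dt = pairwise_distances(text_disc)
    di = pairwise_distances(image_disc)
    n = len(dt)
    total = 0.0
    for i in range(n):
        p = _neighbor_distribution(dt[i], i, temperature)
        q = _neighbor_distribution(di[i], i, temperature)
        total += sum(pk * math.log(pk / max(qk, 1e-12))
                     for pk, qk in zip(p, q) if pk > 0)
    return total / n


if __name__ == "__main__":
    text_d = discrepancies([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
                           [[1.0, 0.0], [1.0, 1.0], [0.0, 3.0]])
    image_d = [[2.0 * x for x in v] for v in text_d]  # same directions, doubled scale
    print(distance_consistency(text_d, image_d), neighbor_consistency(text_d, image_d))
```

Both terms are zero when the image-side discrepancies reproduce the text-side geometry exactly, and grow as the two domains' relative structure diverges. In training they would presumably be added to a retrieval loss with tunable weights, which is likewise an assumption rather than something the abstract states.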
Anthology ID:
2024.findings-acl.311
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5250–5261
URL:
https://aclanthology.org/2024.findings-acl.311
Cite (ACL):
Yuchen Yang, Yu Wang, and Yanfeng Wang. 2024. SDA: Semantic Discrepancy Alignment for Text-conditioned Image Retrieval. In Findings of the Association for Computational Linguistics ACL 2024, pages 5250–5261, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
SDA: Semantic Discrepancy Alignment for Text-conditioned Image Retrieval (Yang et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.311.pdf