Garima Pruthi
2024
PRISM: A New Lens for Improved Color Understanding
Arjun Reddy Akula | Garima Pruthi | Inderjit S Dhillon | Pradyumna Narayana | Sugato Basu | Varun Jampani
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
While image-text pre-trained models such as CLIP have demonstrated impressive capabilities in learning robust text and image representations, one critical area still leaves substantial room for improvement: precise color understanding. In this paper, we address this limitation by introducing PRISM, a simple yet highly effective method that extends CLIP's ability to grasp the nuances of precise colors. PRISM adapts seamlessly to both recognized HTML colors and out-of-vocabulary RGB inputs using our curated dataset of 100 image-text pairs, which can be effortlessly repurposed for fine-tuning with any desired color. Importantly, PRISM achieves these enhancements without compromising CLIP's performance on established benchmarks. Furthermore, we introduce a novel evaluation framework, ColorLens, featuring both seen and unseen test sets that can be readily repurposed to assess a model's precision in understanding precise colors. Our comprehensive evaluation and results demonstrate significant improvements over baseline models.
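To make the color-probing task concrete, here is a minimal sketch (not the authors' released code) of how one might zero-shot probe an off-the-shelf CLIP model for precise color understanding, in the spirit of the ColorLens-style evaluation described above. The checkpoint name, prompt template, color list, and image path are illustrative assumptions.

```python
# Minimal sketch: scoring an image against prompts that name precise colors
# with a stock CLIP model. PRISM's fine-tuning data and method are not shown
# here; this only illustrates the kind of probe the evaluation targets.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Candidate colors; PRISM also targets out-of-vocabulary RGB inputs, which a
# plain CLIP text encoder tends to handle poorly without fine-tuning.
colors = ["crimson", "teal", "chartreuse", "lavender", "rgb(128, 0, 255)"]
prompts = [f"a photo of a {c} object" for c in colors]  # hypothetical template

image = Image.open("sample.jpg")  # placeholder input image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape: (1, num_prompts)
probs = logits.softmax(dim=-1).squeeze(0)

for color, p in zip(colors, probs.tolist()):
    print(f"{color:>18}: {p:.3f}")
```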
2023
KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models
Zhiwei Jia | Pradyumna Narayana | Arjun Akula | Garima Pruthi | Hao Su | Sugato Basu | Varun Jampani
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
Image ad understanding is a crucial task with wide real-world applications. Although highly challenging, involving diverse atypical scenes, real-world entities, and reasoning over scene text, image ad interpretation remains relatively under-explored, especially in the era of foundation vision-language models (VLMs) featuring impressive generalizability and adaptability. In this paper, we perform the first empirical study of image ad understanding through the lens of pre-trained VLMs. We benchmark these models and reveal practical challenges in adapting them to image ad understanding. We propose a simple feature adaptation strategy that effectively fuses multimodal information for image ads, and we further empower it with knowledge of real-world entities. We hope our study draws more attention to image ad understanding, which is broadly relevant to the advertising industry.
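As a rough illustration of what a feature adaptation layer over a frozen VLM can look like, here is a minimal sketch. It is one plausible reading of the fusion idea, not the KAFA architecture: the adapter design, dimensions, and residual connection are assumptions for illustration.

```python
# Minimal sketch (not the KAFA implementation): a small adapter that fuses
# frozen CLIP image features with encoded scene-text / entity-knowledge
# features, projecting back into the shared embedding space.
import torch
import torch.nn as nn

class FusionAdapter(nn.Module):
    def __init__(self, dim: int = 512, hidden: int = 256):
        super().__init__()
        # Map the concatenated image+knowledge features back to the shared
        # embedding dimension; the frozen backbone itself is untouched.
        self.fuse = nn.Sequential(
            nn.Linear(2 * dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, img_feat: torch.Tensor, know_feat: torch.Tensor) -> torch.Tensor:
        fused = self.fuse(torch.cat([img_feat, know_feat], dim=-1))
        out = img_feat + fused  # residual keeps the pretrained signal intact
        return out / out.norm(dim=-1, keepdim=True)  # re-normalize for cosine scoring

# Example with random stand-ins for frozen features (512-d, e.g. ViT-B/32).
adapter = FusionAdapter()
img_feat = torch.randn(4, 512)   # image embeddings from the frozen VLM
know_feat = torch.randn(4, 512)  # encoded scene text / entity knowledge
print(adapter(img_feat, know_feat).shape)  # torch.Size([4, 512])
```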