PRISM: A New Lens for Improved Color Understanding

Arjun Reddy Akula; Garima Pruthi; Inderjit S Dhillon; Pradyumna Narayana; Sugato Basu; Varun Jampani

doi:10.18653/v1/2024.emnlp-industry.121

Abstract

While image-text pre-trained models such as CLIP have demonstrated impressive capabilities in learning robust text and image representations, one critical area still needs substantial improvement: precise color understanding. In this paper, we address this limitation by introducing PRISM, a simple yet highly effective method that extends CLIP’s capability to grasp the nuances of precise colors. PRISM adapts to both recognized HTML colors and out-of-vocabulary RGB inputs using our curated dataset of 100 image-text pairs, which can be readily repurposed for fine-tuning with any desired color. Importantly, PRISM achieves these enhancements without compromising CLIP’s performance on established benchmarks. Furthermore, we introduce a novel evaluation framework, ColorLens, featuring both seen and unseen test sets that can be readily repurposed to assess how precisely a model understands colors. Our comprehensive evaluation demonstrates significant improvements over baseline models.

Anthology ID: 2024.emnlp-industry.121
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2024
Address: Miami, Florida, US
Editors: Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 1659–1670
URL: https://aclanthology.org/2024.emnlp-industry.121/
DOI: 10.18653/v1/2024.emnlp-industry.121
Cite (ACL): Arjun Reddy Akula, Garima Pruthi, Inderjit S Dhillon, Pradyumna Narayana, Sugato Basu, and Varun Jampani. 2024. PRISM: A New Lens for Improved Color Understanding. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1659–1670, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal): PRISM: A New Lens for Improved Color Understanding (Akula et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-industry.121.pdf
