Arjun Reddy Akula


2024

PRISM: A New Lens for Improved Color Understanding
Arjun Reddy Akula | Garima Pruthi | Inderjit S Dhillon | Pradyumna Narayana | Sugato Basu | Varun Jampani
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

While image-text pre-trained models, such as CLIP, have demonstrated impressive capabilities in learning robust text and image representations, a critical area for substantial improvement remains—precise color understanding. In this paper, we address this limitation by introducing PRISM, a simple yet highly effective method that extends CLIP’s capability to grasp the nuances of precise colors. PRISM seamlessly adapts to both recognized HTML colors and out-of-vocabulary RGB inputs through the utilization of our curated dataset of 100 image-text pairs, which can be effortlessly repurposed for fine-tuning with any desired color. Importantly, PRISM achieves these enhancements without compromising CLIP’s performance on established benchmarks. Furthermore, we introduce a novel evaluation framework, ColorLens, featuring both seen and unseen test sets that can be readily repurposed to assess a model’s precision in understanding precise colors. Our comprehensive evaluation and results demonstrate significant improvements over baseline models.
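The abstract describes ranking over both recognized HTML colors and out-of-vocabulary RGB inputs. As an illustration only, here is a toy sketch of that kind of precise-color matching; PRISM itself fine-tunes CLIP on curated image-text pairs, whereas this sketch just embeds raw RGB values as unit vectors and ranks named colors by cosine similarity. All function names and the tiny color table below are hypothetical, not from the paper.

```python
# Hypothetical sketch: rank named colors against an arbitrary RGB query by
# cosine similarity in a toy "embedding" space (unit-normalized RGB vectors).
# PRISM's real embeddings come from a fine-tuned CLIP model; this only
# illustrates the seen (HTML-named) vs. out-of-vocabulary (raw RGB) setting.
import math

def rgb_to_unit(rgb):
    """Map an (r, g, b) tuple in 0-255 to a unit-length vector."""
    norm = math.sqrt(sum(c * c for c in rgb)) or 1.0
    return tuple(c / norm for c in rgb)

def cosine(u, v):
    """Dot product of two unit vectors = cosine similarity."""
    return sum(a * b for a, b in zip(u, v))

def rank_colors(query_rgb, candidates):
    """Return candidate color names sorted by similarity to query_rgb."""
    q = rgb_to_unit(query_rgb)
    scored = sorted(
        ((cosine(q, rgb_to_unit(rgb)), name) for name, rgb in candidates.items()),
        reverse=True,
    )
    return [name for _, name in scored]

# A few recognized HTML colors; the query is an out-of-vocabulary RGB value.
html_colors = {"red": (255, 0, 0), "lime": (0, 255, 0), "navy": (0, 0, 128)}
print(rank_colors((200, 10, 10), html_colors)[0])  # -> red
```

A ColorLens-style evaluation could then check whether the top-ranked name matches the ground-truth color for both seen and unseen queries.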

MLT-DR: Multi-Lingual/Task Demonstration Retrieval: An Attempt towards Generalized Retriever for In-Context Learning
Kazuma Hashimoto | Arjun Reddy Akula | Karthik Raman | Michael Bendersky
Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024)

This paper presents Multi-Lingual/Task Demonstration Retrieval (MLT-DR) for in-context learning with Large Language Models (LLMs). Our goal is to investigate how well dense demonstration retrieval models generalize across languages and tasks. We first convert 81 tasks into a common format, covering various languages, task types, and domains. For 8 of the English-based tasks, we use machine translation to create synthetic multi-/cross-lingual tasks, translating the examples into non-English languages to explicitly cover more than 130 languages. We then use an instruction-tuned LLM to estimate the utility of demonstrations for all the tasks, and use these estimates to train the demonstration retrieval models. In our experiments, we make an interesting counterintuitive observation: when computing embeddings of demonstrations, using both the input and the ground-truth output hurts the retriever's generalization on unseen tasks whose output space is quite different from those in the seen task set. We also show that our retriever works robustly even with LLMs that we did not use during the development of the models. The retrieval models' checkpoints are publicly available at URL-available-upon-publication.
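To make the retrieval step concrete, here is a minimal, purely illustrative sketch of dense demonstration retrieval. MLT-DR trains its retriever from LLM-estimated demonstration utility; this toy version instead uses bag-of-words embeddings and cosine similarity, and it embeds each demonstration by its input only, echoing the paper's observation that including ground-truth outputs can hurt generalization to unseen tasks. All names and example data below are hypothetical.

```python
# Toy sketch of dense demonstration retrieval for in-context learning.
# Embeddings here are simple bag-of-words counts (the paper trains a dense
# retriever instead); demonstrations are embedded by INPUT ONLY, per the
# paper's finding about ground-truth outputs and cross-task generalization.
from collections import Counter
import math

def embed(text):
    """Bag-of-words 'embedding': token -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, demonstrations, k=2):
    """Return the k demonstrations whose inputs best match the query."""
    q = embed(query)
    ranked = sorted(
        demonstrations,
        key=lambda d: cosine(q, embed(d["input"])),
        reverse=True,
    )
    return ranked[:k]

demos = [
    {"input": "translate to French: good morning", "output": "bonjour"},
    {"input": "what is the capital of France", "output": "Paris"},
    {"input": "translate to French: thank you", "output": "merci"},
]
top = retrieve("translate to French: good night", demos, k=2)
print([d["output"] for d in top])  # -> ['bonjour', 'merci']
```

The retrieved demonstrations would then be prepended to the LLM prompt as in-context examples; the trained retriever in the paper plays this same role across 81 tasks and 130+ languages.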