Junmo Kim


2024

StablePrompt : Automatic Prompt Tuning using Reinforcement Learning for Large Language Model
Minchan Kwon | Gaeun Kim | Jongsuk Kim | Haeil Lee | Junmo Kim
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Finding appropriate prompts for a specific task has become an important issue as the usage of Large Language Models (LLMs) has expanded. However, the variety of input-output formats complicates finding the prompts. Reinforcement Learning (RL) is a promising approach for prompt tuning due to its ability to incrementally produce better results through interaction with the environment. But its inherent training instability and environmental dependency make it difficult to use in practice. In this paper, we propose StablePrompt, a prompt tuning method based on RL. We formulate prompt tuning as an RL problem between the agent and the target LLM, and introduce Adaptive Proximal Policy Optimization (APPO), a modified version of PPO for prompt tuning. APPO introduces an anchor model and updates it adaptively based on the training trajectory. Using this anchor model for the KL divergence term in PPO keeps the search space flexible and ensures training stability. We evaluate StablePrompt on various tasks, including text classification, question answering, and text generation. StablePrompt achieves state-of-the-art performance across diverse tasks. We demonstrate that StablePrompt performs well across various types and sizes of LLMs. Furthermore, we present TTE-StablePrompt, an extension for generating input-dependent prompts. It outperforms StablePrompt in tasks that are hard to solve with a single prompt. This shows that StablePrompt is an extensible and stable RL framework for LLMs.

Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
Youngtaek Oh | Jae Won Cho | Dong-Jin Kim | In So Kweon | Junmo Kim
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

In this paper, we propose a new method to enhance compositional understanding in pre-trained vision and language models (VLMs) without sacrificing performance in zero-shot multi-modal tasks. Traditional fine-tuning approaches often improve compositional reasoning at the cost of degrading multi-modal capabilities, primarily due to the use of global hard negative (HN) loss, which contrasts global representations of images and texts. This global HN loss pushes away HN texts that are highly similar to the original ones, damaging the model’s multi-modal representations. To overcome this limitation, we propose Fine-grained Selective Calibrated CLIP (FSC-CLIP), which integrates local hard negative loss and selective calibrated regularization. These innovations provide fine-grained negative supervision while preserving the model’s representational integrity. Our extensive evaluations across diverse benchmarks for both compositionality and multi-modal tasks show that FSC-CLIP not only achieves compositionality on par with state-of-the-art models but also retains strong multi-modal capabilities. Code is available at: https://github.com/ytaek-oh/fsc-clip.