Changhun Lee
2024
Towards Pareto-Efficient RLHF: Paying Attention to a Few High-Reward Samples with Reward Dropout
Changhun Lee
|
Chiehyeon Lim
Findings of the Association for Computational Linguistics: EMNLP 2024
Recently, leveraging reinforcement learning (RL) to fine-tune language models (LMs), known as reinforcement learning from human feedback (RLHF), has become an important research topic. However, there is still a lack of theoretical understanding of how RLHF works, the conditions under which it succeeds or fails, and whether it guarantees optimization of both likelihood 𝛽(⋅) and reward R(⋅) objectives. To address these issues, we consider RLHF as a bi-objective problem that has the nature of a Pareto optimization, present a Pareto improvement condition that is necessary to obtain Pareto-efficient policies, and propose a simple yet powerful method named reward dropout that guarantees a Pareto improvement. To demonstrate the performance of reward dropout, two benchmark datasets commonly used in text style transfer tasks were utilized in our study: sentiment and topic datasets sourced from Yelp and AG_News, respectively. Our experiments highlight that paying attention to a few samples with higher rewards leads to greater Pareto improvements regardless of model size. We also demonstrate that the effect of reward dropout is generalizable and most effective with non-pretrained target models, saving the effort of pretraining.
QEFT: Quantization for Efficient Fine-Tuning of LLMs
Changhun Lee
|
Jun-gyu Jin
|
YoungHyun Cho
|
Eunhyeok Park
Findings of the Association for Computational Linguistics: EMNLP 2024
With the rapid growth in the use of fine-tuning for large language models (LLMs), optimizing fine-tuning while keeping inference efficient has become highly important. However, this is a challenging task as it requires improvements in all aspects, including inference speed, fine-tuning speed, memory consumption, and, most importantly, model quality. Previous studies have attempted to achieve this by combining quantization with fine-tuning, but they have failed to enhance all four aspects simultaneously. In this study, we propose a new lightweight technique called Quantization for Efficient Fine-Tuning (QEFT). QEFT accelerates both inference and fine-tuning, is supported by robust theoretical foundations, offers high flexibility, and maintains good hardware compatibility. Our extensive experiments demonstrate that QEFT matches the quality and versatility of full-precision parameter-efficient fine-tuning, while using fewer resources. Our code is available at https://github.com/xvyaward/qeft.
Repurformer: Transformers for Repurposing-Aware Molecule Generation
Changhun Lee
|
Gyumin Lee
Proceedings of the 1st Workshop on Language + Molecules (L+M 2024)
Generating as diverse molecules as possible with desired properties is crucial for drug discovery research, which invokes many approaches based on deep generative models today. Despite recent advancements in these models, particularly in variational autoencoders (VAEs), generative adversarial networks (GANs), Transformers, and diffusion models, a significant challenge known as the sample bias problem remains. This problem occurs when generated molecules targeting the same protein tend to be structurally similar, reducing the diversity of generation. To address this, we propose leveraging multi-hop relationships among proteins and compounds. Our model, Repurformer, integrates bi-directional pretraining with Fast Fourier Transform (FFT) and low-pass filtering (LPF) to capture complex interactions and generate diverse molecules. A series of experiments on BindingDB dataset confirm that Repurformer successfully creates substitutes for anchor compounds that resemble positive compounds, increasing diversity between the anchor and generated compounds.