Zeyd Boukhers
2025
EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning
Lingxiao Kong | Cong Yang | Susanne Neufang | Oya Deniz Beyan | Zeyd Boukhers
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Recent advances in reinforcement learning (RL) for large language model (LLM) fine-tuning show promise in addressing multi-objective tasks but still face significant challenges, including balancing competing objectives, low training efficiency, poor scalability, and limited explainability. Leveraging ensemble learning principles, we introduce an Ensemble Multi-Objective RL (EMORL) framework that fine-tunes multiple models with individual objectives while optimizing their aggregation after fine-tuning to improve efficiency and flexibility. Our method is the first to aggregate the hidden states of individual models, incorporating contextual information from multiple objectives. This approach is supported by a hierarchical grid search algorithm that identifies optimal weighted combinations. We evaluate EMORL on counselor reflection generation tasks, using text classification models to score the generations and provide rewards during RL fine-tuning. Through comprehensive experiments on the PAIR and Psych8k datasets, we demonstrate the advantages of EMORL against existing baselines: significantly lower and more stable training consumption (17,529 ± 1,650 data points and 6,573 ± 147.43 seconds), improved scalability and explainability, and comparable performance across multiple objectives.
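The two mechanisms named in the abstract, weighted aggregation of per-objective hidden states and a hierarchical (coarse-to-fine) grid search over the aggregation weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-weight search range [0, 1], and the scoring callback are assumptions made for illustration only.

# Minimal sketch (assumptions noted above), NumPy only.
import itertools
import numpy as np

def aggregate_hidden_states(hidden_states, weights):
    # hidden_states: list of arrays, one per objective-specific model,
    # each of shape (seq_len, hidden_dim); weights: one scalar per model.
    stacked = np.stack(hidden_states)                 # (n_models, seq_len, dim)
    w = np.asarray(weights)[:, None, None]
    return (w * stacked).sum(axis=0)                  # (seq_len, dim)

def hierarchical_grid_search(score_fn, n_models, levels=3, points=5):
    # Coarse-to-fine search: evaluate a wide grid on [0, 1] per weight,
    # then repeatedly zoom into a narrower window around the best point.
    lo, hi = np.zeros(n_models), np.ones(n_models)
    best_w, best_s = None, -np.inf
    for _ in range(levels):
        axes = [np.linspace(lo[i], hi[i], points) for i in range(n_models)]
        for cand in itertools.product(*axes):
            s = score_fn(np.asarray(cand))            # higher is better
            if s > best_s:
                best_w, best_s = np.asarray(cand), s
        span = (hi - lo) / points                     # shrink the window
        lo = np.clip(best_w - span, 0.0, 1.0)
        hi = np.clip(best_w + span, 0.0, 1.0)
    return best_w, best_s

In the setting described above, score_fn would roughly correspond to generating reflections from the aggregated hidden states and scoring them with the objective-specific text classifiers that provide the RL rewards.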
Exploring the Limits of Model Compression in LLMs: A Knowledge Distillation Study on QA Tasks
Joyeeta Datta | Niclas Doll | Qusai Ramadan | Zeyd Boukhers
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Large Language Models (LLMs) have shown outstanding performance across a range of NLP tasks, but their computational demands hinder deployment in real-world, resource-constrained environments. This work investigates the extent to which LLMs can be compressed using knowledge distillation (KD) while maintaining strong performance on question answering (QA) tasks. We evaluate student models distilled from the Pythia and Qwen2.5 families on two QA benchmarks, SQuAD and MLQA, under zero-shot and one-shot prompting conditions. Results show that student models retain over 90% of their teacher models’ performance while reducing parameter counts by up to 57.1%. Furthermore, one-shot prompting yields additional performance gains over zero-shot setups for both model families. These findings underscore the trade-off between model efficiency and task performance, demonstrating that KD, combined with minimal prompting, can yield compact yet capable QA systems suitable for real-world applications.
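For readers unfamiliar with the setup, the sketch below shows a standard knowledge-distillation objective: temperature-scaled KL divergence between teacher and student logits mixed with cross-entropy on gold labels. The abstract does not give the exact loss, so the temperature, mixing weight alpha, and function name are illustrative assumptions, not the paper's training code.

# Minimal sketch of a standard KD loss (assumptions noted above), PyTorch.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: teacher distribution at temperature T.
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # KL term is scaled by T^2 so its gradient magnitude matches the hard-label term.
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)
    # Hard-label cross-entropy on the gold tokens.
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         labels.view(-1))
    return alpha * kd + (1.0 - alpha) * ce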