Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning

Sai Ashish Somayajula; Bokai Hu; Qi Cao; Xin Pan; Pengtao Xie

doi:10.18653/v1/2025.findings-emnlp.1392

Abstract

Instruction-fine-tuned large language models (LLMs) under 14B parameters continue to underperform on natural language understanding (NLU) tasks, often trailing smaller models like BERT-base on benchmarks such as GLUE and SuperGLUE. Motivated by the success of reinforcement learning in reasoning tasks (e.g., DeepSeek), we explore Proximal Policy Optimization (PPO) as a framework to improve the NLU capabilities of LLMs. We frame NLU as a reinforcement learning environment, treating token generation as a sequence of actions and optimizing for reward signals based on alignment with ground-truth labels. PPO consistently outperforms supervised fine-tuning, yielding an average improvement of 6.3 points on GLUE, and surpasses zero-shot and few-shot prompting by 38.7 and 26.1 points, respectively. Notably, PPO-tuned models outperform GPT-4o by over 4% on average across sentiment and natural language inference tasks, including gains of 7.3% on the Mental Health dataset and 10.9% on SIGA-nli. This work highlights a promising direction for adapting LLMs to new tasks by reframing them as reinforcement learning problems, enabling learning through simple end-task rewards rather than extensive data curation. Our code is available at https://github.com/coder-qicao/RL4GLUE.
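The abstract frames NLU as a reinforcement learning problem in which generated tokens are actions and the reward reflects agreement with the ground-truth label. The sketch below illustrates two pieces of that setup. It is not the authors' implementation, which is in the linked repository. It assumes a binary reward (1.0 for a match, 0.0 otherwise), a hypothetical SST-2-style label set (`LABELS`), and a simple regex-based label parser. It also shows the standard PPO clipped surrogate objective, computed per action from new and old log-probabilities and advantages.

```python
import math
import re

# Hypothetical verbalizers for a binary sentiment task (an assumption, not the
# paper's exact label words or prompt format).
LABELS = ("negative", "positive")


def extract_label(generated, labels=LABELS):
    """Return the first label word found in the generated text, or None."""
    pattern = r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b"
    match = re.search(pattern, generated.lower())
    return match.group(1) if match else None


def label_reward(generated, gold):
    """Assumed binary reward: 1.0 if the parsed label equals the gold label, else 0.0."""
    return 1.0 if extract_label(generated) == gold else 0.0


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean PPO clipped surrogate (to be maximized) over per-token actions.

    ratio = pi_new(a|s) / pi_old(a|s) = exp(logp_new - logp_old); each term is
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).
    """
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)
        clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

For example, `label_reward("The sentiment is positive.", "positive")` yields 1.0, while an output with no parsable label receives no reward. In the clipped objective, a token whose probability doubled under a positive advantage contributes only `1 + eps`, which keeps each policy update close to the previous policy.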

Anthology ID:: 2025.findings-emnlp.1392
Volume:: Findings of the Association for Computational Linguistics: EMNLP 2025
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 25552–25567
URL:: https://aclanthology.org/2025.findings-emnlp.1392/
DOI:: 10.18653/v1/2025.findings-emnlp.1392
Cite (ACL):: Sai Ashish Somayajula, Bokai Hu, Qi Cao, Xin Pan, and Pengtao Xie. 2025. Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 25552–25567, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning (Somayajula et al., Findings 2025)
PDF:: https://aclanthology.org/2025.findings-emnlp.1392.pdf
Checklist:: 2025.findings-emnlp.1392.checklist.pdf