Junjie Shan
2024
StepCoder: Improving Code Generation with Reinforcement Learning from Compiler Feedback
Shihan Dou | Yan Liu | Haoxiang Jia | Enyu Zhou | Limao Xiong | Junjie Shan | Caishuang Huang | Xiao Wang | Xiaoran Fan | Zhiheng Xi | Yuhao Zhou | Tao Ji | Rui Zheng | Qi Zhang | Tao Gui | Xuanjing Huang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback to explore the output space of LLMs and enhance code generation quality. However, the lengthy code that LLMs generate in response to complex human requirements makes RL exploration challenging. Moreover, since unit tests may not cover complicated code, optimizing LLMs on these unexecuted code snippets is ineffective. To tackle these challenges, we introduce StepCoder, a novel RL framework for code generation consisting of two main components: CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks, while FGO provides Fine-Grained Optimization by masking unexecuted code segments when optimizing the model. In addition, we construct the APPS+ dataset for RL training, which is manually verified to ensure the correctness of unit tests. Experimental results show that our method improves the ability to explore the output space and outperforms state-of-the-art approaches on corresponding benchmarks. The code and dataset will be made available upon publication.
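The FGO component described above amounts to masking the training loss so that only code actually exercised by the unit tests contributes gradient signal. Below is a minimal sketch of that idea, assuming a PyTorch-style per-token policy-gradient setup; the function name, tensor shapes, and the way the execution mask is obtained are illustrative assumptions, not the paper's released implementation.

import torch
import torch.nn.functional as F

def fgo_loss(logits: torch.Tensor,
             target_ids: torch.Tensor,
             advantages: torch.Tensor,
             executed_mask: torch.Tensor) -> torch.Tensor:
    """Policy-gradient loss restricted to executed tokens (illustrative).

    logits:        (seq_len, vocab) model outputs for the generated code
    target_ids:    (seq_len,) sampled token ids
    advantages:    (seq_len,) per-token advantage estimates
    executed_mask: (seq_len,) 1.0 where the token's source line was executed
                   by at least one unit test, 0.0 otherwise
    """
    log_probs = F.log_softmax(logits, dim=-1)
    token_logp = log_probs.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)
    # Mask out unexecuted segments so they contribute no gradient signal.
    weighted = -token_logp * advantages * executed_mask
    return weighted.sum() / executed_mask.sum().clamp(min=1.0)

In practice, such a mask could be derived by running the unit tests under a coverage tracer and mapping executed source lines back to token positions; the key design choice is that unexecuted segments contribute zero gradient rather than a down-weighted one.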
2022
Decorrelate Irrelevant, Purify Relevant: Overcome Textual Spurious Correlations from a Feature Perspective
Shihan Dou | Rui Zheng | Ting Wu | SongYang Gao | Junjie Shan | Qi Zhang | Yueming Wu | Xuanjing Huang
Proceedings of the 29th International Conference on Computational Linguistics
Natural language understanding (NLU) models tend to rely on spurious correlations (i.e., dataset bias), achieving high performance on in-distribution datasets but poor performance on out-of-distribution ones. Most existing debiasing methods identify and down-weight samples with biased features (i.e., the superficial features that cause such spurious correlations). However, down-weighting these samples obstructs the model from learning from their unbiased parts. To tackle this challenge, in this paper we propose to eliminate spurious correlations in a fine-grained manner from a feature-space perspective. Specifically, we introduce Random Fourier Features and weighted re-sampling to decorrelate dependencies between features and thereby mitigate spurious correlations. After obtaining decorrelated features, we further design a mutual-information-based method to purify them, which forces the model to learn features that are more relevant to the task. Extensive experiments on two well-studied NLU tasks demonstrate that our method is superior to comparative approaches.
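As a rough illustration of the decorrelation step, the sketch below lifts features with Random Fourier Features (RFF) and learns per-sample re-sampling weights that drive the weighted off-diagonal covariance of the lifted features toward zero. All names, the optimizer setup, and hyperparameters are assumptions for illustration, reconstructed from the abstract alone rather than from the authors' code.

import torch

def random_fourier_features(x: torch.Tensor, num_rff: int = 32) -> torch.Tensor:
    # Lift (n, d) features to (n, d * num_rff) random Fourier representations.
    n, d = x.shape
    w = torch.randn(d, num_rff, device=x.device)              # random frequencies
    b = 2 * torch.pi * torch.rand(num_rff, device=x.device)   # random phases
    return torch.cos(x.unsqueeze(-1) * w + b).reshape(n, d * num_rff)

def learn_sample_weights(feats: torch.Tensor, steps: int = 200) -> torch.Tensor:
    # Learn re-sampling weights that decorrelate the RFF-lifted features;
    # assumes feats are detached encoder outputs of shape (n, d).
    n = feats.shape[0]
    logits = torch.zeros(n, device=feats.device, requires_grad=True)
    opt = torch.optim.Adam([logits], lr=0.05)
    phi = random_fourier_features(feats.detach())
    for _ in range(steps):
        w = torch.softmax(logits, dim=0)                      # weights sum to one
        mean = (w.unsqueeze(1) * phi).sum(dim=0, keepdim=True)
        centered = phi - mean
        cov = centered.T @ (w.unsqueeze(1) * centered)        # weighted covariance
        off_diag = cov - torch.diag(torch.diagonal(cov))
        loss = (off_diag ** 2).sum()   # penalize cross-feature dependence
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.softmax(logits, dim=0).detach()

The learned weights would then re-weight the training loss (the abstract's weighted re-sampling), after which its mutual-information-based purification step, not sketched here, selects the task-relevant components of the decorrelated features.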