Jian Hu
2025
OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework
Jian Hu | Xibin Wu | Wei Shen | Jason Klein Liu | Weixun Wang | Songlin Jiang | Haoran Wang | Hao Chen | Bin Chen | Wenkai Fang | Xianyu | Yu Cao | Haotian Xu | Yiming Liu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Large Language Models (LLMs) fine-tuned via Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) significantly improve the alignment of human-AI values and further raise the upper bound of AI capabilities, particularly in reasoning-intensive, long-context Chain-of-Thought (long-CoT) tasks. However, existing RLHF (or RLVR) frameworks commonly face challenges such as inference bottlenecks and complexity barriers, restricting their accessibility for newcomers. To bridge this gap, we introduce OpenRLHF, a user-friendly, scalable, and easy-to-learn open-source RLHF framework built upon Ray, vLLM, DeepSpeed, and HuggingFace Transformers, featuring a simplified design, clear code structure, and comprehensive documentation to facilitate entry for researchers and practitioners. Experimental results show that OpenRLHF achieves superior training efficiency with speedups ranging from 1.22× to 1.68× across different model sizes compared to state-of-the-art frameworks, while requiring significantly fewer lines of code for implementation. OpenRLHF is publicly available at https://github.com/OpenRLHF/OpenRLHF, and has already been adopted by leading institutions to accelerate RLHF research and learning.
Optimizing Whisper for Low-Resource Speech Recognition via Self-Supervised Representation Distillation
Jian Hu | Ling Dong | Wenjun Wang | Yan Xiang | Shengxiang Gao | Zhengtao Yu
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
Whisper is a powerful multilingual speech recognition model that performs well on high-resource languages such as English, but its performance on certain low-resource languages such as Burmese remains limited by insufficient pretraining data. To address this, this paper proposes an optimization method for low-resource speech recognition with Whisper based on self-supervised representation distillation. Through a cross-model representation distillation mechanism, knowledge is transferred from the representations of a self-supervised model to the Whisper encoder, strengthening its representational modeling of languages such as Burmese. Experimental results show that the method effectively reduces character error rates on Burmese, Khmer, Uzbek, and Punjabi ASR tasks, validating its effectiveness.