2024
Teaching Language Models to Self-Improve by Learning from Language Feedback
Chi Hu | Yimin Hu | Hang Cao | Tong Xiao | JingBo Zhu
Findings of the Association for Computational Linguistics: ACL 2024
Aligning Large Language Models (LLMs) with human intentions and values is crucial yet challenging. Current methods primarily rely on human preferences, which are costly and insufficient in capturing nuanced feedback expressed in natural language. In this paper, we present Self-Refinement Tuning (SRT), a method that leverages model feedback for alignment, thereby reducing reliance on human annotations. SRT uses a base language model (e.g., Tulu2) to generate initial responses, which are critiqued and refined by a more advanced model (e.g., GPT-4-Turbo). This process enables the base model to self-evaluate and improve its outputs, facilitating continuous learning. SRT further optimizes the model by learning from its self-generated feedback and refinements, creating a feedback loop that promotes model improvement. Our empirical evaluations demonstrate that SRT significantly outperforms strong baselines across diverse tasks and model sizes. When applied to a 70B parameter model, SRT increases the win rate from 9.6% to 25.8% on the AlpacaEval 2.0 benchmark, surpassing well-established systems such as GPT-4-0314, Claude 2, and Gemini. Our analysis highlights the crucial role of language feedback in the success of SRT, suggesting potential for further exploration in this direction.
2023
Sparse Frame Grouping Network with Action Centered for Untrimmed Video Paragraph Captioning
Guorui Yu | Yimin Hu | Yuejie Zhang | Rui Feng | Tao Zhang | Shang Gao
Findings of the Association for Computational Linguistics: EMNLP 2023
Generating paragraph captions for untrimmed videos without event annotations is challenging, especially when aiming to enhance precision and minimize repetition at the same time. To address this challenge, we propose a module called Sparse Frame Grouping (SFG). It dynamically groups event information with the help of action information for the entire video and excludes redundant frames within pre-defined clips. To enhance the performance, an Intra Contrastive Learning technique is designed to align the SFG module with the core event content in the paragraph, and an Inter Contrastive Learning technique is employed to learn action-guided context with reduced static noise simultaneously. Extensive experiments are conducted on two benchmark datasets (ActivityNet Captions and YouCook2). Results demonstrate that SFG outperforms the state-of-the-art methods on all metrics.
2022
Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection
Chenglong Wang | Yi Lu | Yongyu Mu | Yimin Hu | Tong Xiao | Jingbo Zhu
Findings of the Association for Computational Linguistics: EMNLP 2022
Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, we typically have multiple types of knowledge extracted from the teacher model. The problem is to make full use of them to train the student model. Our preliminary study shows that: (1) not all of the knowledge is necessary for learning a good student model, and (2) knowledge distillation can benefit from certain knowledge at different training steps. In response to these, we propose an actor-critic approach to selecting appropriate knowledge to transfer during the process of knowledge distillation. In addition, we offer a refinement of the training algorithm to ease the computational burden. Experimental results on the GLUE datasets show that our method outperforms several strong knowledge distillation baselines significantly.
The NiuTrans Machine Translation Systems for WMT22
Weiqiao Shan | Zhiquan Cao | Yuchen Han | Siming Wu | Yimin Hu | Jie Wang | Yi Zhang | Hou Baoyu | Hang Cao | Chenghao Gao | Xiaowen Liu | Tong Xiao | Anxiang Ma | Jingbo Zhu
Proceedings of the Seventh Conference on Machine Translation (WMT)
This paper describes the NiuTrans neural machine translation systems for the WMT22 General MT constrained task. We participate in four directions: Chinese→English, English→Croatian, and Livonian↔English. Our models are based on several advanced Transformer variants, e.g., Transformer-ODE and Universal Multiscale Transformer (UMST). The main workflow consists of data filtering, large-scale data augmentation (i.e., iterative back-translation and iterative knowledge distillation), and specific-domain fine-tuning. Moreover, we try several multi-domain methods, such as a multi-domain model structure and a multi-domain data clustering method, to rise to this year’s newly proposed multi-domain test set challenge. For low-resource scenarios, we build a multi-language translation model to enhance the performance, and try to use the pre-trained language model (mBERT) to initialize the translation model.
2021
The NiuTrans System for the WMT 2021 Efficiency Task
Chenglong Wang | Chi Hu | Yongyu Mu | Zhongxiang Yan | Siming Wu | Yimin Hu | Hang Cao | Bei Li | Ye Lin | Tong Xiao | Jingbo Zhu
Proceedings of the Sixth Conference on Machine Translation
This paper describes the NiuTrans system for the WMT21 translation efficiency task. Following last year’s work, we explore various techniques to improve the efficiency while maintaining translation quality. We investigate the combinations of lightweight Transformer architectures and knowledge distillation strategies. Also, we improve the translation efficiency with graph optimization, low precision, dynamic batching, and parallel pre/post-processing. Putting these together, our system can translate 247,000 words per second on an NVIDIA A100, being 3× faster than our last year’s system. Our system is the fastest and has the lowest memory consumption on the GPU-throughput track. The code, model, and pipeline will be available at NiuTrans.NMT.