Yi Dong


2024

pdf bib
HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM
Zhilin Wang | Yi Dong | Jiaqi Zeng | Virginia Adams | Makesh Narsimhan Sreedhar | Daniel Egert | Olivier Delalleau | Jane Scowcroft | Neel Kant | Aidan Swope | Oleksii Kuchaiev
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Existing open-source helpfulness preference datasets do not specify what makes some responses more helpful and others less so. Models trained on these datasets can incidentally learn to model dataset artifacts (e.g. preferring longer but unhelpful responses only due to their length). To alleviate this problem, we collect HelpSteer, a multi-attribute helpfulness dataset annotated for the various aspects that make responses helpful. Specifically, our 37k-sample dataset has annotations for correctness, coherence, complexity, and verbosity in addition to overall helpfulness of responses. Training Llama 2 70B using the HelpSteer dataset with SteerLM technique produces a model that scores 7.54 on MT Bench, which is currently the highest score for open models that do not require training data from more powerful models (e.g. GPT-4). We release this dataset with CC-BY-4.0 license at https://huggingface.co/datasets/nvidia/HelpSteer

2023

pdf bib
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
Yi Dong | Zhilin Wang | Makesh Sreedhar | Xianchao Wu | Oleksii Kuchaiev
Findings of the Association for Computational Linguistics: EMNLP 2023

Model alignment with human preferences is an essential step in making Large Language Models (LLMs) helpful and consistent with human values. It typically consists of supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) stages. However, RLHF faces inherent limitations stemming from a complex training setup and its tendency to align the model with implicit values that end users cannot control at run-time. Moreover, reward models in RLHF stage commonly rely on single-dimensional feedback as opposed to explicit, multifaceted signals that indicate attributes such as helpfulness, humor, and toxicity. To address these limitations, we propose SteerLM, a supervised fine-tuning method that empowers end-users to control responses during inference. SteerLM conditions responses to conform to an explicitly defined multi-dimensional set of attributes, thereby empowering a steerable AI capable of generating helpful and high-quality responses while maintaining customizability. Experiments show that SteerLM trained on open source datasets generates responses that are preferred by human and automatic evaluators to many state-of-the-art baselines trained with RLHF while being much easier to train. Try SteerLM at https://huggingface.co/nvidia/SteerLM-llama2-13B

pdf bib
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study
Boxin Wang | Wei Ping | Peng Xu | Lawrence McAfee | Zihan Liu | Mohammad Shoeybi | Yi Dong | Oleksii Kuchaiev | Bo Li | Chaowei Xiao | Anima Anandkumar | Bryan Catanzaro
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Large decoder-only language models (LMs) can be largely improved in terms of perplexity by retrieval (e.g., RETRO), but its impact on text generation quality and downstream task accuracy is unclear. Thus, it is still an open question: shall we pretrain large autoregressive LMs with retrieval? To answer it, we perform a comprehensive study on a scalable pre-trained retrieval-augmented LM (i.e., RETRO) compared with standard GPT and retrieval-augmented GPT incorporated at fine-tuning or inference stages. We first provide the recipe to reproduce RETRO up to 9.5B parameters while retrieving a text corpus with 330B tokens. Based on that, we have the following novel findings: i) RETRO outperforms GPT on text generation with much less degeneration (i.e., repetition), moderately higher factual accuracy, and slightly lower toxicity with a nontoxic retrieval database. ii) On the LM Evaluation Harness benchmark, RETRO largely outperforms GPT on knowledge-intensive tasks, but is on par with GPT on other tasks. Furthermore, we introduce a simple variant of the model, RETRO++, which largely improves open-domain QA results of original RETRO (e.g., EM score +8.6 on Natural Question) and significantly outperforms retrieval-augmented GPT across different model sizes. Our findings highlight the promising direction of pretraining autoregressive LMs with retrieval as future foundation models. We release our implementation at: https://github.com/NVIDIA/Megatron-LM/tree/main/tools/retro.

2022

pdf bib
Proceedings of the Second Workshop on When Creative AI Meets Conversational AI
Xianchao Wu | Peiying Ruan | Sheng Li | Yi Dong
Proceedings of the Second Workshop on When Creative AI Meets Conversational AI