Daixuan Cheng


2023

pdf bib
UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation
Daixuan Cheng | Shaohan Huang | Junyu Bi | Yuefeng Zhan | Jianfeng Liu | Yujing Wang | Hao Sun | Furu Wei | Weiwei Deng | Qi Zhang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Large Language Models (LLMs) are popular for their impressive abilities, but the need for model-specific fine-tuning or task-specific prompt engineering can hinder their generalization. We propose UPRISE (Universal Prompt Retrieval for Improving zero-Shot Evaluation), which tunes a lightweight and versatile retriever that automatically retrieves prompts for a given zero-shot task input. Specifically, we demonstrate universality in a cross-task and cross-model scenario: the retriever is tuned on diverse tasks, but tested on unseen task types; we use a small frozen LLM, GPT-Neo-2.7B, for tuning the retriever, but test the retriever on different LLMs of much larger scales, such as BLOOM-7.1B, OPT-66B and GPT3-175B. Additionally, we show that UPRISE mitigates the hallucination problem in our experiments with ChatGPT, suggesting its potential to improve even the strongest LLMs. Our model and code are available at https://github.com/microsoft/LMOps.

2022

pdf bib
Snapshot-Guided Domain Adaptation for ELECTRA
Daixuan Cheng | Shaohan Huang | Jianfeng Liu | Yuefeng Zhan | Hao Sun | Furu Wei | Denvy Deng | Qi Zhang
Findings of the Association for Computational Linguistics: EMNLP 2022

Discriminative pre-trained language models, such as ELECTRA, have achieved promising performances in a variety of general tasks. However, these generic pre-trained models struggle to capture domain-specific knowledge of domain-related tasks. In this work, we propose a novel domain-adaptation method for ELECTRA, which can dynamically select domain-specific tokens and guide the discriminator to emphasize them, without introducing new training parameters. We show that by re-weighting the losses of domain-specific tokens, ELECTRA can be effectively adapted to different domains. The experimental results in both computer science and biomedical domains show that the proposed method can achieve state-of-the-art results on the domain-related tasks.