Yuyan Chen


2023

pdf bib
MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization
Yuyan Chen | Zhihao Wen | Ge Fan | Zhengyu Chen | Wei Wu | Dayiheng Liu | Zhixu Li | Bang Liu | Yanghua Xiao
Findings of the Association for Computational Linguistics: EMNLP 2023

Prompt engineering, as an efficient and effective way to leverage Large Language Models (LLM), has drawn a lot of attention from the research community. The existing research primarily emphasizes the importance of adapting prompts to specific tasks, rather than specific LLMs. However, a good prompt is not solely defined by its wording, but also binds to the nature of the LLM in question. In this work, we first quantitatively demonstrate that different prompts should be adapted to different LLMs to enhance their capabilities across various downstream tasks in NLP. Then we novelly propose a model-adaptive prompt optimizer (MAPO) method that optimizes the original prompts for each specific LLM in downstream tasks. Extensive experiments indicate that the proposed method can effectively refine prompts for an LLM, leading to significant improvements over various downstream tasks.

2022

pdf bib
Efficient Two-Stage Progressive Quantization of BERT
Charles Le | Arash Ardakani | Amir Ardakani | Hang Zhang | Yuyan Chen | James Clark | Brett Meyer | Warren Gross
Proceedings of The Third Workshop on Simple and Efficient Natural Language Processing (SustaiNLP)

The success of large BERT models has raised the demand for model compression methods to reduce model size and computational cost. Quantization can reduce the model size and inference latency, making inference more efficient, without changing its stucture, but it comes at the cost of performance degradation. Due to the complex loss landscape of ternarized/binarized BERT, we present an efficient two-stage progressive quantization method in which we fine tune the model with quantized weights and progressively lower its bits, and then we fine tune the model with quantized weights and activations. At the same time, we strategically choose which bitwidth to fine-tune on and to initialize from, and which bitwidth to fine-tune under augmented data to outperform the existing BERT binarization methods without adding an extra module, compressing the binary model 18% more than previous binarization methods or compressing BERT by 31x w.r.t. to the full-precision model. Our method without data augmentation can outperform existing BERT ternarization methods.