Chongming Gao
2024
Dual-Phase Accelerated Prompt Optimization
Muchen Yang
|
Moxin Li
|
Yongle Li
|
Zijun Chen
|
Chongming Gao
|
Junqi Zhang
|
Yangyang Li
|
Fuli Feng
Findings of the Association for Computational Linguistics: EMNLP 2024
Gradient-free prompt optimization methods have made significant strides in enhancing the performance of closed-source Large Language Model (LLMs) across a wide range of tasks. However, existing approaches make light of the importance of high-quality prompt initialization and the identification of effective optimization directions, thus resulting in substantial optimization steps to obtain satisfactory performance. In this light, we aim to accelerate prompt optimization process to tackle the challenge of low convergence rate. We propose a dual-phase approach which starts with generating high-quality initial prompts by adopting a well-designed meta-instruction to delve into task-specific information, and iteratively optimize the prompts at the sentence level, leveraging previous tuning experience to expand prompt candidates and accept effective ones. Extensive experiments on eight datasets demonstrate the effectiveness of our proposed method, achieving a consistent accuracy gain over baselines with less than five optimization steps.
2020
Revisiting Representation Degeneration Problem in Language Modeling
Zhong Zhang
|
Chongming Gao
|
Cong Xu
|
Rui Miao
|
Qinli Yang
|
Junming Shao
Findings of the Association for Computational Linguistics: EMNLP 2020
Weight tying is now a common setting in many language generation tasks such as language modeling and machine translation. However, a recent study reveals that there is a potential flaw in weight tying. They find that the learned word embeddings are likely to degenerate and lie in a narrow cone when training a language model. They call it the representation degeneration problem and propose a cosine regularization to solve it. Nevertheless, we prove that the cosine regularization is insufficient to solve the problem, as the degeneration is still likely to happen under certain conditions. In this paper, we revisit the representation degeneration problem and theoretically analyze the limitations of the previously proposed solution. Afterward, we propose an alternative regularization method called Laplacian regularization to tackle the problem. Experiments on language modeling demonstrate the effectiveness of the proposed Laplacian regularization.