Zhiguo Yu
2023
Pre-trained Language Models Can be Fully Zero-Shot Learners
Xuandong Zhao
|
Siqi Ouyang
|
Zhiguo Yu
|
Ming Wu
|
Lei Li
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
How can we extend a pre-trained model to many language understanding tasks, without labeled or additional unlabeled data? Pre-trained language models (PLMs) have been effective for a wide range of NLP tasks. However, existing approaches either require fine-tuning on downstream labeled datasets or manually constructing proper prompts. In this paper, we propose nonparametric prompting PLM (NPPrompt) for fully zero-shot language understanding. Unlike previous methods, NPPrompt uses only pre-trained language models and does not require any labeled data or additional raw corpus for further fine-tuning, nor does it rely on humans to construct a comprehensive set of prompt label words. We evaluate NPPrompt against previous major few-shot and zero-shot learning methods on diverse NLP tasks: including text classification, text entailment, similar text retrieval, paraphrasing, and multiple-choice question answering. Experimental results demonstrate that our NPPrompt outperforms the previous best fully zero-shot method by big margins, with absolute gains of 12.8% in accuracy on text classification and 15.6% on the GLUE benchmark. Our source code is available at https://anonymous.4open.science/r/NPPrompt.
2022
Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation
Xuandong Zhao
|
Zhiguo Yu
|
Ming Wu
|
Lei Li
Findings of the Association for Computational Linguistics: ACL 2022
How to learn highly compact yet effective sentence representation? Pre-trained language models have been effective in many NLP tasks. However, these models are often huge and produce large sentence embeddings. Moreover, there is a big performance gap between large and small models. In this paper, we propose Homomorphic Projective Distillation (HPD) to learn compressed sentence embeddings. Our method augments a small Transformer encoder model with learnable projection layers to produce compact representations while mimicking a large pre-trained language model to retain the sentence representation quality. We evaluate our method with different model sizes on both semantic textual similarity (STS) and semantic retrieval (SR) tasks. Experiments show that our method achieves 2.7-4.5 points performance gain on STS tasks compared with previous best representations of the same size. In SR tasks, our method improves retrieval speed (8.2×) and memory usage (8.0×) compared with state-of-the-art large models. Our implementation is available at https://github.com/XuandongZhao/HPD.
2016
Retrofitting Word Vectors of MeSH Terms to Improve Semantic Similarity Measures
Zhiguo Yu
|
Trevor Cohen
|
Byron Wallace
|
Elmer Bernstam
|
Todd Johnson
Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis