Ping Luo


2023

pdf bib
Guideline Learning for In-Context Information Extraction
Chaoxu Pang | Yixuan Cao | Qiang Ding | Ping Luo
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) can perform a new task by merely conditioning on task instructions and a few input-output examples, without optimizing any parameters. This is called In-Context Learning (ICL). In-context Information Extraction (IE) has recently garnered attention in the research community. However, the performance of In-context IE generally lags behind the state-of-the-art supervised expert models. We highlight a key reason for this shortfall: underspecified task description. The limited-length context struggles to thoroughly express the intricate IE task instructions and various edge cases, leading to misalignment in task comprehension with humans. In this paper, we propose a Guideline Learning (GL) framework for In-context IE which reflectively learns and follows guidelines. During the learning phrase, GL automatically synthesizes a set of guidelines based on a few error cases, and during inference, GL retrieves helpful guidelines for better ICL. Moreover, we propose a self-consistency-based active learning method to enhance the efficiency of GL. Experiments on event extraction and relation extraction show that GL can significantly improve the performance of in-context IE.

pdf bib
DSP: Discriminative Soft Prompts for Zero-Shot Entity and Relation Extraction
Bo Lv | Xin Liu | Shaojie Dai | Nayu Liu | Fan Yang | Ping Luo | Yue Yu
Findings of the Association for Computational Linguistics: ACL 2023

Prompt-based methods have shown their efficacy in transferring general knowledge within pre-trained language models (PLMs) for low-resource scenarios. Typically, prompt-based methods convert downstream tasks to cloze-style problems and map all labels to verbalizers.However, when applied to zero-shot entity and relation extraction, vanilla prompt-based methods may struggle with the limited coverage of verbalizers to labels and the slow inference speed. In this work, we propose a novel Discriminate Soft Prompts (DSP) approach to take advantage of the prompt-based methods to strengthen the transmission of general knowledge. Specifically, we develop a discriminative prompt method, which reformulates zero-shot tasks into token discrimination tasks without having to construct verbalizers.Furthermore, to improve the inference speed of the prompt-based methods, we design a soft prompt co-reference strategy, which leverages soft prompts to approximately refer to the vector representation of text tokens. The experimental results show that, our model outperforms baselines on two zero-shot entity recognition datasets with higher inference speed, and obtains a 7.5% average relation F1-score improvement over previous state-of-the-art models on Wiki-ZSL and FewRel.

pdf bib
Structured Pruning for Efficient Generative Pre-trained Language Models
Chaofan Tao | Lu Hou | Haoli Bai | Jiansheng Wei | Xin Jiang | Qun Liu | Ping Luo | Ngai Wong
Findings of the Association for Computational Linguistics: ACL 2023

The increasing sizes of large generative Pre-trained Language Models (PLMs) hinder their deploymentin real-world applications. To obtain efficient PLMs, previous studies mostly focus on pruning the attention heads and feed-forward networks (FFNs) of the Transformer. Nevertheless, we find that in generative PLMs, the hidden dimension shared by many other modules (e.g., embedding layer and layer normalization) contains persistent outliers regardless of the network input. This study comprehensively investigates the structured pruning of generative PLMs with all the above compressible components. To identify redundant network structures, we assign learnable masks over compressible components followed by sparse training. Various sizes of PLMs can be flexibly extracted via different thresholds, and are then task-specifically fine-tuned for further improvement. Extensive experiments on language modeling, summarization and machine translation validate the effectiveness of the proposed method. For example, the pruned BART brings 1.51x/6.96x inference speedup on GPU/CPU with 67% size reduction, and can be further combined with quantization for more than 25× compression.

2022

pdf bib
Compression of Generative Pre-trained Language Models via Quantization
Chaofan Tao | Lu Hou | Wei Zhang | Lifeng Shang | Xin Jiang | Qun Liu | Ping Luo | Ngai Wong
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The increasing size of generative Pre-trained Language Models (PLMs) have greatly increased the demand for model compression. Despite various methods to compress BERT or its variants, there are few attempts to compress generative PLMs, and the underlying difficulty remains unclear. In this paper, we compress generative PLMs by quantization. We find that previous quantization methods fail on generative tasks due to the homogeneous word embeddings caused by reduced capacity and the varied distribution of weights. Correspondingly, we propose a token-level contrastive distillation to learn distinguishable word embeddings, and a module-wise dynamic scaling to make quantizers adaptive to different modules. Empirical results on various tasks show that our proposed method outperforms the state-of-the-art compression methods on generative PLMs by a clear margin. With comparable performance with the full-precision models, we achieve 14.4x and 13.4x compression rate on GPT-2 and BART, respectively.