2024
LLaMA Pro: Progressive LLaMA with Block Expansion
Chengyue Wu | Yukang Gan | Yixiao Ge | Zeyu Lu | Jiahao Wang | Ye Feng | Ying Shan | Ping Luo
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs with an expansion of Transformer blocks. We tune the expanded blocks using only the new corpus, efficiently and effectively improving the model’s knowledge while mitigating forgetting. In this paper, we experiment on corpora of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B that excels in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced performance on various benchmarks, demonstrating superiority over existing open models in the LLaMA family and immense potential for reasoning and addressing diverse tasks as an intelligent agent. Our findings provide valuable insights into integrating natural and programming languages, laying a solid foundation for developing advanced language agents that operate effectively in various environments.
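To make the block-expansion idea concrete, here is a minimal PyTorch-style sketch of how an identity-preserving expansion might be set up: copied blocks are interleaved among frozen originals and their output projections are zeroed so the expanded model initially reproduces the base model, after which only the new blocks are tuned on the new corpus. The `expand_blocks` helper, the `expand_every` spacing, and the LLaMA-style attribute names (`self_attn.o_proj`, `mlp.down_proj`) are assumptions for illustration, not the paper’s released code.

```python
# Minimal sketch of block expansion (assumed setup, not the paper's released code).
import copy
import torch.nn as nn

def expand_blocks(layers: nn.ModuleList, expand_every: int = 4) -> nn.ModuleList:
    """Interleave one copied block after every `expand_every` original blocks.

    The copied block's output projections are zero-initialized so that, at the
    start of post-pretraining, it acts as an identity mapping and the expanded
    model reproduces the original model's outputs.
    """
    expanded = []
    for i, layer in enumerate(layers):
        layer.requires_grad_(False)          # freeze original blocks
        expanded.append(layer)
        if (i + 1) % expand_every == 0:
            new_layer = copy.deepcopy(layer)
            new_layer.requires_grad_(True)   # only new blocks are tuned
            # Zero the output projections so both residual branches contribute
            # nothing initially (identity block). Attribute names follow an
            # assumed LLaMA-style decoder layer layout.
            nn.init.zeros_(new_layer.self_attn.o_proj.weight)
            nn.init.zeros_(new_layer.mlp.down_proj.weight)
            expanded.append(new_layer)
    return nn.ModuleList(expanded)
```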
2023
DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization
SongYang Gao | Shihan Dou | Yan Liu | Xiao Wang | Qi Zhang | Zhongyu Wei | Jin Ma | Ying Shan
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Adversarial training is one of the best-performing methods for improving the robustness of deep language models. However, robust models come at the cost of high time consumption, as they require multi-step gradient ascents or word substitutions to obtain adversarial samples. In addition, these generated samples are deficient in grammatical quality and semantic consistency, which impairs the effectiveness of adversarial training. To address these problems, we introduce a novel, effective procedure that instead performs adversarial training with only clean data. Our procedure, distribution shift risk minimization (DSRM), estimates the adversarial loss by perturbing the input data’s probability distribution rather than their embeddings. This formulation results in a robust model that minimizes the expected global loss under adversarial attacks. Our approach requires zero adversarial samples for training and reduces time consumption by up to 70% compared to current best-performing adversarial training methods. Experiments demonstrate that DSRM considerably improves BERT’s resistance to textual adversarial attacks and achieves state-of-the-art robust accuracy on various benchmarks.
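As a rough illustration of the distribution-shift idea, the sketch below replaces per-sample embedding perturbations with a worst-case reweighting of the batch’s empirical distribution: high-loss samples receive more probability mass before the weighted loss is minimized. The `dsrm_style_loss` function, the softmax reweighting, and the `shift_temp` knob are assumptions standing in for the paper’s exact inner maximization, and the model is assumed to be an HF-style classifier exposing `.logits`.

```python
# Hedged sketch: adversarial training via a batch-level distribution shift
# (an illustrative stand-in for DSRM's inner maximization, not the paper's code).
import torch
import torch.nn.functional as F

def dsrm_style_loss(model, input_ids, attention_mask, labels, shift_temp=2.0):
    """Upweight hard examples inside the batch to emulate a worst-case shift
    of the empirical data distribution, then minimize that weighted loss.

    `shift_temp` controls how aggressively probability mass moves toward
    high-loss samples; it is a hypothetical knob, not a DSRM hyperparameter.
    """
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    # Adversarial reweighting: a softmax over detached losses shifts the
    # empirical distribution toward samples the model currently gets wrong.
    weights = torch.softmax(per_sample.detach() * shift_temp, dim=0)
    return (weights * per_sample).sum()
```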
A Confidence-based Partial Label Learning Model for Crowd-Annotated Named Entity Recognition
Limao Xiong | Jie Zhou | Qunxi Zhu | Xiao Wang | Yuanbin Wu | Qi Zhang | Tao Gui | Xuanjing Huang | Jin Ma | Ying Shan
Findings of the Association for Computational Linguistics: ACL 2023
Existing models for named entity recognition (NER) are mainly based on large-scale labeled datasets, which are typically obtained via crowdsourcing. However, it is hard to obtain a unified and correct label via majority voting from multiple annotators for NER due to the large labeling space and the complexity of the task. To address this problem, we aim to utilize the original multi-annotator labels directly. In particular, we propose a CONfidence-based partial Label Learning (CONLL) method to integrate the prior confidence (given by annotators) and posterior confidences (learned by models) for crowd-annotated NER. This model learns a token- and content-dependent confidence via an Expectation–Maximization (EM) algorithm by minimizing empirical risk. The true posterior estimator and the confidence estimator are applied iteratively to update the true posterior and the confidence, respectively. We conduct extensive experiments on both real-world and synthetic datasets; the results show that our model improves performance effectively compared with strong baselines.
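A stripped-down sketch of one confidence-weighted EM-style update is given below: annotator votes are mixed according to their prior confidences, combined with the model’s posterior to form a soft target (E-step), and the model is trained toward that target (M-step). The `em_step` function, the tensor shapes, and the simple product-and-renormalize posterior are illustrative assumptions rather than the paper’s CONLL implementation.

```python
# Hedged sketch of one EM-style update for confidence-weighted crowd labels
# (an illustration of the idea, not the paper's CONLL implementation).
import torch
import torch.nn.functional as F

def em_step(token_logits, annotator_labels, annotator_conf):
    """token_logits:    [T, C]  model scores per token
    annotator_labels:   [T, A]  label index from each of A annotators
    annotator_conf:     [A]     prior confidence per annotator (assumed given)
    """
    num_classes = token_logits.size(-1)
    model_post = torch.softmax(token_logits, dim=-1)                    # [T, C]
    votes = F.one_hot(annotator_labels, num_classes).float()            # [T, A, C]
    # Confidence-weighted vote distribution over labels for each token.
    crowd_post = (votes * annotator_conf[None, :, None]).sum(1)         # [T, C]
    crowd_post = crowd_post / crowd_post.sum(-1, keepdim=True).clamp_min(1e-8)
    # E-step: combine model posterior with the crowd posterior into a soft target.
    target = model_post.detach() * crowd_post
    target = target / target.sum(-1, keepdim=True).clamp_min(1e-8)
    # M-step loss: cross-entropy against the soft posterior.
    return -(target * torch.log_softmax(token_logits, dim=-1)).sum(-1).mean()
```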
Characterizing the Impacts of Instances on Robustness
Rui Zheng | Zhiheng Xi | Qin Liu | Wenbin Lai | Tao Gui | Qi Zhang | Xuanjing Huang | Jin Ma | Ying Shan | Weifeng Ge
Findings of the Association for Computational Linguistics: ACL 2023
Building robust deep neural networks (DNNs) against adversarial attacks is an important but challenging task. Previous defense approaches mainly focus on developing new model structures or training algorithms, but they do little to tap the potential of training instances, especially instances with robust patterns carrying innate robustness. In this paper, we show that robust and non-robust instances in the training dataset, though both important for test performance, have contrary impacts on robustness, which makes it possible to build a highly robust model by leveraging the training dataset more effectively. We propose a new method that distinguishes robust instances from non-robust ones according to the model’s sensitivity to perturbations on individual instances during training. Surprisingly, we find that the model under standard training easily overfits the robust instances by relying on their simple patterns before it completely learns their robust features. Finally, we propose a new mitigation algorithm to further unlock the potential of robust instances. Experimental results show that proper use of robust instances in the original dataset offers a new way to achieve highly robust models.
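One simple proxy for the per-instance sensitivity mentioned above is to compare each example’s clean loss with its loss after a single gradient-sign perturbation of the input embeddings, as in the sketch below. The `sensitivity_scores` helper, the FGSM-style step, and the assumption of an HF-style model accepting `inputs_embeds` are illustrative choices, not necessarily the measure used in the paper.

```python
# Hedged sketch: scoring per-instance sensitivity to embedding perturbations
# (an assumed proxy for the robust/non-robust distinction, not the paper's code).
import torch
import torch.nn.functional as F

def sensitivity_scores(model, embeds, attention_mask, labels, eps=1e-3):
    """Return loss(perturbed) - loss(clean) per instance; higher = less robust."""
    embeds = embeds.detach().requires_grad_(True)
    logits = model(inputs_embeds=embeds, attention_mask=attention_mask).logits
    clean = F.cross_entropy(logits, labels, reduction="none")
    grad, = torch.autograd.grad(clean.sum(), embeds)
    # One FGSM-style step in embedding space.
    adv_embeds = embeds + eps * grad.sign()
    adv_logits = model(inputs_embeds=adv_embeds, attention_mask=attention_mask).logits
    adv = F.cross_entropy(adv_logits, labels, reduction="none")
    return (adv - clean).detach()
```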
On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection
SongYang Gao | Shihan Dou | Qi Zhang | Xuanjing Huang | Jin Ma | Ying Shan
Findings of the Association for Computational Linguistics: ACL 2023
Detecting adversarial samples that are carefully crafted to fool the model is a critical step toward socially secure applications. However, existing adversarial detection methods require access to sufficient training data, which raises noteworthy concerns about privacy leakage and generalizability. In this work, we validate that adversarial samples generated by attack algorithms are strongly related to a specific vector in the high-dimensional input space. Such vectors, namely Universal Adversarial Perturbations (UAPs), can be calculated without the original training data. Based on this discovery, we propose a data-agnostic adversarial detection framework, which induces different responses between normal and adversarial samples to UAPs. Experimental results show that our method achieves competitive detection performance on various text classification tasks, and maintains time consumption equivalent to that of normal inference.
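The detection idea can be pictured with a short sketch: apply a fixed universal perturbation to every input’s embeddings and score how much the model’s output distribution moves, flagging inputs whose response differs markedly from that of clean data. The `uap_response_score` function, the KL-divergence score, and the `inputs_embeds` interface are assumptions for illustration; the paper’s framework may compute the UAP response differently.

```python
# Hedged sketch: flag inputs whose predictions react strongly to a universal
# perturbation (an assumed realization of the detection idea, not the paper's code).
import torch

@torch.no_grad()
def uap_response_score(model, embeds, attention_mask, uap):
    """Return a per-sample score comparing the model's output distribution
    with and without a fixed universal perturbation `uap` added to the
    embeddings; inputs whose score crosses a chosen threshold are flagged.
    Assumes an HF-style model accepting `inputs_embeds`.
    """
    clean = torch.softmax(model(inputs_embeds=embeds,
                                attention_mask=attention_mask).logits, dim=-1)
    shifted = torch.softmax(model(inputs_embeds=embeds + uap,
                                  attention_mask=attention_mask).logits, dim=-1)
    # KL divergence between clean and perturbed predictions as the response score.
    return (clean * (clean.clamp_min(1e-8).log()
                     - shifted.clamp_min(1e-8).log())).sum(-1)
```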