2024
MoE-I2: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
Cheng Yang | Yang Sui | Jinqi Xiao | Lingyi Huang | Yu Gong | Yuanlin Duan | Wenqi Jia | Miao Yin | Yu Cheng | Bo Yuan
Findings of the Association for Computational Linguistics: EMNLP 2024
The emergence of Mixture of Experts (MoE) LLMs has significantly advanced the development of language models. Compared to traditional LLMs, MoE LLMs achieve higher performance with considerably fewer activated parameters. Despite this efficiency, their enormous parameter size still leads to high deployment costs. In this paper, we introduce a two-stage compression method tailored for MoE to reduce the model size and decrease the computational cost. First, in the inter-expert pruning stage, we analyze the importance of each layer and propose Layer-wise Genetic Search and Block-wise KT-Reception Field with a non-uniform pruning ratio to prune individual experts. Second, in the intra-expert decomposition stage, we apply low-rank decomposition to further compress the parameters within the remaining experts. Extensive experiments on Qwen1.5-MoE-A2.7B, Deepseek-V2-Lite, and Mixtral-8×7B demonstrate that our proposed methods can both reduce the model size and enhance inference efficiency while maintaining performance on various zero-shot tasks.
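To illustrate the general idea behind the intra-expert decomposition stage, here is a minimal sketch of low-rank compression of one weight matrix using plain truncated SVD. This is an assumption for illustration only: the abstract does not say which decomposition the paper uses, and the function name `low_rank_factors`, the matrix shapes, and the rank are all hypothetical.

```python
import numpy as np


def low_rank_factors(weight, rank):
    """Factor `weight` into (a, b) so that a @ b is its best rank-`rank`
    approximation in the Frobenius norm (truncated SVD).

    Hypothetical helper; not the paper's actual procedure.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # shape (out_dim, rank), singular values folded in
    b = vt[:rank, :]            # shape (rank, in_dim)
    return a, b


# Toy "expert" weight that is exactly rank 16, so rank-16 factors lose nothing.
rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 16)) @ rng.standard_normal((16, 128))

a, b = low_rank_factors(weight, rank=16)
params_before = weight.size      # 64 * 128 entries
params_after = a.size + b.size   # 64 * 16 + 16 * 128 entries
```

Storing the two factors instead of the full matrix reduces the parameter count whenever the rank is below `out_dim * in_dim / (out_dim + in_dim)`. Real expert weights are not exactly low rank, so in practice the chosen rank trades accuracy for size.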
Hide and Seek in Noise Labels: Noise-Robust Collaborative Active Learning with LLMs-Powered Assistance
Bo Yuan | Yulin Chen | Yin Zhang | Wei Jiang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Learning from noisy labels (LNL) is a challenge that arises in many real-world scenarios where collected training data can contain incorrect or corrupted labels. Most existing solutions identify noisy labels and adopt active learning to query human experts on them for denoising. In the era of large language models (LLMs), although we can reduce the human effort to improve these methods, their performance still depends on accurately separating clean and noisy samples in the noisy data. In this paper, we propose NoiseAL, an innovative collaborative learning framework based on active learning that combines LLMs and small models (SMs) for learning from noisy labels. During collaborative training, we first adopt two SMs to form a co-prediction network and propose a dynamic-enhanced threshold strategy to divide the noisy data into different subsets, then select clean and noisy samples from these subsets to feed to the LLMs, which act as active annotators, to rectify noisy samples. Finally, we employ different optimization objectives to handle subsets with different degrees of label noise. Extensive experiments on synthetic and real-world noise datasets further demonstrate the superiority of our framework over state-of-the-art baselines.
2023
From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework
Yangyi Chen | Hongcheng Gao | Ganqu Cui | Lifan Yuan | Dehan Kong | Hanlu Wu | Ning Shi | Bo Yuan | Longtao Huang | Hui Xue | Zhiyuan Liu | Maosong Sun | Heng Ji
Findings of the Association for Computational Linguistics: ACL 2023
Textual adversarial attacks can discover models’ weaknesses by adding semantics-preserving but misleading perturbations to the inputs. The long-lasting adversarial attack-and-defense arms race in Natural Language Processing (NLP) is algorithm-centric, providing valuable techniques for automatic robustness evaluation. However, the existing practice of robustness evaluation may suffer from incomplete evaluation, impractical evaluation protocols, and invalid adversarial samples. In this paper, we aim to set up a unified automatic robustness evaluation framework, shifting towards model-centric evaluation to further exploit the advantages of adversarial attacks. To address the above challenges, we first determine robustness evaluation dimensions based on model capabilities and specify a reasonable algorithm to generate adversarial samples for each dimension. Then we establish the evaluation protocol, including evaluation settings and metrics, under realistic demands. Finally, we use the perturbation degree of adversarial samples to control sample validity. We implement a toolkit, RobTest, that realizes our automatic robustness evaluation framework. In our experiments, we conduct a robustness evaluation of RoBERTa models to demonstrate the effectiveness of our evaluation framework, and further show the rationality of each component in the framework.
2022
Text Editing as Imitation Game
Ning Shi | Bin Tang | Bo Yuan | Longtao Huang | Yewen Pu | Jie Fu | Zhouhan Lin
Findings of the Association for Computational Linguistics: EMNLP 2022
Text editing, such as grammatical error correction, arises naturally from imperfect textual data. Recent works frame text editing as a multi-round sequence tagging task, where operations – such as insertion and substitution – are represented as a sequence of tags. While achieving good results, this encoding is limited in flexibility as all actions are bound to token-level tags. In this work, we reformulate text editing as an imitation game using behavioral cloning. Specifically, we convert conventional sequence-to-sequence data into state-to-action demonstrations, where the action space can be as flexible as needed. Instead of generating the actions one at a time, we introduce a dual-decoder structure to parallelize decoding while retaining the dependencies between action tokens, coupled with trajectory augmentation to alleviate the distribution shift that imitation learning often suffers from. In experiments on a suite of Arithmetic Equation benchmarks, our model consistently outperforms the autoregressive baselines in terms of performance, efficiency, and robustness. We hope our findings will shed light on future studies in reinforcement learning that apply sequence-level action generation to natural language processing.
Syntax-guided Localized Self-attention by Constituency Syntactic Distance
Shengyuan Hou | Jushi Kai | Haotian Xue | Bingyu Zhu | Bo Yuan | Longtao Huang | Xinbing Wang | Zhouhan Lin
Findings of the Association for Computational Linguistics: EMNLP 2022
Recent works have revealed that Transformers implicitly learn syntactic information in their lower layers from data, although this is highly dependent on the quality and scale of the training data. However, learning syntactic information from data is not necessary if we can leverage an external syntactic parser, which provides better parsing quality with well-defined syntactic structures. This could potentially improve the Transformer’s performance and sample efficiency. In this work, we propose a syntax-guided localized self-attention for Transformers that allows directly incorporating grammar structures from an external constituency parser. It prevents the attention mechanism from overweighting grammatically distant tokens over close ones. Experimental results show that our model consistently improves translation performance on a variety of machine translation datasets, ranging from small to large dataset sizes, and with different source languages.
RoChBert: Towards Robust BERT Fine-tuning for Chinese
Zihan Zhang | Jinfeng Li | Ning Shi | Bo Yuan | Xiangyu Liu | Rong Zhang | Hui Xue | Donghong Sun | Chao Zhang
Findings of the Association for Computational Linguistics: EMNLP 2022
Despite their superb performance on a wide range of tasks, pre-trained language models (e.g., BERT) have been proven vulnerable to adversarial texts. In this paper, we present RoChBERT, a framework to build more Robust BERT-based models by utilizing a more comprehensive adversarial graph to fuse Chinese phonetic and glyph features into pre-trained representations during fine-tuning. Inspired by curriculum learning, we further propose to augment the training dataset with adversarial texts in combination with intermediate samples. Extensive experiments demonstrate that RoChBERT outperforms previous methods in significant ways: (i) robust – RoChBERT greatly improves model robustness without sacrificing accuracy on benign texts. Specifically, the defense lowers the success rates of unlimited and limited attacks by 59.43% and 39.33% respectively, while retaining an accuracy of 93.30%; (ii) flexible – RoChBERT can easily be extended to various language models to solve different downstream tasks with excellent performance; and (iii) efficient – RoChBERT can be directly applied at the fine-tuning stage without pre-training a language model from scratch, and the proposed data augmentation method is also low-cost.
2013
A Hybrid Model For Grammatical Error Correction
Yang Xiang | Bo Yuan | Yaoyun Zhang | Xiaolong Wang | Wen Zheng | Chongqiang Wei
Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task
2012
A Mixed Deterministic Model for Coreference Resolution
Bo Yuan | Qingcai Chen | Yang Xiang | Xiaolong Wang | Liping Ge | Zengjian Liu | Meng Liao | Xianbo Si
Joint Conference on EMNLP and CoNLL - Shared Task
2010
A Cascade Method for Detecting Hedges and their Scope in Natural Language Text
Buzhou Tang | Xiaolong Wang | Xuan Wang | Bo Yuan | Shixi Fan
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task