Peng Wang


pdf bib
BADGE: Speeding Up BERT Inference after Deployment via Block-wise Bypasses and Divergence-based Early Exiting
Wei Zhu | Peng Wang | Yuan Ni | Guotong Xie | Xiaoling Wang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

Early exiting can reduce the average latency of pre-trained language models (PLMs) via its adaptive inference mechanism and work with other inference speed-up methods like model pruning, thus drawing much attention from the industry. In this work, we propose a novel framework, BADGE, which consists of two off-the-shelf methods for improving PLMs’ early exiting. We first address the issues of training a multi-exit PLM, the backbone model for early exiting. We propose the novel architecture of block-wise bypasses, which can alleviate the conflicts in jointly training multiple intermediate classifiers and thus improve the overall performances of multi-exit PLM while introducing negligible additional flops to the model. Second, we propose a novel divergence-based early exiting (DGE) mechanism, which obtains early exiting signals by comparing the predicted distributions of two adjacent layers’ exits. Extensive experiments on three proprietary datasets and three GLUE benchmark tasks demonstrate that our method can obtain a better speedup-performance trade-off than the existing baseline methods.\footnote{Code will be made publicly available to the research community upon acceptance.}

pdf bib
Prompt Tuning for Unified Multimodal Pretrained Models
Hao Yang | Junyang Lin | An Yang | Peng Wang | Chang Zhou
Findings of the Association for Computational Linguistics: ACL 2023

Prompt tuning has become a new paradigm for model tuning and it has demonstrated success in natural language pretraining and even vision pretraining. The parameter-efficient prompt tuning methods that optimize soft embeddings while keeping the pretrained model frozen demonstrate advantages in low computation costs and almost lossless performance. In this work, we explore the transfer of prompt tuning to multimodal pretrained models. Specifically, we implement prompt tuning to a unified sequence-to-sequence pretrained model by adding a sequence of learnable embeddings to each layer and finetuning the pretrained model on downstream task with only the learnable embeddings being optimized. Experimental results on a series of multimodal understanding and generation tasks demonstrate that our method OFA-PT can achieve comparable performance with finetuning across a series of multimodal generation and understanding tasks. Additionally, it significantly outperforms the unified multimodal pretrained model with other parameter-efficient tuning methods, e.g., Adapter, BitFit. etc. Besides, in comparison with finetuned models, the prompt-tuned models demonstrate improved robustness against adversarial attacks. We further figure out that experimental factors, including prompt length, prompt depth, and reparameteratization, have great impacts on the model performance, and thus we empirically provide a recommendation for the setups of prompt tuning.

pdf bib
Transferring General Multimodal Pretrained Models to Text Recognition
Junyang Lin | Xuancheng Ren | Yichang Zhang | Gao Liu | Peng Wang | An Yang | Chang Zhou
Findings of the Association for Computational Linguistics: ACL 2023

This paper proposes a new method, OFA-OCR, to transfer multimodal pretrained models to text recognition. Specifically, we recast text recognition as image captioning and directly transfer a unified vision-language pretrained model to the end task. Without pretraining on large-scale annotated or synthetic text recognition data, OFA-OCR outperforms the baselines and achieves state-of-the-art performance in the Chinese text recognition benchmark. Additionally, we construct an OCR pipeline with OFA-OCR, and we demonstrate that it can achieve competitive performance with the product-level API.

pdf bib
Learned Adapters Are Better Than Manually Designed Adapters
Yuming Zhang | Peng Wang | Ming Tan | Wei Zhu
Findings of the Association for Computational Linguistics: ACL 2023

Recently, a series of works have looked into further improving the adapter-based tuning by manually designing better adapter architectures. Understandably, these manually designed solutions are sub-optimal. In this work, we propose the Learned Adapter framework to automatically learn the optimal adapter architectures for better task adaptation of pre-trained models (PTMs). First, we construct a unified search space for adapter architecture designs. In terms of the optimization method on the search space, we propose a simple-yet-effective method, GDNAS for better architecture optimization. Extensive experiments show that our Learned Adapter framework can outperform the previous parameter-efficient tuning (PETuning) baselines while tuning comparable or fewer parameters. Moreover: (a) the learned adapter architectures are explainable and transferable across tasks. (b) We demonstrate that our architecture search space design is valid.

pdf bib
Revisiting Large Language Models as Zero-shot Relation Extractors
Guozheng Li | Peng Wang | Wenjun Ke
Findings of the Association for Computational Linguistics: EMNLP 2023

Relation extraction (RE) consistently involves a certain degree of labeled or unlabeled data even if under zero-shot setting. Recent studies have shown that large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt, which provides the possibility of extracting relations from text without any data and parameter tuning. This work focuses on the study of exploring LLMs, such as ChatGPT, as zero-shot relation extractors. On the one hand, we analyze the drawbacks of existing RE prompts and attempt to incorporate recent prompt techniques such as chain-of-thought (CoT) to improve zero-shot RE. We propose the summarize-and-ask (SumAsk) prompting, a simple prompt recursively using LLMs to transform RE inputs to the effective question answering (QA) format. On the other hand, we conduct comprehensive experiments on various benchmarks and settings to investigate the capabilities of LLMs on zero-shot RE. Specifically, we have the following findings: (i) SumAsk consistently and significantly improves LLMs performance on different model sizes, benchmarks and settings; (ii) Zero-shot prompting with ChatGPT achieves competitive or superior results compared with zero-shot and fully supervised methods; (iii) LLMs deliver promising performance in extracting overlapping relations; (iv) The performance varies greatly regarding different relations. Different from small language models, LLMs are effective in handling challenge none-of-the-above (NoTA) relation.

pdf bib
Knowledge Rumination for Pre-trained Language Models
Yunzhi Yao | Peng Wang | Shengyu Mao | Chuanqi Tan | Fei Huang | Huajun Chen | Ningyu Zhang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Previous studies have revealed that vanilla pre-trained language models (PLMs) lack the capacity to handle knowledge-intensive NLP tasks alone; thus, several works have attempted to integrate external knowledge into PLMs. However, despite the promising outcome, we empirically observe that PLMs may have already encoded rich knowledge in their pre-trained parameters but fails to fully utilize them when applying to knowledge-intensive tasks. In this paper, we propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize that related latent knowledge without retrieving them from the external corpus. By simply adding a prompt like “As far as I know” to the PLMs, we try to review related latent knowledge and inject them back into the model for knowledge consolidation. We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3. Experimental results on six commonsense reasoning tasks and GLUE benchmarks demonstrate the effectiveness of our proposed approach, which proves that the knowledge stored in PLMs can be better exploited to enhance performance.

pdf bib
Editing Large Language Models: Problems, Methods, and Opportunities
Yunzhi Yao | Peng Wang | Bozhong Tian | Siyuan Cheng | Zhoubo Li | Shumin Deng | Huajun Chen | Ningyu Zhang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Despite the ability to train capable LLMs, the methodology for maintaining their relevancy and rectifying errors remains elusive. To this end, the past few years have witnessed a surge in techniques for editing LLMs, the objective of which is to alter the behavior of LLMs efficiently within a specific domain without negatively impacting performance across other inputs. This paper embarks on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs. In particular, we provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most progressive methods currently at our disposal. We also build a new benchmark dataset to facilitate a more robust evaluation and pinpoint enduring issues intrinsic to existing techniques. Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context.

pdf bib
Adaptive Hyper-parameter Learning for Deep Semantic Retrieval
Mingming Li | Chunyuan Yuan | Huimu Wang | Peng Wang | Jingwei Zhuo | Binbin Wang | Lin Liu | Sulong Xu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track

Deep semantic retrieval has achieved remarkable success in online E-commerce applications. The majority of methods aim to distinguish positive items and negative items for each query by utilizing margin loss or softmax loss. Despite their decent performance, these methods are highly sensitive to hyper-parameters, i.e., margin and temperature 𝜏, which measure the similarity of negative pairs and affect the distribution of items in metric space. How to design and choose adaptively parameters for different pairs is still an open challenge. Recently several methods have attempted to alleviate the above problem by learning each parameter through trainable/statistical methods in the recommendation. We argue that those are not suitable for retrieval scenarios, due to the agnosticism and diversity of the queries. To fully overcome this limitation, we propose a novel adaptive metric learning method that designs a simple and universal hyper-parameter-free learning method to improve the performance of retrieval. Specifically, we first propose a method that adaptive obtains the hyper-parameters by relying on the batch similarity without fixed or extra-trainable hyper-parameters. Subsequently, we adopt a symmetric metric learning method to mitigate model collapse issues. Furthermore, the proposed method is general and sheds a highlight on other fields. Extensive experiments demonstrate our method significantly outperforms previous methods on a real-world dataset, highlighting the superiority and effectiveness of our method. This method has been successfully deployed on an online E-commerce search platform and brought substantial economic benefits.


pdf bib
End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding
Mengze Li | Tianbao Wang | Haoyu Zhang | Shengyu Zhang | Zhou Zhao | Jiaxu Miao | Wenqiao Zhang | Wenming Tan | Jin Wang | Peng Wang | Shiliang Pu | Fei Wu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Natural language spatial video grounding aims to detect the relevant objects in video frames with descriptive sentences as the query. In spite of the great advances, most existing methods rely on dense video frame annotations, which require a tremendous amount of human effort. To achieve effective grounding under a limited annotation budget, we investigate one-shot video grounding and learn to ground natural language in all video frames with solely one frame labeled, in an end-to-end manner. One major challenge of end-to-end one-shot video grounding is the existence of videos frames that are either irrelevant to the language query or the labeled frame. Another challenge relates to the limited supervision, which might result in ineffective representation learning. To address these challenges, we designed an end-to-end model via Information Tree for One-Shot video grounding (IT-OS). Its key module, the information tree, can eliminate the interference of irrelevant frames based on branch search and branch cropping techniques. In addition, several self-supervised tasks are proposed based on the information tree to improve the representation learning under insufficient labeling. Experiments on the benchmark dataset demonstrate the effectiveness of our model.

pdf bib
CapOnImage: Context-driven Dense-Captioning on Image
Yiqi Gao | Xinglin Hou | Yuanmeng Zhang | Tiezheng Ge | Yuning Jiang | Peng Wang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Existing image captioning systems are dedicated to generating narrative captions for images, which are spatially detached from theimage in presentation. However, texts can also be used as decorations on the image to highlight the key points and increase theattractiveness of images. In this work, we introduce a new taskcalled captioning on image (CapOnImage), which aims to generatedense captions at different locations of the image based on contextual information. To fully exploit the surrounding visual context togenerate the most suitable caption for each location, we propose amulti-modal pre-training model with multi-level pre-training tasksthat progressively learn the correspondence between texts and image locations from easy to difficult. Since the model may generateredundant captions for nearby locations, we further enhance thelocation embedding with neighbor locations as context. For thisnew task, we also introduce a large-scale benchmark called CapOnImage2M, which contains 2.1 million product images, each with anaverage of 4.8 spatially localized captions. Compared with other image captioning model variants, our model achieves the best resultsin both captioning accuracy and diversity aspects.

pdf bib
PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting
Zhen Zhang | Wei Zhu | Jinfan Zhang | Peng Wang | Rize Jin | Tae-Sun Chung
Findings of the Association for Computational Linguistics: NAACL 2022

BERT and other pretrained language models (PLMs) are ubiquitous in modern NLP. Even though PLMs are the state-of-the-art (SOTA) models for almost every NLP task (CITATION), the significant latency during inference prohibits wider industrial usage. In this work, we propose Patient and Confident Early Exiting BERT (PCEE-BERT), an off-the-shelf sample-dependent early exiting method that can work with different PLMs and can also work along with popular model compression methods. With a multi-exit BERT as the backbone model, PCEE-BERT will make the early exiting decision if enough numbers (patience parameter) of consecutive intermediate layers are confident about their predictions. The entropy value measures the confidence level of an intermediate layer’s prediction. Experiments on the GLUE benchmark demonstrate that our method outperforms previous SOTA early exiting methods. Ablation studies show that: (a) our method performs consistently well on other PLMs, such as ALBERT and TinyBERT; (b) PCEE-BERT can achieve different speed-up ratios by adjusting the patience parameter and the confidence threshold. The code for PCEE-BERT can be found at


pdf bib
Sketch and Refine: Towards Faithful and Informative Table-to-Text Generation
Peng Wang | Junyang Lin | An Yang | Chang Zhou | Yichang Zhang | Jingren Zhou | Hongxia Yang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Hyperbolic Hierarchy-Aware Knowledge Graph Embedding for Link Prediction
Zhe Pan | Peng Wang
Findings of the Association for Computational Linguistics: EMNLP 2021

Knowledge graph embedding (KGE) using low-dimensional representations to predict missing information is widely applied in knowledge completion. Existing embedding methods are mostly built on Euclidean space, which are difficult to handle hierarchical structures. Hyperbolic embedding methods have shown the promise of high fidelity and concise representation for hierarchical data. However, the logical patterns in knowledge graphs are not considered well in these methods. To address this problem, we propose a novel KGE model with extended Poincaré Ball and polar coordinate system to capture hierarchical structures. We use the tangent space and exponential transformation to initialize and map the corresponding vectors to the Poincaré Ball in hyperbolic space. To solve the boundary conditions, the boundary is stretched and zoomed by expanding the modulus length in the Poincaré Ball. We optimize our model using polar coordinate and changing operators in the extended Poincaré Ball. Experiments achieve new state-of-the-art results on part of link prediction tasks, which demonstrates the effectiveness of our method.

pdf bib
WikiAsp: A Dataset for Multi-domain Aspect-based Summarization
Hiroaki Hayashi | Prashant Budania | Peng Wang | Chris Ackerson | Raj Neervannan | Graham Neubig
Transactions of the Association for Computational Linguistics, Volume 9

Aspect-based summarization is the task of generating focused summaries based on specific points of interest. Such summaries aid efficient analysis of text, such as quickly understanding reviews or opinions from different angles. However, due to large differences in the type of aspects for different domains (e.g., sentiment, product features), the development of previous models has tended to be domain-specific. In this paper, we propose WikiAsp,1 a large-scale dataset for multi-domain aspect- based summarization that attempts to spur research in the direction of open-domain aspect-based summarization. Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation. We propose several straightforward baseline models for this task and conduct experiments on the dataset. Results highlight key challenges that existing summarization models face in this setting, such as proper pronoun handling of quoted sources and consistent explanation of time-sensitive events.


pdf bib
Ferryman as SemEval-2020 Task 5: Optimized BERT for Detecting Counterfactuals
Weilong Chen | Yan Zhuang | Peng Wang | Feng Hong | Yan Wang | Yanru Zhang
Proceedings of the Fourteenth Workshop on Semantic Evaluation

The main purpose of this article is to state the effect of using different methods and models for counterfactual determination and detection of causal knowledge. Nowadays, counterfactual reasoning has been widely used in various fields. In the realm of natural language process(NLP), counterfactual reasoning has huge potential to improve the correctness of a sentence. In the shared Task 5 of detecting counterfactual in SemEval 2020, we pre-process the officially given dataset according to case conversion, extract stem and abbreviation replacement. We use last-5 bidirectional encoder representation from bidirectional encoder representation from transformer (BERT)and term frequency–inverse document frequency (TF-IDF) vectorizer for counterfactual detection. Meanwhile, multi-sample dropout and cross validation are used to improve versatility and prevent problems such as poor generosity caused by overfitting. Finally, our team Ferryman ranked the 8th place in the sub-task 1 of this competition.

pdf bib
MeisterMorxrc at SemEval-2020 Task 9: Fine-Tune Bert and Multitask Learning for Sentiment Analysis of Code-Mixed Tweets
Qi Wu | Peng Wang | Chenghao Huang
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Natural language processing (NLP) has been applied to various fields including text classification and sentiment analysis. In the shared task of sentiment analysis of code-mixed tweets, which is a part of the SemEval-2020 competition, we preprocess datasets by replacing emoji and deleting uncommon characters and so on, and then fine-tune the Bidirectional Encoder Representation from Transformers(BERT) to perform the best. After exhausting top3 submissions, Our team MeisterMorxrc achieves an averaged F1 score of 0.730 in this task, and and our codalab username is MeisterMorxrc

pdf bib
Ferryman at SemEval-2020 Task 12: BERT-Based Model with Advanced Improvement Methods for Multilingual Offensive Language Identification
Weilong Chen | Peng Wang | Jipeng Li | Yuanshuai Zheng | Yan Wang | Yanru Zhang
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Indiscriminately posting offensive remarks on social media may promote the occurrence of negative events such as violence, crime, and hatred. This paper examines different approaches and models for solving offensive tweet classification, which is a part of the OffensEval 2020 competition. The dataset is Offensive Language Identification Dataset (OLID), which draws 14,200 annotated English Tweet comments. The main challenge of data preprocessing is the unbalanced class distribution, abbreviation, and emoji. To overcome these issues, methods such as hashtag segmentation, abbreviation replacement, and emoji replacement have been adopted for data preprocessing approaches. The main task can be divided into three sub-tasks, and are solved by Term Frequency–Inverse Document Frequency(TF-IDF), Bidirectional Encoder Representation from Transformer (BERT), and Multi-dropout respectively. Meanwhile, we applied different learning rates for different languages and tasks based on BERT and non-BERTmodels in order to obtain better results. Our team Ferryman ranked the 18th, 8th, and 21st with F1-score of 0.91152 on the English Sub-task A, Sub-task B, and Sub-task C, respectively. Furthermore, our team also ranked in the top 20 on the Sub-task A of other languages.

pdf bib
AprilE: Attention with Pseudo Residual Connection for Knowledge Graph Embedding
Yuzhang Liu | Peng Wang | Yingtai Li | Yizhan Shao | Zhongkai Xu
Proceedings of the 28th International Conference on Computational Linguistics

Knowledge graph embedding maps entities and relations into low-dimensional vector space. However, it is still challenging for many existing methods to model diverse relational patterns, especially symmetric and antisymmetric relations. To address this issue, we propose a novel model, AprilE, which employs triple-level self-attention and pseudo residual connection to model relational patterns. The triple-level self-attention treats head entity, relation, and tail entity as a sequence and captures the dependency within a triple. At the same time the pseudo residual connection retains primitive semantic features. Furthermore, to deal with symmetric and antisymmetric relations, two schemas of score function are designed via a position-adaptive mechanism. Experimental results on public datasets demonstrate that our model can produce expressive knowledge embedding and significantly outperforms most of the state-of-the-art works.


pdf bib
CVTE at IJCNLP-2017 Task 1: Character Checking System for Chinese Grammatical Error Diagnosis Task
Xian Li | Peng Wang | Suixue Wang | Guanyu Jiang | Tianyuan You
Proceedings of the IJCNLP 2017, Shared Tasks

Grammatical error diagnosis is an important task in natural language processing. This paper introduces CVTE Character Checking System in the NLP-TEA-4 shared task for CGED 2017, we use Bi-LSTM to generate the probability of every character, then take two kinds of strategies to decide whether a character is correct or not. This system is probably more suitable to deal with the error type of bad word selection, which is one of four types of errors, and the rest are words re-dundancy, words missing and words disorder. Finally the second strategy achieves better F1 score than the first one at all of detection level, identification level, position level.


pdf bib
Short Text Clustering via Convolutional Neural Networks
Jiaming Xu | Peng Wang | Guanhua Tian | Bo Xu | Jun Zhao | Fangyuan Wang | Hongwei Hao
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing

pdf bib
Semantic Clustering and Convolutional Neural Network for Short Text Categorization
Peng Wang | Jiaming Xu | Bo Xu | Chenglin Liu | Heng Zhang | Fangyuan Wang | Hongwei Hao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)