Minlong Peng


pdf bib
NormNet: Normalize Noun Phrases for More Robust NLP
Minlong Peng | Mingming Sun
Findings of the Association for Computational Linguistics: ACL 2023

A critical limitation of deep NLP models is their over-fitting over spurious features. Previous work has proposed several approaches to debunk such features and reduce their impact on the learned models. In this work, a normalization strategy is proposed to eliminate the false features caused by the textual surfaces of noun phrases. The motivation for this strategy is that noun phrases often play the role of slots in textual expressions and their exact forms are often not that important for performing the final task. As an intuitive example, consider the expression ”x like eating y". There are a huge number of suitable instantiations for x and y in the locale. However, humans can already infer the sentiment polarity of x toward y without knowing their exact forms.Based on this intuition, we introduce NormNet, a pretrained language model based network, to implement the normalization strategy. NormNet learns to replace as many noun phrases in the input sentence as possible with pre-defined base forms. The output of NormNet is then fed as input to a prompt-based learning model to perform label prediction. To evaluate the effectiveness of our strategy, we conducted experimental studies on several tasks, including aspect sentiment classification (ASC), semantic text similarity (STS), and natural language inference (NLI). The experimental results confirm the effectiveness of our strategy.

pdf bib
Connectivity Patterns are Task Embeddings
Zhiheng Xi | Rui Zheng | Yuansen Zhang | Xuanjing Huang | Zhongyu Wei | Minlong Peng | Mingming Sun | Qi Zhang | Tao Gui
Findings of the Association for Computational Linguistics: ACL 2023

Task embeddings are task-specific vectors designed to construct a semantic space of tasks, which can be used to predict the most transferable source task for a given target task via the similarity between task embeddings. However, existing methods use optimized parameters and representations as task embeddings, resulting in substantial computational complexity and storage requirements. In this work, we draw inspiration from the operating mechanism of deep neural networks (DNNs) and biological brains, where neuronal activations are sparse and task-specific, and we use the connectivity patterns of neurons as a unique identifier associated with the task. The proposed method learns to assign importance masks for sub-structures of DNNs, and accordingly indicate the task-specific connectivity patterns. In addition to the storage advantages brought by the binary masking mechanism and structured sparsity, the early-bird nature of the sparse optimization process can deliver an efficient computation advantage. Experiments show that our method consistently outperforms other baselines in predicting inter-task transferability across data regimes and transfer settings, while keeping high efficiency in computation and storage.

pdf bib
Actively Supervised Clustering for Open Relation Extraction
Jun Zhao | Yongxin Zhang | Qi Zhang | Tao Gui | Zhongyu Wei | Minlong Peng | Mingming Sun
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Current clustering-based Open Relation Extraction (OpenRE) methods usually adopt a two-stage pipeline, which simultaneously learns relation representations and assignments in the first stage, then manually labels relation for each cluster. However, unsupervised objectives struggle to explicitly optimize clusters to align with relational semantics, and the number of clusters K has to be supplied in advance. In this paper, we present a novel setting, named actively supervised clustering for OpenRE. Our insight lies in that clustering learning and relation labeling can be performed simultaneously, which provides the necessary guidance for clustering without a significant increase in human effort. Along with this setting, we propose an active labeling strategy tailored for clustering. Instead of only focusing on improving the clustering of relations that have been discovered, our strategy is encouraged to discover new relations through diversity regularization. This is particularly beneficial for long-tail relations in the real world. Experimental results show that our method is able to discover almost all relational clusters in the data and improve the SOTA methods by 13.8% and 10.6%, on two datasets respectively.

pdf bib
RE-Matching: A Fine-Grained Semantic Matching Method for Zero-Shot Relation Extraction
Jun Zhao | WenYu Zhan | Xin Zhao | Qi Zhang | Tao Gui | Zhongyu Wei | Junzhe Wang | Minlong Peng | Mingming Sun
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Semantic matching is a mainstream paradigm of zero-shot relation extraction, which matches a given input with a corresponding label description. The entities in the input should exactly match their hypernyms in the description, while the irrelevant contexts should be ignored when matching. However, general matching methods lack explicit modeling of the above matching pattern. In this work, we propose a fine-grained semantic matching method tailored for zero-shot relation extraction. Guided by the above matching pattern, we decompose the sentence-level similarity score into the entity matching score and context matching score. Considering that not all contextual words contribute equally to the relation semantics, we design a context distillation module to reduce the negative impact of irrelevant components on context matching. Experimental results show that our method achieves higher matching accuracy and more than 10 times faster inference speed, compared with the state-of-the-art methods.


pdf bib
OIE@OIA: an Adaptable and Efficient Open Information Extraction Framework
Xin Wang | Minlong Peng | Mingming Sun | Ping Li
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Different Open Information Extraction (OIE) tasks require different types of information, so the OIE field requires strong adaptability of OIE algorithms to meet different task requirements. This paper discusses the adaptability problem in existing OIE systems and designs a new adaptable and efficient OIE system - OIE@OIA as a solution. OIE@OIA follows the methodology of Open Information eXpression (OIX): parsing a sentence to an Open Information Annotation (OIA) Graph and then adapting the OIA graph to different OIE tasks with simple rules. As the core of our OIE@OIA system, we implement an end-to-end OIA generator by annotating a dataset (we make it open available) and designing an efficient learning algorithm for the complex OIA graph. We easily adapt the OIE@OIA system to accomplish three popular OIE tasks. The experimental show that our OIE@OIA achieves new SOTA performances on these tasks, showing the great adaptability of our OIE@OIA system. Furthermore, compared to other end-to-end OIE baselines that need millions of samples for training, our OIE@OIA needs much fewer training samples (12K), showing a significant advantage in terms of efficiency.

pdf bib
Improving Semantic Matching through Dependency-Enhanced Pre-trained Model with Adaptive Fusion
Jian Song | Di Liang | Rumei Li | Yuntao Li | Sirui Wang | Minlong Peng | Wei Wu | Yongxin Yu
Findings of the Association for Computational Linguistics: EMNLP 2022

Transformer-based pre-trained models like BERT have achieved great progress on Semantic Sentence Matching. Meanwhile, dependency prior knowledge has also shown general benefits in multiple NLP tasks. However, how to efficiently integrate dependency prior structure into pre-trained models to better model complex semantic matching relations is still unsettled. In this paper, we propose the Dependency-Enhanced Adaptive Fusion Attention (DAFA), which explicitly introduces dependency structure into pre-trained models and adaptively fuses it with semantic information. Specifically, (i) DAFA first proposes a structure-sensitive paradigm to construct a dependency matrix for calibrating attention weights. (ii) It adopts an adaptive fusion module to integrate the obtained dependency information and the original semantic signals. Moreover, DAFA reconstructs the attention calculation flow and provides better interpretability. By applying it on BERT, our method achieves state-of-the-art or competitive performance on 10 public datasets, demonstrating the benefits of adaptively fusing dependency structure in semantic matching task.


pdf bib
TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing
Xiao Wang | Qin Liu | Tao Gui | Qi Zhang | Yicheng Zou | Xin Zhou | Jiacheng Ye | Yongxin Zhang | Rui Zheng | Zexiong Pang | Qinzhuo Wu | Zhengyan Li | Chong Zhang | Ruotian Ma | Zichu Fei | Ruijian Cai | Jun Zhao | Xingwu Hu | Zhiheng Yan | Yiding Tan | Yuan Hu | Qiyuan Bian | Zhihua Liu | Shan Qin | Bolin Zhu | Xiaoyu Xing | Jinlan Fu | Yue Zhang | Minlong Peng | Xiaoqing Zheng | Yaqian Zhou | Zhongyu Wei | Xipeng Qiu | Xuanjing Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

TextFlint is a multilingual robustness evaluation toolkit for NLP tasks that incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analyses. This enables practitioners to automatically evaluate their models from various aspects or to customize their evaluations as desired with just a few lines of code. TextFlint also generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model in terms of its robustness. To guarantee acceptability, all the text transformations are linguistically based and all the transformed data selected (up to 100,000 texts) scored highly under human evaluation. To validate the utility, we performed large-scale empirical evaluations (over 67,000) on state-of-the-art deep learning models, classic supervised methods, and real-world systems. The toolkit is already available at https://github.com/textflint with all the evaluation results demonstrated at textflint.io.


pdf bib
Simplify the Usage of Lexicon in Chinese NER
Ruotian Ma | Minlong Peng | Qi Zhang | Zhongyu Wei | Xuanjing Huang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recently, many works have tried to augment the performance of Chinese named entity recognition (NER) using word lexicons. As a representative, Lattice-LSTM has achieved new benchmark results on several public Chinese NER datasets. However, Lattice-LSTM has a complex model architecture. This limits its application in many industrial areas where real-time NER responses are needed. In this work, we propose a simple but effective method for incorporating the word lexicon into the character representations. This method avoids designing a complicated sequence modeling architecture, and for any neural NER model, it requires only subtle adjustment of the character representation layer to introduce the lexicon information. Experimental studies on four benchmark Chinese NER datasets show that our method achieves an inference speed up to 6.15 times faster than those of state-of-the-art methods, along with a better performance. The experimental results also show that the proposed method can be easily incorporated with pre-trained models like BERT.

pdf bib
Toward Recognizing More Entity Types in NER: An Efficient Implementation using Only Entity Lexicons
Minlong Peng | Ruotian Ma | Qi Zhang | Lujun Zhao | Mengxi Wei | Changlong Sun | Xuanjing Huang
Findings of the Association for Computational Linguistics: EMNLP 2020

In this work, we explore the way to quickly adjust an existing named entity recognition (NER) system to make it capable of recognizing entity types not defined in the system. As an illustrative example, consider the case that a NER system has been built to recognize person and organization names, and now it requires to additionally recognize job titles. Such a situation is common in the industrial areas, where the entity types required to recognize vary a lot in different products and keep changing. To avoid laborious data labeling and achieve fast adaptation, we propose to adjust the existing NER system using the previously labeled data and entity lexicons of the newly introduced entity types. We formulate such a task as a partially supervised learning problem and accordingly propose an effective algorithm to solve the problem. Comprehensive experimental studies on several public NER datasets validate the effectiveness of our method.

pdf bib
Weighed Domain-Invariant Representation Learning for Cross-domain Sentiment Analysis
Minlong Peng | Qi Zhang
Proceedings of the 28th International Conference on Computational Linguistics

Cross-domain sentiment analysis is currently a hot topic in both the research and industrial areas. One of the most popular framework for the task is domain-invariant representation learning (DIRL), which aims to learn a distribution-invariant feature representation across domains. However, in this work, we find out that applying DIRL may degrade domain adaptation performance when the label distribution P(Y) changes across domains. To address this problem, we propose a modification to DIRL, obtaining a novel weighted domain-invariant representation learning (WDIRL) framework. We show that it is easy to transfer existing models of the DIRL framework to the WDIRL framework. Empirical studies on extensive cross-domain sentiment analysis tasks verified our statements and showed the effectiveness of our proposed solution.


pdf bib
Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning
Minlong Peng | Xiaoyu Xing | Qi Zhang | Jinlan Fu | Xuanjing Huang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this work, we explore the way to perform named entity recognition (NER) using only unlabeled data and named entity dictionaries. To this end, we formulate the task as a positive-unlabeled (PU) learning problem and accordingly propose a novel PU learning algorithm to perform the task. We prove that the proposed algorithm can unbiasedly and consistently estimate the task loss as if there is fully labeled data. A key feature of the proposed method is that it does not require the dictionaries to label every entity within a sentence, and it even does not require the dictionaries to label all of the words constituting an entity. This greatly reduces the requirement on the quality of the dictionaries and makes our method generalize well with quite simple dictionaries. Empirical studies on four public NER datasets demonstrate the effectiveness of our proposed method. We have published the source code at https://github.com/v-mipeng/LexiconNER.

pdf bib
A Lexicon-Based Graph Neural Network for Chinese NER
Tao Gui | Yicheng Zou | Qi Zhang | Minlong Peng | Jinlan Fu | Zhongyu Wei | Xuanjing Huang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Recurrent neural networks (RNN) used for Chinese named entity recognition (NER) that sequentially track character and word information have achieved great success. However, the characteristic of chain structure and the lack of global semantics determine that RNN-based models are vulnerable to word ambiguities. In this work, we try to alleviate this problem by introducing a lexicon-based graph neural network with global semantics, in which lexicon knowledge is used to connect characters to capture the local composition, while a global relay node can capture global sentence semantics and long-range dependency. Based on the multiple graph-based interactions among characters, potential words, and the whole-sentence semantics, word ambiguities can be effectively tackled. Experiments on four NER datasets show that the proposed model achieves significant improvements against other baseline models.


pdf bib
Transferring from Formal Newswire Domain with Hypernet for Twitter POS Tagging
Tao Gui | Qi Zhang | Jingjing Gong | Minlong Peng | Di Liang | Keyu Ding | Xuanjing Huang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Part-of-Speech (POS) tagging for Twitter has received considerable attention in recent years. Because most POS tagging methods are based on supervised models, they usually require a large amount of labeled data for training. However, the existing labeled datasets for Twitter are much smaller than those for newswire text. Hence, to help POS tagging for Twitter, most domain adaptation methods try to leverage newswire datasets by learning the shared features between the two domains. However, from a linguistic perspective, Twitter users not only tend to mimic the formal expressions of traditional media, like news, but they also appear to be developing linguistically informal styles. Therefore, POS tagging for the formal Twitter context can be learned together with the newswire dataset, while POS tagging for the informal Twitter context should be learned separately. To achieve this task, in this work, we propose a hypernetwork-based method to generate different parameters to separately model contexts with different expression styles. Experimental results on three different datasets show that our approach achieves better performance than state-of-the-art methods in most cases.

pdf bib
Cross-Domain Sentiment Classification with Target Domain Specific Information
Minlong Peng | Qi Zhang | Yu-gang Jiang | Xuanjing Huang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The task of adopting a model with good performance to a target domain that is different from the source domain used for training has received considerable attention in sentiment analysis. Most existing approaches mainly focus on learning representations that are domain-invariant in both the source and target domains. Few of them pay attention to domain-specific information, which should also be informative. In this work, we propose a method to simultaneously extract domain specific and invariant representations and train a classifier on each of the representation, respectively. And we introduce a few target domain labeled data for learning domain-specific information. To effectively utilize the target domain labeled data, we train the domain invariant representation based classifier with both the source and target domain labeled data and train the domain-specific representation based classifier with only the target domain labeled data. These two classifiers then boost each other in a co-training style. Extensive sentiment analysis experiments demonstrated that the proposed method could achieve better performance than state-of-the-art methods.


pdf bib
Part-of-Speech Tagging for Twitter with Adversarial Neural Networks
Tao Gui | Qi Zhang | Haoran Huang | Minlong Peng | Xuanjing Huang
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this work, we study the problem of part-of-speech tagging for Tweets. In contrast to newswire articles, Tweets are usually informal and contain numerous out-of-vocabulary words. Moreover, there is a lack of large scale labeled datasets for this domain. To tackle these challenges, we propose a novel neural network to make use of out-of-domain labeled data, unlabeled in-domain data, and labeled in-domain data. Inspired by adversarial neural networks, the proposed method tries to learn common features through adversarial discriminator. In addition, we hypothesize that domain-specific features of target domain should be preserved in some degree. Hence, the proposed method adopts a sequence-to-sequence autoencoder to perform this task. Experimental results on three different datasets show that our method achieves better performance than state-of-the-art methods.