Xuan-Jing Huang

Also published as: Xuan-jing Huang, Xuanjing Huang


2022

pdf bib
Towards Adversarially Robust Text Classifiers by Learning to Reweight Clean Examples
Jianhan Xu | Cenyuan Zhang | Xiaoqing Zheng | Linyang Li | Cho-Jui Hsieh | Kai-Wei Chang | Xuanjing Huang
Findings of the Association for Computational Linguistics: ACL 2022

Most of the existing defense methods improve the adversarial robustness by making the models adapt to the training set augmented with some adversarial examples. However, the augmented adversarial examples may not be natural, which might distort the training distribution, resulting in inferior performance both in clean accuracy and adversarial robustness. In this study, we explore the feasibility of introducing a reweighting mechanism to calibrate the training distribution to obtain robust models. We propose to train text classifiers by a sample reweighting method in which the example weights are learned to minimize the loss of a validation set mixed with the clean examples and their adversarial ones in an online learning manner. Through extensive experiments, we show that there exists a reweighting mechanism to make the models more robust against adversarial attacks without the need to craft the adversarial examples for the entire training set.

pdf bib
A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation
Tianxiang Sun | Xiangyang Liu | Wei Zhu | Zhichao Geng | Lingling Wu | Yilong He | Yuan Ni | Guotong Xie | Xuanjing Huang | Xipeng Qiu
Findings of the Association for Computational Linguistics: ACL 2022

Early exiting allows instances to exit at different layers according to the estimation of difficulty.Previous works usually adopt heuristic metrics such as the entropy of internal outputs to measure instance difficulty, which suffers from generalization and threshold-tuning. In contrast, learning to exit, or learning to predict instance difficulty is a more appealing way. Though some effort has been devoted to employing such “learn-to-exit” modules, it is still unknown whether and how well the instance difficulty can be learned. As a response, we first conduct experiments on the learnability of instance difficulty, which demonstrates that modern neural models perform poorly on predicting instance difficulty. Based on this observation, we propose a simple-yet-effective Hash-based Early Exiting approach HashEE) that replaces the learn-to-exit modules with hash functions to assign each token to a fixed exiting layer. Different from previous methods, HashEE requires no internal classifiers nor extra parameters, and therefore is more efficient.HashEE can be used in various tasks (including language understanding and generation) and model architectures such as seq2seq models. Experimental results on classification, regression, and generation tasks demonstrate that HashEE can achieve higher performance with fewer FLOPs and inference time compared with previous state-of-the-art early exiting methods.

pdf bib
Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval
Zhihao Fan | Zhongyu Wei | Zejun Li | Siyuan Wang | Xuanjing Huang | Jianqing Fan
Findings of the Association for Computational Linguistics: NAACL 2022

Matching model is essential for Image-Text Retrieval framework. Existing research usually train the model with a triplet loss and explore various strategy to retrieve hard negative sentences in the dataset. We argue that current retrieval-based negative sample construction approach is limited in the scale of the dataset thus fail to identify negative sample of high difficulty for every image. We propose our TAiloring neGative Sentences with Discrimination and Correction (TAGS-DC) to generate synthetic sentences automatically as negative samples. TAGS-DC is composed of masking and refilling to generate synthetic negative sentences with higher difficulty. To keep the difficulty during training, we mutually improve the retrieval and generation through parameter sharing. To further utilize fine-grained semantic of mismatch in the negative sentence, we propose two auxiliary tasks, namely word discrimination and word correction to improve the training. In experiments, we verify the effectiveness of our model on MS-COCO and Flickr30K compared with current state-of-the-art models and demonstrates its robustness and faithfulness in the further analysis.

pdf bib
Locate Then Ask: Interpretable Stepwise Reasoning for Multi-hop Question Answering
Siyuan Wang | Zhongyu Wei | Zhihao Fan | Qi Zhang | Xuanjing Huang
Proceedings of the 29th International Conference on Computational Linguistics

Multi-hop reasoning requires aggregating multiple documents to answer a complex question. Existing methods usually decompose the multi-hop question into simpler single-hop questions to solve the problem for illustrating the explainable reasoning process. However, they ignore grounding on the supporting facts of each reasoning step, which tends to generate inaccurate decompositions. In this paper, we propose an interpretable stepwise reasoning framework to incorporate both single-hop supporting sentence identification and single-hop question generation at each intermediate step, and utilize the inference of the current hop for the next until reasoning out the final result. We employ a unified reader model for both intermediate hop reasoning and final hop inference and adopt joint optimization for more accurate and robust multi-hop reasoning. We conduct experiments on two benchmark datasets HotpotQA and 2WikiMultiHopQA. The results show that our method can effectively boost performance and also yields a better interpretable reasoning process without decomposition supervision.

pdf bib
A Multi-Format Transfer Learning Model for Event Argument Extraction via Variational Information Bottleneck
Jie Zhou | Qi Zhang | Qin Chen | Qi Zhang | Liang He | Xuanjing Huang
Proceedings of the 29th International Conference on Computational Linguistics

Event argument extraction (EAE) aims to extract arguments with given roles from texts, which have been widely studied in natural language processing. Most previous works have achieved good performance in specific EAE datasets with dedicated neural architectures. Whereas, these architectures are usually difficult to adapt to new datasets/scenarios with various annotation schemas or formats. Furthermore, they rely on large-scale labeled data for training, which is unavailable due to the high labelling cost in most cases. In this paper, we propose a multi-format transfer learning model with variational information bottleneck, which makes use of the information especially the common knowledge in existing datasets for EAE in new datasets. Specifically, we introduce a shared-specific prompt framework to learn both format-shared and format-specific knowledge from datasets with different formats. In order to further absorb the common knowledge for EAE and eliminate the irrelevant noise, we integrate variational information bottleneck into our architecture to refine the shared representation. We conduct extensive experiments on three benchmark datasets, and obtain new state-of-the-art performance on EAE.

pdf bib
Decorrelate Irrelevant, Purify Relevant: Overcome Textual Spurious Correlations from a Feature Perspective
Shihan Dou | Rui Zheng | Ting Wu | SongYang Gao | Junjie Shan | Qi Zhang | Yueming Wu | Xuanjing Huang
Proceedings of the 29th International Conference on Computational Linguistics

Natural language understanding (NLU) models tend to rely on spurious correlations (i.e., dataset bias) to achieve high performance on in-distribution datasets but poor performance on out-of-distribution ones. Most of the existing debiasing methods often identify and weaken these samples with biased features (i.e., superficial surface features that cause such spurious correlations). However, down-weighting these samples obstructs the model in learning from the non-biased parts of these samples. To tackle this challenge, in this paper, we propose to eliminate spurious correlations in a fine-grained manner from a feature space perspective. Specifically, we introduce Random Fourier Features and weighted re-sampling to decorrelate the dependencies between features to mitigate spurious correlations. After obtaining decorrelated features, we further design a mutual-information-based method to purify them, which forces the model to learn features that are more relevant to tasks. Extensive experiments on two well-studied NLU tasks demonstrate that our method is superior to other comparative approaches.

pdf bib
A Progressive Framework for Role-Aware Rumor Resolution
Lei Chen | Guanying Li | Zhongyu Wei | Yang Yang | Baohua Zhou | Qi Zhang | Xuanjing Huang
Proceedings of the 29th International Conference on Computational Linguistics

Existing works on rumor resolution have shown great potential in recognizing word appearance and user participation. However, they ignore the intrinsic propagation mechanisms of rumors and present poor adaptive ability when unprecedented news emerges. To exploit the fine-grained rumor diffusion patterns and generalize rumor resolution methods, we formulate a predecessor task to identify triggering posts, and then exploit their characteristics to facilitate rumor verification. We design a tree-structured annotation interface and extend PHEME dataset with labels on the message level. Data analysis shows that triggers play a critical role in verifying rumors and present similar lingual patterns across irrelevant events. We propose a graph-based model considering the direction and interaction of information flow to implement role-aware rumor resolution. Experimental results demonstrate the effectiveness of our proposed model and progressive scheme.

pdf bib
PlugAT: A Plug and Play Module to Defend against Textual Adversarial Attack
Rui Zheng | Rong Bao | Qin Liu | Tao Gui | Qi Zhang | Xuanjing Huang | Rui Xie | Wei Wu
Proceedings of the 29th International Conference on Computational Linguistics

Adversarial training, which minimizes the loss of adversarially perturbed examples, has received considerable attention. However, these methods require modifying all model parameters and optimizing the model from scratch, which is parameter inefficient and unfriendly to the already deployed models. As an alternative, we propose a pluggable defense module PlugAT, to provide robust predictions by adding a few trainable parameters to the model inputs while keeping the original model frozen. To reduce the potential side effects of using defense modules, we further propose a novel forgetting restricted adversarial training, which filters out bad adversarial examples that impair the performance of original ones. The PlugAT-equipped BERT model substantially improves robustness over several strong baselines on various text classification tasks, whilst training only 9.1% parameters. We observe that defense modules trained under the same model architecture have domain adaptation ability between similar text classification datasets.

pdf bib
CoLo: A Contrastive Learning Based Re-ranking Framework for One-Stage Summarization
Chenxin An | Ming Zhong | Zhiyong Wu | Qin Zhu | Xuanjing Huang | Xipeng Qiu
Proceedings of the 29th International Conference on Computational Linguistics

Traditional training paradigms for extractive and abstractive summarization systems always only use token-level or sentence-level training objectives. However, the output summary is always evaluated from summary-level which leads to the inconsistency in training and evaluation. In this paper, we propose a Contrastive Learning based re-ranking framework for one-stage summarization called CoLo. By modeling a contrastive objective, we show that the summarization model is able to directly generate summaries according to the summary-level score without additional modules and parameters. Extensive experiments demonstrate that CoLo boosts the extractive and abstractive results of one-stage systems on CNN/DailyMail benchmark to 44.58 and 46.33 ROUGE-1 score while preserving the parameter efficiency and inference efficiency. Compared with state-of-the-art multi-stage systems, we save more than 100 GPU training hours and obtaining 3x 8x speed-up ratio during inference while maintaining comparable results.

pdf bib
Improving Abstractive Dialogue Summarization with Speaker-Aware Supervised Contrastive Learning
Zhichao Geng | Ming Zhong | Zhangyue Yin | Xipeng Qiu | Xuanjing Huang
Proceedings of the 29th International Conference on Computational Linguistics

Pre-trained models have brought remarkable success on the text summarization task. For dialogue summarization, the subdomain of text summarization, utterances are concatenated to flat text before being processed. As a result, existing summarization systems based on pre-trained models are unable to recognize the unique format of the speaker-utterance pair well in the dialogue. To investigate this issue, we conduct probing tests and manual analysis, and find that the powerful pre-trained model can not identify different speakers well in the conversation, which leads to various factual errors. Moreover, we propose three speaker-aware supervised contrastive learning (SCL) tasks: Token-level SCL, Turn-level SCL, and Global-level SCL. Comprehensive experiments demonstrate that our methods achieve significant performance improvement on two mainstream dialogue summarization datasets. According to detailed human evaluations, pre-trained models equipped with SCL tasks effectively generate summaries with better factual consistency.

pdf bib
LFKQG: A Controlled Generation Framework with Local Fine-tuning for Question Generation over Knowledge Bases
Zichu Fei | Xin Zhou | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 29th International Conference on Computational Linguistics

Question generation over knowledge bases (KBQG) aims at generating natural questions about a subgraph, which can be answered by a given answer entity. Existing KBQG models still face two main challenges: (1) Most models often focus on the most relevant part of the answer entity, while neglecting the rest of the subgraph. (2) There are a large number of out-of-vocabulary (OOV) predicates in real-world scenarios, which are hard to adapt for most KBQG models. To address these challenges, we propose LFKQG, a controlled generation framework for Question Generation over Knowledge Bases. (1) LFKQG employs a simple controlled generation method to generate the questions containing the critical entities in the subgraph, ensuring the question is relevant to the whole subgraph. (2) We propose an optimization strategy called local fine-tuning, which can make good use of the rich information hidden in the pre-trained model to improve the ability of the model to adapt the OOV predicates. Extensive experiments show that our method outperforms existing methods significantly on three widely-used benchmark datasets SimpleQuestion, PathQuestions, and WebQuestions.

pdf bib
Causal Intervention Improves Implicit Sentiment Analysis
Siyin Wang | Jie Zhou | Changzhi Sun | Junjie Ye | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 29th International Conference on Computational Linguistics

Despite having achieved great success for sentiment analysis, existing neural models struggle with implicit sentiment analysis. It is because they may latch onto spurious correlations (“shortcuts”, e.g., focusing only on explicit sentiment words), resulting in undermining the effectiveness and robustness of the learned model. In this work, we propose a CausaL intervention model for implicit sEntiment ANalysis using instrumental variable (CLEAN). We first review sentiment analysis from a causal perspective and analyze the confounders existing in this task. Then, we introduce instrumental variable to eliminate the confounding causal effects, thus extracting the pure causal effect between sentence and sentiment. We compare the proposed CLEAN with several strong baselines on both the general implicit sentiment analysis and aspect-based implicit sentiment analysis tasks. The results indicate the great advantages of our model and the efficacy of implicit sentiment reasoning.

pdf bib
Making Parameter-efficient Tuning More Efficient: A Unified Framework for Classification Tasks
Xin Zhou | Ruotian Ma | Yicheng Zou | Xuanting Chen | Tao Gui | Qi Zhang | Xuanjing Huang | Rui Xie | Wei Wu
Proceedings of the 29th International Conference on Computational Linguistics

Large pre-trained language models (PLMs) have demonstrated superior performance in industrial applications. Recent studies have explored parameter-efficient PLM tuning, which only updates a small amount of task-specific parameters while achieving both high efficiency and comparable performance against standard fine-tuning. However, all these methods ignore the inefficiency problem caused by the task-specific output layers, which is inflexible for us to re-use PLMs and introduces non-negligible parameters. In this work, we focus on the text classification task and propose plugin-tuning, a framework that further improves the efficiency of existing parameter-efficient methods with a unified classifier. Specifically, we re-formulate both token and sentence classification tasks into a unified language modeling task, and map label spaces of different tasks into the same vocabulary space. In this way, we can directly re-use the language modeling heads of PLMs, avoiding introducing extra parameters for different tasks. We conduct experiments on six classification benchmarks. The experimental results show that plugin-tuning can achieve comparable performance against fine-tuned PLMs, while further saving around 50% parameters on top of other parameter-efficient methods.

pdf bib
A Structure-Aware Argument Encoder for Literature Discourse Analysis
Yinzi Li | Wei Chen | Zhongyu Wei | Yujun Huang | Chujun Wang | Siyuan Wang | Qi Zhang | Xuanjing Huang | Libo Wu
Proceedings of the 29th International Conference on Computational Linguistics

Existing research for argument representation learning mainly treats tokens in the sentence equally and ignores the implied structure information of argumentative context. In this paper, we propose to separate tokens into two groups, namely framing tokens and topic ones, to capture structural information of arguments. In addition, we consider high-level structure by incorporating paragraph-level position information. A novel structure-aware argument encoder is proposed for literature discourse analysis. Experimental results on both a self-constructed corpus and a public corpus show the effectiveness of our model. Resources are available at https://github.com/lemuria-wchen/SAE.

pdf bib
Towards Efficient NLP: A Standard Evaluation and A Strong Baseline
Xiangyang Liu | Tianxiang Sun | Junliang He | Jiawen Wu | Lingling Wu | Xinyu Zhang | Hao Jiang | Zhao Cao | Xuanjing Huang | Xipeng Qiu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Supersized pre-trained language models have pushed the accuracy of various natural language processing (NLP) tasks to a new state-of-the-art (SOTA). Rather than pursuing the reachless SOTA accuracy, more and more researchers start paying attention to model efficiency and usability. Different from accuracy, the metric for efficiency varies across different studies, making them hard to be fairly compared. To that end, this work presents ELUE (Efficient Language Understanding Evaluation), a standard evaluation, and a public leaderboard for efficient NLP models. ELUE is dedicated to depicting the Pareto Frontier for various language understanding tasks, such that it can tell whether and how much a method achieves Pareto improvement. Along with the benchmark, we also release a strong baseline, ElasticBERT, which allows BERT to exit at any layer in both static and dynamic ways. We demonstrate the ElasticBERT, despite its simplicity, outperforms or performs on par with SOTA compressed and early exiting models. With ElasticBERT, the proposed ELUE has a strong Pareto Frontier and makes a better evaluation for efficient NLP models.

pdf bib
Template-free Prompt Tuning for Few-shot NER
Ruotian Ma | Xin Zhou | Tao Gui | Yiding Tan | Linyang Li | Qi Zhang | Xuanjing Huang
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Prompt-based methods have been successfully applied in sentence-level few-shot learning tasks, mostly owing to the sophisticated design of templates and label words. However, when applied to token-level labeling tasks such as NER, it would be time-consuming to enumerate the template queries over all potential entity spans. In this work, we propose a more elegant method to reformulate NER tasks as LM problems without any templates. Specifically, we discard the template construction process while maintaining the word prediction paradigm of pre-training models to predict a class-related pivot word (or label word) at the entity position. Meanwhile, we also explore principled ways to automatically search for appropriate label words that the pre-trained models can easily adapt to. While avoiding the complicated template-based process, the proposed LM objective also reduces the gap between different objectives used in pre-training and fine-tuning, thus it can better benefit the few-shot performance. Experimental results demonstrate the effectiveness of the proposed method over bert-tagger and template-based method under few-shot settings. Moreover, the decoding speed of the proposed method is up to 1930.12 times faster than the template-based method.

pdf bib
Robust Lottery Tickets for Pre-trained Language Models
Rui Zheng | Bao Rong | Yuhao Zhou | Di Liang | Sirui Wang | Wei Wu | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent works on Lottery Ticket Hypothesis have shown that pre-trained language models (PLMs) contain smaller matching subnetworks(winning tickets) which are capable of reaching accuracy comparable to the original models. However, these tickets are proved to be notrobust to adversarial examples, and even worse than their PLM counterparts. To address this problem, we propose a novel method based on learning binary weight masks to identify robust tickets hidden in the original PLMs. Since the loss is not differentiable for the binary mask, we assign the hard concrete distribution to the masks and encourage their sparsity using a smoothing approximation of L0 regularization.Furthermore, we design an adversarial loss objective to guide the search for robust tickets and ensure that the tickets perform well bothin accuracy and robustness. Experimental results show the significant improvement of the proposed method over previous work on adversarial robustness evaluation.

pdf bib
MINER: Improving Out-of-Vocabulary Named Entity Recognition from an Information Theoretic Perspective
Xiao Wang | Shihan Dou | Limao Xiong | Yicheng Zou | Qi Zhang | Tao Gui | Liang Qiao | Zhanzhan Cheng | Xuanjing Huang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

NER model has achieved promising performance on standard NER benchmarks. However, recent studies show that previous approaches may over-rely on entity mention information, resulting in poor performance on out-of-vocabulary(OOV) entity recognition. In this work, we propose MINER, a novel NER learning framework, to remedy this issue from an information-theoretic perspective. The proposed approach contains two mutual information based training objectives: i) generalizing information maximization, which enhances representation via deep understanding of context and entity surface forms; ii) superfluous information minimization, which discourages representation from rotate memorizing entity names or exploiting biased cues in data. Experiments on various settings and datasets demonstrate that it achieves better performance in predicting OOV entities.

pdf bib
Flooding-X: Improving BERT’s Resistance to Adversarial Attacks via Loss-Restricted Fine-Tuning
Qin Liu | Rui Zheng | Bao Rong | Jingyi Liu | ZhiHua Liu | Zhanzhan Cheng | Liang Qiao | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Adversarial robustness has attracted much attention recently, and the mainstream solution is adversarial training. However, the tradition of generating adversarial perturbations for each input embedding (in the settings of NLP) scales up the training computational complexity by the number of gradient steps it takes to obtain the adversarial samples. To address this problem, we leverage Flooding method which primarily aims at better generalization and we find promising in defending adversarial attacks. We further propose an effective criterion to bring hyper-parameter-dependent flooding into effect with a narrowed-down search space by measuring how the gradient steps taken within one epoch affect the loss of each batch. Our approach requires zero adversarial sample for training, and its time consumption is equivalent to fine-tuning, which can be 2-15 times faster than standard adversarial training. We experimentally show that our method improves BERT’s resistance to textual adversarial attacks by a large margin, and achieves state-of-the-art robust accuracy on various text classification and GLUE tasks.

pdf bib
CQG: A Simple and Effective Controlled Generation Framework for Multi-hop Question Generation
Zichu Fei | Qi Zhang | Tao Gui | Di Liang | Sirui Wang | Wei Wu | Xuanjing Huang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Multi-hop question generation focuses on generating complex questions that require reasoning over multiple pieces of information of the input passage. Current models with state-of-the-art performance have been able to generate the correct questions corresponding to the answers. However, most models can not ensure the complexity of generated questions, so they may generate shallow questions that can be answered without multi-hop reasoning. To address this challenge, we propose the CQG, which is a simple and effective controlled framework. CQG employs a simple method to generate the multi-hop questions that contain key entities in multi-hop reasoning chains, which ensure the complexity and quality of the questions. In addition, we introduce a novel controlled Transformer-based decoder to guarantee that key entities appear in the questions. Experiment results show that our model greatly improves performance, which also outperforms the state-of-the-art model about 25% by 5 BLEU points on HotpotQA.

2021

pdf bib
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Marie-Francine Moens | Xuanjing Huang | Lucia Specia | Scott Wen-tau Yih
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

pdf bib
K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
Ruize Wang | Duyu Tang | Nan Duan | Zhongyu Wei | Xuanjing Huang | Jianshu Ji | Guihong Cao | Daxin Jiang | Ming Zhou
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Attending via both Fine-tuning and Compressing
Jie Zhou | Yuanbin Wu | Qin Chen | Xuanjing Huang | Liang He
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Findings of the Association for Computational Linguistics: EMNLP 2021
Marie-Francine Moens | Xuanjing Huang | Lucia Specia | Scott Wen-tau Yih
Findings of the Association for Computational Linguistics: EMNLP 2021

pdf bib
Accelerating BERT Inference for Sequence Labeling via Early-Exit
Xiaonan Li | Yunfan Shao | Tianxiang Sun | Hang Yan | Xipeng Qiu | Xuanjing Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Both performance and efficiency are crucial factors for sequence labeling tasks in many real-world scenarios. Although the pre-trained models (PTMs) have significantly improved the performance of various sequence labeling tasks, their computational cost is expensive. To alleviate this problem, we extend the recent successful early-exit mechanism to accelerate the inference of PTMs for sequence labeling tasks. However, existing early-exit mechanisms are specifically designed for sequence-level tasks, rather than sequence labeling. In this paper, we first propose a simple extension of sentence-level early-exit for sequence labeling tasks. To further reduce the computational cost, we also propose a token-level early-exit mechanism that allows partial tokens to exit early at different layers. Considering the local dependency inherent in sequence labeling, we employed a window-based criterion to decide for a token whether or not to exit. The token-level early-exit brings the gap between training and inference, so we introduce an extra self-sampling fine-tuning stage to alleviate it. The extensive experiments on three popular sequence labeling tasks show that our approach can save up to 66%∼75% inference cost with minimal performance degradation. Compared with competitive compressed models such as DistilBERT, our approach can achieve better performance under the same speed-up ratios of 2×, 3×, and 4×.

pdf bib
Align Voting Behavior with Public Statements for Legislator Representation Learning
Xinyi Mou | Zhongyu Wei | Lei Chen | Shangyi Ning | Yancheng He | Changjian Jiang | Xuanjing Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Ideology of legislators is typically estimated by ideal point models from historical records of votes. It represents legislators and legislation as points in a latent space and shows promising results for modeling voting behavior. However, it fails to capture more specific attitudes of legislators toward emerging issues and is unable to model newly-elected legislators without voting histories. In order to mitigate these two problems, we explore to incorporate both voting behavior and public statements on Twitter to jointly model legislators. In addition, we propose a novel task, namely hashtag usage prediction to model the ideology of legislators on Twitter. In practice, we construct a heterogeneous graph for the legislative context and use relational graph neural networks to learn the representation of legislators with the guidance of historical records of their voting and hashtag usage. Experiment results indicate that our model yields significant improvements for the task of roll call vote prediction. Further analysis further demonstrates that legislator representation we learned captures nuances in statements.

pdf bib
Defense against Synonym Substitution-based Adversarial Attacks via Dirichlet Neighborhood Ensemble
Yi Zhou | Xiaoqing Zheng | Cho-Jui Hsieh | Kai-Wei Chang | Xuanjing Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Although deep neural networks have achieved prominent performance on many NLP tasks, they are vulnerable to adversarial examples. We propose Dirichlet Neighborhood Ensemble (DNE), a randomized method for training a robust model to defense synonym substitution-based attacks. During training, DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments them with the training data. In such a way, the model is robust to adversarial attacks while maintaining the performance on the original clean data. DNE is agnostic to the network architectures and scales to large models (e.g., BERT) for NLP applications. Through extensive experimentation, we demonstrate that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.

pdf bib
Math Word Problem Solving with Explicit Numerical Values
Qinzhuo Wu | Qi Zhang | Zhongyu Wei | Xuanjing Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In recent years, math word problem solving has received considerable attention and achieved promising results, but previous methods rarely take numerical values into consideration. Most methods treat the numerical values in the problems as number symbols, and ignore the prominent role of the numerical values in solving the problem. In this paper, we propose a novel approach called NumS2T, which enhances math word problem solving performance by explicitly incorporating numerical values into a sequence-to-tree network. In addition, a numerical properties prediction mechanism is used to capture the category and comparison information of numerals and measure their importance in global expressions. Experimental results on the Math23K and APE datasets demonstrate that our model achieves better performance than existing state-of-the-art models.

pdf bib
SENT: Sentence-level Distant Relation Extraction via Negative Training
Ruotian Ma | Tao Gui | Linyang Li | Qi Zhang | Xuanjing Huang | Yaqian Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Distant supervision for relation extraction provides uniform bag labels for each sentence inside the bag, while accurate sentence labels are important for downstream applications that need the exact relation type. Directly using bag labels for sentence-level training will introduce much noise, thus severely degrading performance. In this work, we propose the use of negative training (NT), in which a model is trained using complementary labels regarding that “the instance does not belong to these complementary labels”. Since the probability of selecting a true label as a complementary label is low, NT provides less noisy information. Furthermore, the model trained with NT is able to separate the noisy data from the training data. Based on NT, we propose a sentence-level framework, SENT, for distant relation extraction. SENT not only filters the noisy data to construct a cleaner dataset, but also performs a re-labeling process to transform the noisy data into useful training data, thus further benefiting the model’s performance. Experimental results show the significant improvement of the proposed method over previous methods on sentence-level evaluation and de-noise effect.

pdf bib
SpanNER: Named Entity Re-/Recognition as Span Prediction
Jinlan Fu | Xuanjing Huang | Pengfei Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Recent years have seen the paradigm shift of Named Entity Recognition (NER) systems from sequence labeling to span prediction. Despite its preliminary effectiveness, the span prediction model’s architectural bias has not been fully understood. In this paper, we first investigate the strengths and weaknesses when the span prediction model is used for named entity recognition compared with the sequence labeling framework and how to further improve it, which motivates us to make complementary advantages of systems based on different paradigms. We then reveal that span prediction, simultaneously, can serve as a system combiner to re-recognize named entities from different systems’ outputs. We experimentally implement 154 systems on 11 datasets, covering three languages, comprehensive results show the effectiveness of span prediction models that both serve as base NER systems and system combiners. We make all codes and datasets available: https://github.com/neulab/spanner, as well as an online system demo: http://spanner.sh. Our model also has been deployed into the ExplainaBoard platform, which allows users to flexibly perform a system combination of top-scoring systems in an interactive way: http://explainaboard.nlpedia.ai/leaderboard/task-ner/.

pdf bib
Exploration and Exploitation: Two Ways to Improve Chinese Spelling Correction Models
Chong Li | Cenyuan Zhang | Xiaoqing Zheng | Xuanjing Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

A sequence-to-sequence learning with neural networks has empirically proven to be an effective framework for Chinese Spelling Correction (CSC), which takes a sentence with some spelling errors as input and outputs the corrected one. However, CSC models may fail to correct spelling errors covered by the confusion sets, and also will encounter unseen ones. We propose a method, which continually identifies the weak spots of a model to generate more valuable training instances, and apply a task-specific pre-training strategy to enhance the model. The generated adversarial examples are gradually added to the training set. Experimental results show that such an adversarial training method combined with the pre-training strategy can improve both the generalization and robustness of multiple CSC models across three different datasets, achieving state-of-the-art performance for CSC task.

pdf bib
fastHan: A BERT-based Multi-Task Toolkit for Chinese NLP
Zhichao Geng | Hang Yan | Xipeng Qiu | Xuanjing Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

We present fastHan, an open-source toolkit for four basic tasks in Chinese natural language processing: Chinese word segmentation (CWS), Part-of-Speech (POS) tagging, named entity recognition (NER), and dependency parsing. The backbone of fastHan is a multi-task model based on a pruned BERT, which uses the first 8 layers in BERT. We also provide a 4-layer base model compressed from the 8-layer model. The joint-model is trained and evaluated on 13 corpora of four tasks, yielding near state-of-the-art (SOTA) performance in dependency parsing and NER, achieving SOTA performance in CWS and POS. Besides, fastHan’s transferability is also strong, performing much better than popular segmentation tools on a non-training corpus. To better meet the need of practical application, we allow users to use their own labeled data to further fine-tune fastHan. In addition to its small size and excellent performance, fastHan is user-friendly. Implemented as a python package, fastHan isolates users from the internal technical details and is convenient to use. The project is released on Github.

pdf bib
TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing
Xiao Wang | Qin Liu | Tao Gui | Qi Zhang | Yicheng Zou | Xin Zhou | Jiacheng Ye | Yongxin Zhang | Rui Zheng | Zexiong Pang | Qinzhuo Wu | Zhengyan Li | Chong Zhang | Ruotian Ma | Zichu Fei | Ruijian Cai | Jun Zhao | Xingwu Hu | Zhiheng Yan | Yiding Tan | Yuan Hu | Qiyuan Bian | Zhihua Liu | Shan Qin | Bolin Zhu | Xiaoyu Xing | Jinlan Fu | Yue Zhang | Minlong Peng | Xiaoqing Zheng | Yaqian Zhou | Zhongyu Wei | Xipeng Qiu | Xuanjing Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

TextFlint is a multilingual robustness evaluation toolkit for NLP tasks that incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analyses. This enables practitioners to automatically evaluate their models from various aspects or to customize their evaluations as desired with just a few lines of code. TextFlint also generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model in terms of its robustness. To guarantee acceptability, all the text transformations are linguistically based and all the transformed data selected (up to 100,000 texts) scored highly under human evaluation. To validate the utility, we performed large-scale empirical evaluations (over 67,000) on state-of-the-art deep learning models, classic supervised methods, and real-world systems. The toolkit is already available at https://github.com/textflint with all the evaluation results demonstrated at textflint.io.

pdf bib
Larger-Context Tagging: When and Why Does It Work?
Jinlan Fu | Liangjing Feng | Qi Zhang | Xuanjing Huang | Pengfei Liu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The development of neural networks and pretraining techniques has spawned many sentence-level tagging systems that achieved superior performance on typical benchmarks. However, a relatively less discussed topic is what if more context information is introduced into current top-scoring tagging systems. Although several existing works have attempted to shift tagging systems from sentence-level to document-level, there is still no consensus conclusion about when and why it works, which limits the applicability of the larger-context approach in tagging tasks. In this paper, instead of pursuing a state-of-the-art tagging system by architectural exploration, we focus on investigating when and why the larger-context training, as a general strategy, can work. To this end, we conduct a thorough comparative study on four proposed aggregators for context information collecting and present an attribute-aided evaluation method to interpret the improvement brought by larger-context training. Experimentally, we set up a testbed based on four tagging tasks and thirteen datasets. Hopefully, our preliminary observations can deepen the understanding of larger-context training and enlighten more follow-up works on the use of contextual information.

pdf bib
Mask Attention Networks: Rethinking and Strengthen Transformer
Zhihao Fan | Yeyun Gong | Dayiheng Liu | Zhongyu Wei | Siyuan Wang | Jian Jiao | Nan Duan | Ruofei Zhang | Xuanjing Huang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Transformer is an attention-based neural network, which consists of two sublayers, namely, Self-Attention Network (SAN) and Feed-Forward Network (FFN). Existing research explores to enhance the two sublayers separately to improve the capability of Transformer for text representation. In this paper, we present a novel understanding of SAN and FFN as Mask Attention Networks (MANs) and show that they are two special cases of MANs with static mask matrices. However, their static mask matrices limit the capability for localness modeling in text representation learning. We therefore introduce a new layer named dynamic mask attention network (DMAN) with a learnable mask matrix which is able to model localness adaptively. To incorporate advantages of DMAN, SAN, and FFN, we propose a sequential layered structure to combine the three types of layers. Extensive experiments on various tasks, including neural machine translation and text summarization demonstrate that our model outperforms the original Transformer.

pdf bib
Discrete Argument Representation Learning for Interactive Argument Pair Identification
Lu Ji | Zhongyu Wei | Jing Li | Qi Zhang | Xuanjing Huang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In this paper, we focus on identifying interactive argument pairs from two posts with opposite stances to a certain topic. Considering opinions are exchanged from different perspectives of the discussing topic, we study the discrete representations for arguments to capture varying aspects in argumentation languages (e.g., the debate focus and the participant behavior). Moreover, we utilize hierarchical structure to model post-wise information incorporating contextual knowledge. Experimental results on the large-scale dataset collected from CMV show that our proposed framework can significantly outperform the competitive baselines. Further analyses reveal why our model yields superior performance and prove the usefulness of our learned representations.

2020

pdf bib
Simplify the Usage of Lexicon in Chinese NER
Ruotian Ma | Minlong Peng | Qi Zhang | Zhongyu Wei | Xuanjing Huang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recently, many works have tried to augment the performance of Chinese named entity recognition (NER) using word lexicons. As a representative, Lattice-LSTM has achieved new benchmark results on several public Chinese NER datasets. However, Lattice-LSTM has a complex model architecture. This limits its application in many industrial areas where real-time NER responses are needed. In this work, we propose a simple but effective method for incorporating the word lexicon into the character representations. This method avoids designing a complicated sequence modeling architecture, and for any neural NER model, it requires only subtle adjustment of the character representation layer to introduce the lexicon information. Experimental studies on four benchmark Chinese NER datasets show that our method achieves an inference speed up to 6.15 times faster than those of state-of-the-art methods, along with a better performance. The experimental results also show that the proposed method can be easily incorporated with pre-trained models like BERT.

pdf bib
Extractive Summarization as Text Matching
Ming Zhong | Pengfei Liu | Yiran Chen | Danqing Wang | Xipeng Qiu | Xuanjing Huang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems. Instead of following the commonly used framework of extracting sentences individually and modeling the relationship between sentences, we formulate the extractive summarization task as a semantic text matching problem, in which a source document and candidate summaries will be (extracted from the original text) matched in a semantic space. Notably, this paradigm shift to semantic matching framework is well-grounded in our comprehensive analysis of the inherent gap between sentence-level and summary-level extractors based on the property of the dataset. Besides, even instantiating the framework with a simple form of a matching model, we have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1). Experiments on the other five datasets also show the effectiveness of the matching framework. We believe the power of this matching-based summarization framework has not been fully exploited. To encourage more instantiations in the future, we have released our codes, processed dataset, as well as generated summaries in https://github.com/maszhongming/MatchSum.

pdf bib
Heterogeneous Graph Neural Networks for Extractive Document Summarization
Danqing Wang | Pengfei Liu | Yining Zheng | Xipeng Qiu | Xuanjing Huang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

As a crucial step in extractive document summarization, learning cross-sentence relations has been explored by a plethora of approaches. An intuitive way is to put them in the graph-based neural network, which has a more complex structure for capturing inter-sentence relationships. In this paper, we present a heterogeneous graph-based neural network for extractive summarization (HETERSUMGRAPH), which contains semantic nodes of different granularity levels apart from sentences. These additional nodes act as the intermediary between sentences and enrich the cross-sentence relations. Besides, our graph structure is flexible in natural extension from a single-document setting to multi-document via introducing document nodes. To our knowledge, we are the first one to introduce different types of nodes into graph-based neural networks for extractive document summarization and perform a comprehensive qualitative analysis to investigate their benefits. The code will be released on Github.

pdf bib
Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples
Xiaoqing Zheng | Jiehang Zeng | Yi Zhou | Cho-Jui Hsieh | Minhao Cheng | Xuanjing Huang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Despite achieving prominent performance on many important tasks, it has been reported that neural networks are vulnerable to adversarial examples. Previously studies along this line mainly focused on semantic tasks such as sentiment analysis, question answering and reading comprehension. In this study, we show that adversarial examples also exist in dependency parsing: we propose two approaches to study where and how parsers make mistakes by searching over perturbations to existing texts at sentence and phrase levels, and design algorithms to construct such examples in both of the black-box and white-box settings. Our experiments with one of state-of-the-art parsers on the English Penn Treebank (PTB) show that up to 77% of input examples admit adversarial perturbations, and we also show that the robustness of parsing models can be improved by crafting high-quality adversaries and including them in the training stage, while suffering little to no performance drop on the clean input data.

pdf bib
FLAT: Chinese NER Using Flat-Lattice Transformer
Xiaonan Li | Hang Yan | Xipeng Qiu | Xuanjing Huang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recently, the character-word lattice structure has been proved to be effective for Chinese named entity recognition (NER) by incorporating the word information. However, since the lattice structure is complex and dynamic, the lattice-based models are hard to fully utilize the parallel computation of GPUs and usually have a low inference speed. In this paper, we propose FLAT: Flat-LAttice Transformer for Chinese NER, which converts the lattice structure into a flat structure consisting of spans. Each span corresponds to a character or latent word and its position in the original lattice. With the power of Transformer and well-designed position encoding, FLAT can fully leverage the lattice information and has an excellent parallel ability. Experiments on four datasets show FLAT outperforms other lexicon-based models in performance and efficiency.

pdf bib
A Graph-based Model for Joint Chinese Word Segmentation and Dependency Parsing
Hang Yan | Xipeng Qiu | Xuanjing Huang
Transactions of the Association for Computational Linguistics, Volume 8

Chinese word segmentation and dependency parsing are two fundamental tasks for Chinese natural language processing. The dependency parsing is defined at the word-level. Therefore word segmentation is the precondition of dependency parsing, which makes dependency parsing suffer from error propagation and unable to directly make use of character-level pre-trained language models (such as BERT). In this paper, we propose a graph-based model to integrate Chinese word segmentation and dependency parsing. Different from previous transition-based joint models, our proposed model is more concise, which results in fewer efforts of feature engineering. Our graph-based joint model achieves better performance than previous joint models and state-of-the-art results in both Chinese word segmentation and dependency parsing. Additionally, when BERT is combined, our model can substantially reduce the performance gap of dependency parsing between joint models and gold-segmented word-based models. Our code is publicly available at https://github.com/fastnlp/JointCwsParser

pdf bib
An Enhanced Knowledge Injection Model for Commonsense Generation
Zhihao Fan | Yeyun Gong | Zhongyu Wei | Siyuan Wang | Yameng Huang | Jian Jiao | Xuanjing Huang | Nan Duan | Ruofei Zhang
Proceedings of the 28th International Conference on Computational Linguistics

Commonsense generation aims at generating plausible everyday scenario description based on a set of provided concepts. Digging the relationship of concepts from scratch is non-trivial, therefore, we retrieve prototypes from external knowledge to assist the understanding of the scenario for better description generation. We integrate two additional modules into the pretrained encoder-decoder model for prototype modeling to enhance the knowledge injection procedure. We conduct experiment on CommonGen benchmark, experimental results show that our method significantly improves the performance on all the metrics.

pdf bib
Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication
Ruize Wang | Zhongyu Wei | Ying Cheng | Piji Li | Haijun Shan | Ji Zhang | Qi Zhang | Xuanjing Huang
Proceedings of the 28th International Conference on Computational Linguistics

Visual storytelling aims to generate a narrative paragraph from a sequence of images automatically. Existing approaches construct text description independently for each image and roughly concatenate them as a story, which leads to the problem of generating semantically incoherent content. In this paper, we propose a new way for visual storytelling by introducing a topic description task to detect the global semantic context of an image stream. A story is then constructed with the guidance of the topic description. In order to combine the two generation tasks, we propose a multi-agent communication framework that regards the topic description generator and the story generator as two agents and learn them simultaneously via iterative updating mechanism. We validate our approach on VIST dataset, where quantitative results, ablations, and human evaluation demonstrate our method’s good ability in generating stories with higher quality compared to state-of-the-art methods.

pdf bib
CoLAKE: Contextualized Language and Knowledge Embedding
Tianxiang Sun | Yunfan Shao | Xipeng Qiu | Qipeng Guo | Yaru Hu | Xuanjing Huang | Zheng Zhang
Proceedings of the 28th International Conference on Computational Linguistics

With the emerging branch of incorporating factual knowledge into pre-trained language models such as BERT, most existing models consider shallow, static, and separately pre-trained entity embeddings, which limits the performance gains of these models. Few works explore the potential of deep contextualized knowledge representation when injecting knowledge. In this paper, we propose the Contextualized Language and Knowledge Embedding (CoLAKE), which jointly learns contextualized representation for both language and knowledge with the extended MLM objective. Instead of injecting only entity embeddings, CoLAKE extracts the knowledge context of an entity from large-scale knowledge bases. To handle the heterogeneity of knowledge context and language context, we integrate them in a unified data structure, word-knowledge graph (WK graph). CoLAKE is pre-trained on large-scale WK graphs with the modified Transformer encoder. We conduct experiments on knowledge-driven tasks, knowledge probing tasks, and language understanding tasks. Experimental results show that CoLAKE outperforms previous counterparts on most of the tasks. Besides, CoLAKE achieves surprisingly high performance on our synthetic task called word-knowledge graph completion, which shows the superiority of simultaneously contextualizing language and knowledge representation.

pdf bib
Modeling Evolution of Message Interaction for Rumor Resolution
Lei Chen | Zhongyu Wei | Jing Li | Baohua Zhou | Qi Zhang | Xuanjing Huang
Proceedings of the 28th International Conference on Computational Linguistics

Previous work for rumor resolution concentrates on exploiting time-series characteristics or modeling topology structure separately. However, how local interactive pattern affects global information assemblage has not been explored. In this paper, we attempt to address the problem by learning evolution of message interaction. We model confrontation and reciprocity between message pairs via discrete variational autoencoders which effectively reflects the diversified opinion interactivity. Moreover, we capture the variation of message interaction using a hierarchical framework to better integrate information flow of a rumor cascade. Experiments on PHEME dataset demonstrate our proposed model achieves higher accuracy than existing methods.

pdf bib
Toward Recognizing More Entity Types in NER: An Efficient Implementation using Only Entity Lexicons
Minlong Peng | Ruotian Ma | Qi Zhang | Lujun Zhao | Mengxi Wei | Changlong Sun | Xuanjing Huang
Findings of the Association for Computational Linguistics: EMNLP 2020

In this work, we explore the way to quickly adjust an existing named entity recognition (NER) system to make it capable of recognizing entity types not defined in the system. As an illustrative example, consider the case that a NER system has been built to recognize person and organization names, and now it requires to additionally recognize job titles. Such a situation is common in the industrial areas, where the entity types required to recognize vary a lot in different products and keep changing. To avoid laborious data labeling and achieve fast adaptation, we propose to adjust the existing NER system using the previously labeled data and entity lexicons of the newly introduced entity types. We formulate such a task as a partially supervised learning problem and accordingly propose an effective algorithm to solve the problem. Comprehensive experimental studies on several public NER datasets validate the effectiveness of our method.

pdf bib
A Concise Model for Multi-Criteria Chinese Word Segmentation with Transformer Encoder
Xipeng Qiu | Hengzhi Pei | Hang Yan | Xuanjing Huang
Findings of the Association for Computational Linguistics: EMNLP 2020

Multi-criteria Chinese word segmentation (MCCWS) aims to exploit the relations among the multiple heterogeneous segmentation criteria and further improve the performance of each single criterion. Previous work usually regards MCCWS as different tasks, which are learned together under the multi-task learning framework. In this paper, we propose a concise but effective unified model for MCCWS, which is fully-shared for all the criteria. By leveraging the powerful ability of the Transformer encoder, the proposed unified model can segment Chinese text according to a unique criterion-token indicating the output criterion. Besides, the proposed unified model can segment both simplified and traditional Chinese and has an excellent transfer capability. Experiments on eight datasets with different criteria show that our model outperforms our single-criterion baseline model and other multi-criteria models. Source codes of this paper are available on Github.

pdf bib
Cross-Lingual Dependency Parsing by POS-Guided Word Reordering
Lu Liu | Yi Zhou | Jianhan Xu | Xiaoqing Zheng | Kai-Wei Chang | Xuanjing Huang
Findings of the Association for Computational Linguistics: EMNLP 2020

We propose a novel approach to cross-lingual dependency parsing based on word reordering. The words in each sentence of a source language corpus are rearranged to meet the word order in a target language under the guidance of a part-of-speech based language model (LM). To obtain the highest reordering score under the LM, a population-based optimization algorithm and its genetic operators are designed to deal with the combinatorial nature of such word reordering. A parser trained on the reordered corpus then can be used to parse sentences in the target language. We demonstrate through extensive experimentation that our approach achieves better or comparable results across 25 target languages (1.73% increase in average), and outperforms a baseline by a significant margin on the languages that are greatly different from the source one. For example, when transferring the English parser to Hindi and Latin, our approach outperforms the baseline by 15.3% and 6.7% respectively.

pdf bib
CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural Summarization Systems
Yiran Chen | Pengfei Liu | Ming Zhong | Zi-Yi Dou | Danqing Wang | Xipeng Qiu | Xuanjing Huang
Findings of the Association for Computational Linguistics: EMNLP 2020

Neural network-based models augmented with unsupervised pre-trained knowledge have achieved impressive performance on text summarization. However, most existing evaluation methods are limited to an in-domain setting, where summarizers are trained and evaluated on the same dataset. We argue that this approach can narrow our understanding of the generalization ability for different summarization systems. In this paper, we perform an in-depth analysis of characteristics of different datasets and investigate the performance of different summarization models under a cross-dataset setting, in which a summarizer trained on one corpus will be evaluated on a range of out-of-domain corpora. A comprehensive study of 11 representative summarization systems on 5 datasets from different domains reveals the effect of model architectures and generation ways (i.e. abstractive and extractive) on model generalization ability. Further, experimental results shed light on the limitations of existing summarizers. Brief introduction and supplementary code can be found in https://github.com/zide05/CDEvalSumm.

pdf bib
Automatic Term Name Generation for Gene Ontology: Task and Dataset
Yanjian Zhang | Qin Chen | Yiteng Zhang | Zhongyu Wei | Yixu Gao | Jiajie Peng | Zengfeng Huang | Weijian Sun | Xuanjing Huang
Findings of the Association for Computational Linguistics: EMNLP 2020

Terms contained in Gene Ontology (GO) have been widely used in biology and bio-medicine. Most previous research focuses on inferring new GO terms, while the term names that reflect the gene function are still named by the experts. To fill this gap, we propose a novel task, namely term name generation for GO, and build a large-scale benchmark dataset. Furthermore, we present a graph-based generative model that incorporates the relations between genes, words and terms for term name generation, which exhibits great advantages over the strong baselines.

pdf bib
Uncertainty-Aware Label Refinement for Sequence Labeling
Tao Gui | Jiacheng Ye | Qi Zhang | Zhengyan Li | Zichu Fei | Yeyun Gong | Xuanjing Huang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Conditional random fields (CRF) for label decoding has become ubiquitous in sequence labeling tasks. However, the local label dependencies and inefficient Viterbi decoding have always been a problem to be solved. In this work, we introduce a novel two-stage label decoding framework to model long-term label dependencies, while being much more computationally efficient. A base model first predicts draft labels, and then a novel two-stream self-attention model makes refinements on these draft predictions based on long-range label dependencies, which can achieve parallel decoding for a faster prediction. In addition, in order to mitigate the side effects of incorrect draft labels, Bayesian neural networks are used to indicate the labels with a high probability of being wrong, which can greatly assist in preventing error propagation. The experimental results on three sequence labeling benchmarks demonstrated that the proposed method not only outperformed the CRF-based methods but also greatly accelerated the inference process.

pdf bib
Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis
Xiaoyu Xing | Zhijing Jin | Di Jin | Bingning Wang | Qi Zhang | Xuanjing Huang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Aspect-based sentiment analysis (ABSA) aims to predict the sentiment towards a specific aspect in the text. However, existing ABSA test sets cannot be used to probe whether a model can distinguish the sentiment of the target aspect from the non-target aspects. To solve this problem, we develop a simple but effective approach to enrich ABSA test sets. Specifically, we generate new examples to disentangle the confounding sentiments of the non-target aspects from the target aspect’s sentiment. Based on the SemEval 2014 dataset, we construct the Aspect Robustness Test Set (ARTS) as a comprehensive probe of the aspect robustness of ABSA models. Over 92% data of ARTS show high fluency and desired sentiment on all aspects by human evaluation. Using ARTS, we analyze the robustness of nine ABSA models, and observe, surprisingly, that their accuracy drops by up to 69.73%. We explore several ways to improve aspect robustness, and find that adversarial training can improve models’ performance on ARTS by up to 32.85%. Our code and new test set are available at https://github.com/zhijing-jin/ARTS_TestSet

pdf bib
Leveraging Declarative Knowledge in Text and First-Order Logic for Fine-Grained Propaganda Detection
Ruize Wang | Duyu Tang | Nan Duan | Wanjun Zhong | Zhongyu Wei | Xuanjing Huang | Daxin Jiang | Ming Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We study the detection of propagandistic text fragments in news articles. Instead of merely learning from input-output datapoints in training data, we introduce an approach to inject declarative knowledge of fine-grained propaganda techniques. Specifically, we leverage the declarative knowledge expressed in both first-order logic and natural language. The former refers to the logical consistency between coarse- and fine-grained predictions, which is used to regularize the training process with propositional Boolean expressions. The latter refers to the literal definition of each propaganda technique, which is utilized to get class representations for regularizing the model parameters. We conduct experiments on Propaganda Techniques Corpus, a large manually annotated dataset for fine-grained propaganda detection. Experiments show that our method achieves superior performance, demonstrating that leveraging declarative knowledge can help the model to make more accurate predictions.

pdf bib
RethinkCWS: Is Chinese Word Segmentation a Solved Task?
Jinlan Fu | Pengfei Liu | Qi Zhang | Xuanjing Huang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The performance of the Chinese Word Segmentation (CWS) systems has gradually reached a plateau with the rapid development of deep neural networks, especially the successful use of large pre-trained models. In this paper, we take stock of what we have achieved and rethink what’s left in the CWS task. Methodologically, we propose a fine-grained evaluation for existing CWS systems, which not only allows us to diagnose the strengths and weaknesses of existing models (under the in-dataset setting), but enables us to quantify the discrepancy between different criterion and alleviate the negative transfer problem when doing multi-criteria learning. Strategically, despite not aiming to propose a novel model in this paper, our comprehensive experiments on eight models and seven datasets, as well as thorough analysis, could search for some promising direction for future research. We make all codes publicly available and release an interface that can quickly evaluate and diagnose user’s models: https://github.com/neulab/InterpretEval

pdf bib
A Knowledge-Aware Sequence-to-Tree Network for Math Word Problem Solving
Qinzhuo Wu | Qi Zhang | Jinlan Fu | Xuanjing Huang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

With the advancements in natural language processing tasks, math word problem solving has received increasing attention. Previous methods have achieved promising results but ignore background common-sense knowledge not directly provided by the problem. In addition, during generation, they focus on local features while neglecting global information. To incorporate external knowledge and global expression information, we propose a novel knowledge-aware sequence-to-tree (KA-S2T) network in which the entities in the problem sequences and their categories are modeled as an entity graph. Based on this entity graph, a graph attention network is used to capture knowledge-aware problem representations. Further, we use a tree-structured decoder with a state aggregation mechanism to capture the long-distance dependency and global expression information. Experimental results on the Math23K dataset revealed that the KA-S2T model can achieve better performance than previously reported best results.

pdf bib
PathQG: Neural Question Generation from Facts
Siyuan Wang | Zhongyu Wei | Zhihao Fan | Zengfeng Huang | Weijian Sun | Qi Zhang | Xuanjing Huang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing research for question generation encodes the input text as a sequence of tokens without explicitly modeling fact information. These models tend to generate irrelevant and uninformative questions. In this paper, we explore to incorporate facts in the text for question generation in a comprehensive way. We present a novel task of question generation given a query path in the knowledge graph constructed from the input text. We divide the task into two steps, namely, query representation learning and query-based question generation. We formulate query representation learning as a sequence labeling problem for identifying the involved facts to form a query and employ an RNN-based generator for question generation. We first train the two modules jointly in an end-to-end fashion, and further enforce the interaction between these two modules in a variational framework. We construct the experimental datasets on top of SQuAD and results show that our model outperforms other state-of-the-art approaches, and the performance margin is larger when target questions are complex. Human evaluation also proves that our model is able to generate relevant and informative questions.

2019

pdf bib
A Lexicon-Based Graph Neural Network for Chinese NER
Tao Gui | Yicheng Zou | Qi Zhang | Minlong Peng | Jinlan Fu | Zhongyu Wei | Xuanjing Huang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Recurrent neural networks (RNN) used for Chinese named entity recognition (NER) that sequentially track character and word information have achieved great success. However, the characteristic of chain structure and the lack of global semantics determine that RNN-based models are vulnerable to word ambiguities. In this work, we try to alleviate this problem by introducing a lexicon-based graph neural network with global semantics, in which lexicon knowledge is used to connect characters to capture the local composition, while a global relay node can capture global sentence semantics and long-range dependency. Based on the multiple graph-based interactions among characters, potential words, and the whole-sentence semantics, word ambiguities can be effectively tackled. Experiments on four NER datasets show that the proposed model achieves significant improvements against other baseline models.

pdf bib
Asynchronous Deep Interaction Network for Natural Language Inference
Di Liang | Fubao Zhang | Qi Zhang | Xuanjing Huang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Natural language inference aims to predict whether a premise sentence can infer another hypothesis sentence. Existing methods typically have framed the reasoning problem as a semantic matching task. The both sentences are encoded and interacted symmetrically and in parallel. However, in the process of reasoning, the role of the two sentences is obviously different, and the sentence pairs for NLI are asymmetrical corpora. In this paper, we propose an asynchronous deep interaction network (ADIN) to complete the task. ADIN is a neural network structure stacked with multiple inference sub-layers, and each sub-layer consists of two local inference modules in an asymmetrical manner. Different from previous methods, this model deconstructs the reasoning process and implements the asynchronous and multi-step reasoning. Experiment results show that ADIN achieves competitive performance and outperforms strong baselines on three popular benchmarks: SNLI, MultiNLI, and SciTail.

pdf bib
GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge
Luyao Huang | Chi Sun | Xipeng Qiu | Xuanjing Huang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Word Sense Disambiguation (WSD) aims to find the exact sense of an ambiguous word in a particular context. Traditional supervised methods rarely take into consideration the lexical resources like WordNet, which are widely utilized in knowledge-based methods. Recent studies have shown the effectiveness of incorporating gloss (sense definition) into neural networks for WSD. However, compared with traditional word expert supervised methods, they have not achieved much improvement. In this paper, we focus on how to better leverage gloss knowledge in a supervised neural WSD system. We construct context-gloss pairs and propose three BERT based models for WSD. We fine-tune the pre-trained BERT model and achieve new state-of-the-art results on WSD task.

pdf bib
A Closer Look at Data Bias in Neural Extractive Summarization Models
Ming Zhong | Danqing Wang | Pengfei Liu | Xipeng Qiu | Xuanjing Huang
Proceedings of the 2nd Workshop on New Frontiers in Summarization

In this paper, we take stock of the current state of summarization datasets and explore how different factors of datasets influence the generalization behaviour of neural extractive summarization models. Specifically, we first propose several properties of datasets, which matter for the generalization of summarization models. Then we build the connection between priors residing in datasets and model designs, analyzing how different properties of datasets influence the choices of model structure design and training methods. Finally, by taking a typical dataset as an example, we rethink the process of the model design based on the experience of the above analysis. We demonstrate that when we have a deep understanding of the characteristics of datasets, a simple approach can bring significant improvements to the existing state-of-the-art model.

pdf bib
VCWE: Visual Character-Enhanced Word Embeddings
Chi Sun | Xipeng Qiu | Xuanjing Huang
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Chinese is a logographic writing system, and the shape of Chinese characters contain rich syntactic and semantic information. In this paper, we propose a model to learn Chinese word embeddings via three-level composition: (1) a convolutional neural network to extract the intra-character compositionality from the visual shape of a character; (2) a recurrent neural network with self-attention to compose character representation into word embeddings; (3) the Skip-Gram framework to capture non-compositionality directly from the contextual information. Evaluations demonstrate the superior performance of our model on four tasks: word similarity, sentiment analysis, named entity recognition and part-of-speech tagging.

pdf bib
Searching for Effective Neural Extractive Summarization: What Works and What’s Next
Ming Zhong | Pengfei Liu | Danqing Wang | Xipeng Qiu | Xuanjing Huang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The recent years have seen remarkable success in the use of deep neural networks on text summarization. However, there is no clear understanding of why they perform so well, or how they might be improved. In this paper, we seek to better understand how neural extractive summarization systems could benefit from different types of model architectures, transferable knowledge and learning schemas. Besides, we find an effective way to improve the current framework and achieve the state-of-the-art result on CNN/DailyMail by a large margin based on our observations and analysis. Hopefully, our work could provide more hints for future research on extractive summarization.

pdf bib
Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning
Minlong Peng | Xiaoyu Xing | Qi Zhang | Jinlan Fu | Xuanjing Huang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this work, we explore the way to perform named entity recognition (NER) using only unlabeled data and named entity dictionaries. To this end, we formulate the task as a positive-unlabeled (PU) learning problem and accordingly propose a novel PU learning algorithm to perform the task. We prove that the proposed algorithm can unbiasedly and consistently estimate the task loss as if there is fully labeled data. A key feature of the proposed method is that it does not require the dictionaries to label every entity within a sentence, and it even does not require the dictionaries to label all of the words constituting an entity. This greatly reduces the requirement on the quality of the dictionaries and makes our method generalize well with quite simple dictionaries. Empirical studies on four public NER datasets demonstrate the effectiveness of our proposed method. We have published the source code at https://github.com/v-mipeng/LexiconNER.

pdf bib
Generating Responses with a Specific Emotion in Dialog
Zhenqiao Song | Xiaoqing Zheng | Lu Liu | Mu Xu | Xuanjing Huang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

It is desirable for dialog systems to have capability to express specific emotions during a conversation, which has a direct, quantifiable impact on improvement of their usability and user satisfaction. After a careful investigation of real-life conversation data, we found that there are at least two ways to express emotions with language. One is to describe emotional states by explicitly using strong emotional words; another is to increase the intensity of the emotional experiences by implicitly combining neutral words in distinct ways. We propose an emotional dialogue system (EmoDS) that can generate the meaningful responses with a coherent structure for a post, and meanwhile express the desired emotion explicitly or implicitly within a unified framework. Experimental results showed EmoDS performed better than the baselines in BLEU, diversity and the quality of emotional expression.

pdf bib
Style Transformer: Unpaired Text Style Transfer without Disentangled Latent Representation
Ning Dai | Jianze Liang | Xipeng Qiu | Xuanjing Huang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Disentangling the content and style in the latent space is prevalent in unpaired text style transfer. However, two major issues exist in most of the current neural models. 1) It is difficult to completely strip the style information from the semantics for a sentence. 2) The recurrent neural network (RNN) based encoder and decoder, mediated by the latent representation, cannot well deal with the issue of the long-term dependency, resulting in poor preservation of non-stylistic semantic content. In this paper, we propose the Style Transformer, which makes no assumption about the latent representation of source sentence and equips the power of attention mechanism in Transformer to achieve better style transfer and better content preservation.

pdf bib
Bridging by Word: Image Grounded Vocabulary Construction for Visual Captioning
Zhihao Fan | Zhongyu Wei | Siyuan Wang | Xuanjing Huang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Image Captioning aims at generating a short description for an image. Existing research usually employs the architecture of CNN-RNN that views the generation as a sequential decision-making process and the entire dataset vocabulary is used as decoding space. They suffer from generating high frequent n-gram with irrelevant words. To tackle this problem, we propose to construct an image-grounded vocabulary, based on which, captions are generated with limitation and guidance. In specific, a novel hierarchical structure is proposed to construct the vocabulary incorporating both visual information and relations among words. For generation, we propose a word-aware RNN cell incorporating vocabulary information into the decoding process directly. Reinforce algorithm is employed to train the generator using constraint vocabulary as action space. Experimental results on MS COCO and Flickr30k show the effectiveness of our framework compared to some state-of-the-art models.

2018

pdf bib
Incorporating Topic Aspects for Online Comment Convincingness Evaluation
Yunfan Gu | Zhongyu Wei | Maoran Xu | Hao Fu | Yang Liu | Xuanjing Huang
Proceedings of the 5th Workshop on Argument Mining

In this paper, we propose to incorporate topic aspects information for online comments convincingness evaluation. Our model makes use of graph convolutional network to utilize implicit topic information within a discussion thread to assist the evaluation of convincingness of each single comment. In order to test the effectiveness of our proposed model, we annotate topic information on top of a public dataset for argument convincingness evaluation. Experimental results show that topic information is able to improve the performance for convincingness evaluation. We also make a move to detect topic aspects automatically.

pdf bib
Automatic Essay Scoring Incorporating Rating Schema via Reinforcement Learning
Yucheng Wang | Zhongyu Wei | Yaqian Zhou | Xuanjing Huang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Automatic essay scoring (AES) is the task of assigning grades to essays without human interference. Existing systems for AES are typically trained to predict the score of each single essay at a time without considering the rating schema. In order to address this issue, we propose a reinforcement learning framework for essay scoring that incorporates quadratic weighted kappa as guidance to optimize the scoring system. Experiment results on benchmark datasets show the effectiveness of our framework.

pdf bib
Convolutional Interaction Network for Natural Language Inference
Jingjing Gong | Xipeng Qiu | Xinchi Chen | Dong Liang | Xuanjing Huang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Attention-based neural models have achieved great success in natural language inference (NLI). In this paper, we propose the Convolutional Interaction Network (CIN), a general model to capture the interaction between two sentences, which can be an alternative to the attention mechanism for NLI. Specifically, CIN encodes one sentence with the filters dynamically generated based on another sentence. Since the filters may be designed to have various numbers and sizes, CIN can capture more complicated interaction patterns. Experiments on three large datasets demonstrate CIN’s efficacy.

pdf bib
Transferring from Formal Newswire Domain with Hypernet for Twitter POS Tagging
Tao Gui | Qi Zhang | Jingjing Gong | Minlong Peng | Di Liang | Keyu Ding | Xuanjing Huang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Part-of-Speech (POS) tagging for Twitter has received considerable attention in recent years. Because most POS tagging methods are based on supervised models, they usually require a large amount of labeled data for training. However, the existing labeled datasets for Twitter are much smaller than those for newswire text. Hence, to help POS tagging for Twitter, most domain adaptation methods try to leverage newswire datasets by learning the shared features between the two domains. However, from a linguistic perspective, Twitter users not only tend to mimic the formal expressions of traditional media, like news, but they also appear to be developing linguistically informal styles. Therefore, POS tagging for the formal Twitter context can be learned together with the newswire dataset, while POS tagging for the informal Twitter context should be learned separately. To achieve this task, in this work, we propose a hypernetwork-based method to generate different parameters to separately model contexts with different expression styles. Experimental results on three different datasets show that our approach achieves better performance than state-of-the-art methods in most cases.

pdf bib
Cross-Domain Sentiment Classification with Target Domain Specific Information
Minlong Peng | Qi Zhang | Yu-gang Jiang | Xuanjing Huang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The task of adopting a model with good performance to a target domain that is different from the source domain used for training has received considerable attention in sentiment analysis. Most existing approaches mainly focus on learning representations that are domain-invariant in both the source and target domains. Few of them pay attention to domain-specific information, which should also be informative. In this work, we propose a method to simultaneously extract domain specific and invariant representations and train a classifier on each of the representation, respectively. And we introduce a few target domain labeled data for learning domain-specific information. To effectively utilize the target domain labeled data, we train the domain invariant representation based classifier with both the source and target domain labeled data and train the domain-specific representation based classifier with only the target domain labeled data. These two classifiers then boost each other in a co-training style. Extensive sentiment analysis experiments demonstrated that the proposed method could achieve better performance than state-of-the-art methods.

pdf bib
Task-oriented Dialogue System for Automatic Diagnosis
Zhongyu Wei | Qianlong Liu | Baolin Peng | Huaixiao Tou | Ting Chen | Xuanjing Huang | Kam-fai Wong | Xiangying Dai
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In this paper, we make a move to build a dialogue system for automatic diagnosis. We first build a dataset collected from an online medical forum by extracting symptoms from both patients’ self-reports and conversational data between patients and doctors. Then we propose a task-oriented dialogue system framework to make diagnosis for patients automatically, which can converse with patients to collect additional symptoms beyond their self-reports. Experimental results on our dataset show that additional symptoms extracted from conversation can greatly improve the accuracy for disease identification and our dialogue system is able to collect these symptoms automatically and make a better diagnosis.

pdf bib
A Lexicon-Based Supervised Attention Model for Neural Sentiment Analysis
Yicheng Zou | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 27th International Conference on Computational Linguistics

Attention mechanisms have been leveraged for sentiment classification tasks because not all words have the same importance. However, most existing attention models did not take full advantage of sentiment lexicons, which provide rich sentiment information and play a critical role in sentiment analysis. To achieve the above target, in this work, we propose a novel lexicon-based supervised attention model (LBSA), which allows a recurrent neural network to focus on the sentiment content, thus generating sentiment-informative representations. Compared with general attention models, our model has better interpretability and less noise. Experimental results on three large-scale sentiment classification datasets showed that the proposed method outperforms previous methods.

pdf bib
A Reinforcement Learning Framework for Natural Question Generation using Bi-discriminators
Zhihao Fan | Zhongyu Wei | Siyuan Wang | Yang Liu | Xuanjing Huang
Proceedings of the 27th International Conference on Computational Linguistics

Visual Question Generation (VQG) aims to ask natural questions about an image automatically. Existing research focus on training model to fit the annotated data set that makes it indifferent from other language generation tasks. We argue that natural questions need to have two specific attributes from the perspectives of content and linguistic respectively, namely, natural and human-written. Inspired by the setting of discriminator in adversarial learning, we propose two discriminators, one for each attribute, to enhance the training. We then use the reinforcement learning framework to incorporate scores from the two discriminators as the reward to guide the training of the question generator. Experimental results on a benchmark VQG dataset show the effectiveness and robustness of our model compared to some state-of-the-art models in terms of both automatic and human evaluation metrics.

pdf bib
Information Aggregation via Dynamic Routing for Sequence Encoding
Jingjing Gong | Xipeng Qiu | Shaojing Wang | Xuanjing Huang
Proceedings of the 27th International Conference on Computational Linguistics

While much progress has been made in how to encode a text sequence into a sequence of vectors, less attention has been paid to how to aggregate these preceding vectors (outputs of RNN/CNN) into fixed-size encoding vector. Usually, a simple max or average pooling is used, which is a bottom-up and passive way of aggregation and lack of guidance by task information. In this paper, we propose an aggregation mechanism to obtain a fixed-size encoding with a dynamic routing policy. The dynamic routing policy is dynamically deciding that what and how much information need be transferred from each word to the final encoding of the text sequence. Following the work of Capsule Network, we design two dynamic routing policies to aggregate the outputs of RNN/CNN encoding layer into a final encoding vector. Compared to the other aggregation methods, dynamic routing can refine the messages according to the state of final encoding vector. Experimental results on five text classification tasks show that our method outperforms other aggregating models by a significant margin. Related source code is released on our github page.Related source code is released on our github page.

pdf bib
Incorporating Argument-Level Interactions for Persuasion Comments Evaluation using Co-attention Model
Lu Ji | Zhongyu Wei | Xiangkun Hu | Yang Liu | Qi Zhang | Xuanjing Huang
Proceedings of the 27th International Conference on Computational Linguistics

In this paper, we investigate the issue of persuasiveness evaluation for argumentative comments. Most of the existing research explores different text features of reply comments on word level and ignores interactions between participants. In general, viewpoints are usually expressed by multiple arguments and exchanged on argument level. To better model the process of dialogical argumentation, we propose a novel co-attention mechanism based neural network to capture the interactions between participants on argument level. Experimental results on a publicly available dataset show that the proposed model significantly outperforms some state-of-the-art methods for persuasiveness evaluation. Further analysis reveals that attention weights computed in our model are able to extract interactive argument pairs from the original post and the reply.

2017

pdf bib
Adversarial Multi-task Learning for Text Classification
Pengfei Liu | Xipeng Qiu | Xuanjing Huang
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Neural network models have shown their promising opportunities for multi-task learning, which focus on learning the shared layers to extract the common and task-invariant features. However, in most existing approaches, the extracted shared features are prone to be contaminated by task-specific features or the noise brought by other tasks. In this paper, we propose an adversarial multi-task learning framework, alleviating the shared and private latent feature spaces from interfering with each other. We conduct extensive experiments on 16 different text classification tasks, which demonstrates the benefits of our approach. Besides, we show that the shared knowledge learned by our proposed model can be regarded as off-the-shelf knowledge and easily transferred to new tasks. The datasets of all 16 tasks are publicly available at http://nlp.fudan.edu.cn/data/.

pdf bib
Adversarial Multi-Criteria Learning for Chinese Word Segmentation
Xinchi Chen | Zhan Shi | Xipeng Qiu | Xuanjing Huang
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Different linguistic perspectives causes many diverse segmentation criteria for Chinese word segmentation (CWS). Most existing methods focus on improve the performance for each single criterion. However, it is interesting to exploit these different criteria and mining their common underlying knowledge. In this paper, we propose adversarial multi-criteria learning for CWS by integrating shared knowledge from multiple heterogeneous segmentation criteria. Experiments on eight corpora with heterogeneous segmentation criteria show that the performance of each corpus obtains a significant improvement, compared to single-criterion learning. Source codes of this paper are available on Github.

pdf bib
Idiom-Aware Compositional Distributed Semantics
Pengfei Liu | Kaiyu Qian | Xipeng Qiu | Xuanjing Huang
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Idioms are peculiar linguistic constructions that impose great challenges for representing the semantics of language, especially in current prevailing end-to-end neural models, which assume that the semantics of a phrase or sentence can be literally composed from its constitutive words. In this paper, we propose an idiom-aware distributed semantic model to build representation of sentences on the basis of understanding their contained idioms. Our models are grounded in the literal-first psycholinguistic hypothesis, which can adaptively learn semantic compositionality of a phrase literally or idiomatically. To better evaluate our models, we also construct an idiom-enriched sentiment classification dataset with considerable scale and abundant peculiarities of idioms. The qualitative and quantitative experimental analyses demonstrate the efficacy of our models.

pdf bib
Part-of-Speech Tagging for Twitter with Adversarial Neural Networks
Tao Gui | Qi Zhang | Haoran Huang | Minlong Peng | Xuanjing Huang
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this work, we study the problem of part-of-speech tagging for Tweets. In contrast to newswire articles, Tweets are usually informal and contain numerous out-of-vocabulary words. Moreover, there is a lack of large scale labeled datasets for this domain. To tackle these challenges, we propose a novel neural network to make use of out-of-domain labeled data, unlabeled in-domain data, and labeled in-domain data. Inspired by adversarial neural networks, the proposed method tries to learn common features through adversarial discriminator. In addition, we hypothesize that domain-specific features of target domain should be preserved in some degree. Hence, the proposed method adopts a sequence-to-sequence autoencoder to perform this task. Experimental results on three different datasets show that our method achieves better performance than state-of-the-art methods.

2016

pdf bib
Deep Fusion LSTMs for Text Semantic Matching
Pengfei Liu | Xipeng Qiu | Jifan Chen | Xuanjing Huang
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Investigating Language Universal and Specific Properties in Word Embeddings
Peng Qian | Xipeng Qiu | Xuanjing Huang
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Implicit Discourse Relation Detection via a Deep Architecture with Gated Relevance Network
Jifan Chen | Qi Zhang | Pengfei Liu | Xipeng Qiu | Xuanjing Huang
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
A New Psychometric-inspired Evaluation Metric for Chinese Word Segmentation
Peng Qian | Xipeng Qiu | Xuanjing Huang
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Hashtag Recommendation Using End-To-End Memory Networks with Hierarchical Attention
Haoran Huang | Qi Zhang | Yeyun Gong | Xuanjing Huang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

On microblogging services, people usually use hashtags to mark microblogs, which have a specific theme or content, making them easier for users to find. Hence, how to automatically recommend hashtags for microblogs has received much attention in recent years. Previous deep neural network-based hashtag recommendation approaches converted the task into a multi-class classification problem. However, most of these methods only took the microblog itself into consideration. Motivated by the intuition that the history of users should impact the recommendation procedure, in this work, we extend end-to-end memory networks to perform this task. We incorporate the histories of users into the external memory and introduce a hierarchical attention mechanism to select more appropriate histories. To train and evaluate the proposed method, we also construct a dataset based on microblogs collected from Twitter. Experimental results demonstrate that the proposed methods can significantly outperform state-of-the-art methods. By incorporating the hierarchical attention mechanism, the relative improvement in the proposed method over the state-of-the-art method is around 67.9% in the F1-score.

pdf bib
Attention-Based Convolutional Neural Network for Semantic Relation Extraction
Yatian Shen | Xuanjing Huang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Nowadays, neural networks play an important role in the task of relation classification. In this paper, we propose a novel attention-based convolutional neural network architecture for this task. Our model makes full use of word embedding, part-of-speech tag embedding and position embedding information. Word level attention mechanism is able to better determine which parts of the sentence are most influential with respect to the two entities of interest. This architecture enables learning some important features from task-specific labeled data, forgoing the need for external knowledge such as explicit dependency structures. Experiments on the SemEval-2010 Task 8 benchmark dataset show that our model achieves better performances than several state-of-the-art neural network models and can achieve a competitive performance just with minimal feature engineering.

pdf bib
Deep Multi-Task Learning with Shared Memory for Text Classification
Pengfei Liu | Xipeng Qiu | Xuanjing Huang
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Generating Abbreviations for Chinese Named Entities Using Recurrent Neural Network with Dynamic Dictionary
Qi Zhang | Jin Qian | Ya Guo | Yaqian Zhou | Xuanjing Huang
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Analyzing Linguistic Knowledge in Sequential Model of Sentence
Peng Qian | Xipeng Qiu | Xuanjing Huang
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Keyphrase Extraction Using Deep Recurrent Neural Networks on Twitter
Qi Zhang | Yang Wang | Yeyun Gong | Xuanjing Huang
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Cached Long Short-Term Memory Neural Networks for Document-Level Sentiment Classification
Jiacheng Xu | Danlu Chen | Xipeng Qiu | Xuanjing Huang
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Modelling Interaction of Sentence Pair with Coupled-LSTMs
Pengfei Liu | Xipeng Qiu | Yaqian Zhou | Jifan Chen | Xuanjing Huang
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

pdf bib
Hashtag Recommendation Using Dirichlet Process Mixture Models Incorporating Types of Hashtags
Yeyun Gong | Qi Zhang | Xuanjing Huang
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Sentence Modeling with Gated Recursive Neural Network
Xinchi Chen | Xipeng Qiu | Chenxi Zhu | Shiyu Wu | Xuanjing Huang
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Long Short-Term Memory Neural Networks for Chinese Word Segmentation
Xinchi Chen | Xipeng Qiu | Chenxi Zhu | Pengfei Liu | Xuanjing Huang
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Transition-based Dependency Parsing Using Two Heterogeneous Gated Recursive Neural Networks
Xinchi Chen | Yaqian Zhou | Chenxi Zhu | Xipeng Qiu | Xuanjing Huang
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Multi-Timescale Long Short-Term Memory Neural Network for Modelling Sentences and Documents
Pengfei Liu | Xipeng Qiu | Xinchi Chen | Shiyu Wu | Xuanjing Huang
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Re-ranking Model for Dependency Parser with Recursive Convolutional Neural Network
Chenxi Zhu | Xipeng Qiu | Xinchi Chen | Xuanjing Huang
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Gated Recursive Neural Network for Chinese Word Segmentation
Xinchi Chen | Xipeng Qiu | Chenxi Zhu | Xuanjing Huang
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

pdf bib
Time-aware Personalized Hashtag Recommendation on Social Media
Qi Zhang | Yeyun Gong | Xuyang Sun | Xuanjing Huang
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
A Generative Model for Identifying Target Companies of Microblogs
Yeyun Gong | Yaqian Zhou | Ya Guo | Qi Zhang | Xuanjing Huang
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Automatic Corpus Expansion for Chinese Word Segmentation by Exploiting the Redundancy of Web Information
Xipeng Qiu | ChaoChao Huang | Xuanjing Huang
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Bilingual Product Name Dictionary Construction Using a Two Stage Method
Yatian Shen | Xuanjing Huang
Proceedings of the Third CIPS-SIGHAN Joint Conference on Chinese Language Processing

2013

pdf bib
Joint Chinese Word Segmentation and POS Tagging on Heterogeneous Annotated Corpora with Multiple Task Learning
Xipeng Qiu | Jiayi Zhao | Xuanjing Huang
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic
Qi Zhang | Jin Qian | Huan Chen | Jihua Kang | Xuanjing Huang
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Detecting Spammers in Community Question Answering
Zhuoye Ding | Yeyun Gong | Yaqian Zhou | Qi Zhang | Xuanjing Huang
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Chinese Named Entity Abbreviation Generation Using First-Order Logic
Huan Chen | Qi Zhang | Jin Qian | Xuanjing Huang
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Understanding the Semantic Intent of Natural Language Query
Juan Xu | Qi Zhang | Xuanjing Huang
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Latent Semantic Tensor Indexing for Community-based Question Answering
Xipeng Qiu | Le Tian | Xuanjing Huang
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
FudanNLP: A Toolkit for Chinese Natural Language Processing
Xipeng Qiu | Qi Zhang | Xuanjing Huang
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2012

pdf bib
Automatic Hashtag Recommendation for Microblogs using Topic-Specific Translation Model
Zhuoye Ding | Qi Zhang | Xuanjing Huang
Proceedings of COLING 2012: Posters

pdf bib
Joint Segmentation and Tagging with Coupled Sequences Labeling
Xipeng Qiu | Feng Ji | Jiayi Zhao | Xuanjing Huang
Proceedings of COLING 2012: Posters

pdf bib
Part-of-Speech Tagging for Chinese-English Mixed Texts with Dynamic Features
Jiayi Zhao | Xipeng Qiu | Shu Zhang | Feng Ji | Xuanjing Huang
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
Structural Opinion Mining for Graph-based Sentiment Representation
Yuanbin Wu | Qi Zhang | Xuanjing Huang | Lide Wu
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Hierarchical Text Classification with Latent Concepts
Xipeng Qiu | Xuanjing Huang | Zhao Liu | Jinlong Zhou
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
A Fast Accurate Two-stage Training Algorithm for L1-regularized CRFs with Heuristic Line Search Strategy
Jinlong Zhou | Xipeng Qiu | Xuanjing Huang
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
Keyphrase Extraction from Online News Using Binary Integer Programming
Zhuoye Ding | Qi Zhang | Xuanjing Huang
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
Efficient Near-Duplicate Detection for Q&A Forum
Yan Wu | Qi Zhang | Xuanjing Huang
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf bib
2D Trie for Fast Parsing
Xian Qian | Qi Zhang | Xuanjing Huang | Lide Wu
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
Detecting Hedge Cues and their Scopes with Average Perceptron
Feng Ji | Xipeng Qiu | Xuanjing Huang
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task

pdf bib
Adaptive Chinese Word Segmentation with Online Passive-Aggressive Algorithm
Wenjun Gao | Xipeng Qiu | Xuanjing Huang
CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
Triplet-Based Chinese Word Sense Induction
Zhao Liu | Xipeng Qiu | Xuanjing Huang
CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks
Xian Qian | Qi Zhang | Yaqian Zhou | Xuanjing Huang | Lide Wu
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

pdf bib
Hierarchical Multi-Label Text Categorization with Global Margin Maximization
Xipeng Qiu | Wenjun Gao | Xuanjing Huang
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

pdf bib
Phrase Dependency Parsing for Opinion Mining
Yuanbin Wu | Qi Zhang | Xuanjing Huang | Lide Wu
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2006

pdf bib
Mining the Relation between Sentiment Expression and Target Using Dependency of Words
Zhongchao Fei | Xuanjing Huang | Lide Wu
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation

2005

pdf bib
Answering Definition Questions Using Web Knowledge Bases
Zhushuo Zhang | Yaqian Zhou | Xuanjing Huang | Lide Wu
Second International Joint Conference on Natural Language Processing: Full Papers

pdf bib
Question Classification using Multiple Classifiers
Xin Li | Xuan-Jing Huang | Li-de Wu
Proceedings of the Fifth Workshop on Asian Language Resources (ALR-05) and First Symposium on Asian Language Resources Network (ALRN)

pdf bib
The Use of Metadata, Web-derived Answer Patterns and Passage Context to Improve Reading Comprehension Performance
Yongping Du | Helen Meng | Xuanjing Huang | Lide Wu
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

1997

pdf bib
Statistical Acquisition of Terminology Dictionary
Xuan-jing Huang | Li-de Wu | Wen-xin Wang
Fifth Workshop on Very Large Corpora

Search
Co-authors