Jiwei Li


2021

pdf bib
ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information
Zijun Sun | Xiaoya Li | Xiaofei Sun | Yuxian Meng | Xiang Ao | Qing He | Fei Wu | Jiwei Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Recent pretraining models in Chinese neglect two important aspects specific to the Chinese language: glyph and pinyin, which carry significant syntax and semantic information for language understanding. In this work, we propose ChineseBERT, which incorporates both the glyph and pinyin information of Chinese characters into language model pretraining. The glyph embedding is obtained based on different fonts of a Chinese character, being able to capture character semantics from the visual features, and the pinyin embedding characterizes the pronunciation of Chinese characters, which handles the highly prevalent heteronym phenomenon in Chinese (the same character has different pronunciations with different meanings). Pretrained on large-scale unlabeled Chinese corpus, the proposed ChineseBERT model yields significant performance boost over baseline models with fewer training steps. The proposed model achieves new SOTA performances on a wide range of Chinese NLP tasks, including machine reading comprehension, natural language inference, text classification, sentence pair matching, and competitive performances in named entity recognition and word segmentation.

pdf bib
BertGCN: Transductive Text Classification by Combining GNN and BERT
Yuxiao Lin | Yuxian Meng | Xiaofei Sun | Qinghong Han | Kun Kuang | Jiwei Li | Fei Wu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Making Flexible Use of Subtasks: A Multiplex Interaction Network for Unified Aspect-based Sentiment Analysis
Guoxin Yu | Xiang Ao | Ling Luo | Min Yang | Xiaofei Sun | Jiwei Li | Qing He
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf bib
Dice Loss for Data-imbalanced NLP Tasks
Xiaoya Li | Xiaofei Sun | Yuxian Meng | Junjun Liang | Fei Wu | Jiwei Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Many NLP tasks such as tagging and machine reading comprehension are faced with the severe data imbalance issue: negative examples significantly outnumber positive examples, and the huge number of easy-negative examples overwhelms the training. The most commonly used cross entropy (CE) criteria is actually an accuracy-oriented objective, and thus creates a discrepancy between training and test: at training time, each training instance contributes equally to the objective function, while at test time F1 score concerns more about positive examples. In this paper, we propose to use dice loss in replacement of the standard cross-entropy objective for data-imbalanced NLP tasks. Dice loss is based on the Sørensen--Dice coefficient or Tversky index , which attaches similar importance to false positives and false negatives, and is more immune to the data-imbalance issue. To further alleviate the dominating influence from easy-negative examples in training, we propose to associate training examples with dynamically adjusted weights to deemphasize easy-negative examples. Theoretical analysis shows that this strategy narrows down the gap between the F1 score in evaluation and the dice loss in training. With the proposed training objective, we observe significant performance boost on a wide range of data imbalanced NLP tasks. Notably, we are able to achieve SOTA results on CTB5, CTB6 and UD1.4 for the part of speech tagging task; SOTA results on CoNLL03, OntoNotes5.0, MSRA and OntoNotes4.0 for the named entity recognition task; along with competitive results on the tasks of machine reading comprehension and paraphrase identification.

pdf bib
A Unified MRC Framework for Named Entity Recognition
Xiaoya Li | Jingrong Feng | Yuxian Meng | Qinghong Han | Fei Wu | Jiwei Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The task of named entity recognition (NER) is normally divided into nested NER and flat NER depending on whether named entities are nested or not.Models are usually separately developed for the two tasks, since sequence labeling models, the most widely used backbone for flat NER, are only able to assign a single label to a particular token, which is unsuitable for nested NER where a token may be assigned several labels. In this paper, we propose a unified framework that is capable of handling both flat and nested NER tasks. Instead of treating the task of NER as a sequence labeling problem, we propose to formulate it as a machine reading comprehension (MRC) task. For example, extracting entities with the per label is formalized as extracting answer spans to the question “which person is mentioned in the text".This formulation naturally tackles the entity overlapping issue in nested NER: the extraction of two overlapping entities with different categories requires answering two independent questions. Additionally, since the query encodes informative prior knowledge, this strategy facilitates the process of entity extraction, leading to better performances for not only nested NER, but flat NER. We conduct experiments on both nested and flat NER datasets.Experiment results demonstrate the effectiveness of the proposed formulation. We are able to achieve a vast amount of performance boost over current SOTA models on nested NER datasets, i.e., +1.28, +2.55, +5.44, +6.37,respectively on ACE04, ACE05, GENIA and KBP17, along with SOTA results on flat NER datasets, i.e., +0.24, +1.95, +0.21, +1.49 respectively on English CoNLL 2003, English OntoNotes 5.0, Chinese MSRA and Chinese OntoNotes 4.0.

pdf bib
CorefQA: Coreference Resolution as Query-based Span Prediction
Wei Wu | Fei Wang | Arianna Yuan | Fei Wu | Jiwei Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this paper, we present CorefQA, an accurate and extensible approach for the coreference resolution task. We formulate the problem as a span prediction task, like in question answering: A query is generated for each candidate mention using its surrounding context, and a span prediction module is employed to extract the text spans of the coreferences within the document using the generated query. This formulation comes with the following key advantages: (1) The span prediction strategy provides the flexibility of retrieving mentions left out at the mention proposal stage; (2) In the question answering framework, encoding the mention and its context explicitly in a query makes it possible to have a deep and thorough examination of cues embedded in the context of coreferent mentions; and (3) A plethora of existing question answering datasets can be used for data augmentation to improve the model’s generalization capability. Experiments demonstrate significant performance boost over previous models, with 83.1 (+3.5) F1 score on the CoNLL-2012 benchmark and 87.5 (+2.5) F1 score on the GAP benchmark.

2019

pdf bib
Deep Adversarial Learning for NLP
William Yang Wang | Sameer Singh | Jiwei Li
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials

Adversarial learning is a game-theoretic learning paradigm, which has achieved huge successes in the field of Computer Vision recently. Adversarial learning is also a general framework that enables a variety of learning models, including the popular Generative Adversarial Networks (GANs). Due to the discrete nature of language, designing adversarial learning models is still challenging for NLP problems. In this tutorial, we provide a gentle introduction to the foundation of deep adversarial learning, as well as some practical problem formulations and solutions in NLP. We describe recent advances in deep adversarial learning for NLP, with a special focus on generation, adversarial examples & rules, and dialogue. We provide an overview of the research area, categorize different types of adversarial learning models, and discuss pros and cons, aiming at providing some practical perspectives on the future of adversarial learning for solving real-world NLP problems.

pdf bib
Entity-Relation Extraction as Multi-Turn Question Answering
Xiaoya Li | Fan Yin | Zijun Sun | Xiayu Li | Arianna Yuan | Duo Chai | Mingxin Zhou | Jiwei Li
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this paper, we propose a new paradigm for the task of entity-relation extraction. We cast the task as a multi-turn question answering problem, i.e., the extraction of entities and elations is transformed to the task of identifying answer spans from the context. This multi-turn QA formalization comes with several key advantages: firstly, the question query encodes important information for the entity/relation class we want to identify; secondly, QA provides a natural way of jointly modeling entity and relation; and thirdly, it allows us to exploit the well developed machine reading comprehension (MRC) models. Experiments on the ACE and the CoNLL04 corpora demonstrate that the proposed paradigm significantly outperforms previous best models. We are able to obtain the state-of-the-art results on all of the ACE04, ACE05 and CoNLL04 datasets, increasing the SOTA results on the three datasets to 49.6 (+1.2), 60.3 (+0.7) and 69.2 (+1.4), respectively. Additionally, we construct and will release a newly developed dataset RESUME, which requires multi-step reasoning to construct entity dependencies, as opposed to the single-step dependency extraction in the triplet exaction in previous datasets. The proposed multi-turn QA model also achieves the best performance on the RESUME dataset.

pdf bib
Is Word Segmentation Necessary for Deep Learning of Chinese Representations?
Xiaoya Li | Yuxian Meng | Xiaofei Sun | Qinghong Han | Arianna Yuan | Jiwei Li
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Segmenting a chunk of text into words is usually the first step of processing Chinese text, but its necessity has rarely been explored. In this paper, we ask the fundamental question of whether Chinese word segmentation (CWS) is necessary for deep learning-based Chinese Natural Language Processing. We benchmark neural word-based models which rely on word segmentation against neural char-based models which do not involve word segmentation in four end-to-end NLP benchmark tasks: language modeling, machine translation, sentence matching/paraphrase and text classification. Through direct comparisons between these two types of models, we find that char-based models consistently outperform word-based models. Based on these observations, we conduct comprehensive experiments to study why word-based models underperform char-based models in these deep learning-based NLP tasks. We show that it is because word-based models are more vulnerable to data sparsity and the presence of out-of-vocabulary (OOV) words, and thus more prone to overfitting. We hope this paper could encourage researchers in the community to rethink the necessity of word segmentation in deep learning-based Chinese Natural Language Processing.

2018

pdf bib
Deep Reinforcement Learning for NLP
William Yang Wang | Jiwei Li | Xiaodong He
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

Many Natural Language Processing (NLP) tasks (including generation, language grounding, reasoning, information extraction, coreference resolution, and dialog) can be formulated as deep reinforcement learning (DRL) problems. However, since language is often discrete and the space for all sentences is infinite, there are many challenges for formulating reinforcement learning problems of NLP tasks. In this tutorial, we provide a gentle introduction to the foundation of deep reinforcement learning, as well as some practical DRL solutions in NLP. We describe recent advances in designing deep reinforcement learning for NLP, with a special focus on generation, dialogue, and information extraction. Finally, we discuss why they succeed, and when they may fail, aiming at providing some practical advice about deep reinforcement learning for solving real-world NLP problems.

pdf bib
Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints
Ashutosh Baheti | Alan Ritter | Jiwei Li | Bill Dolan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Neural conversation models tend to generate safe, generic responses for most inputs. This is due to the limitations of likelihood-based decoding objectives in generation tasks with diverse outputs, such as conversation. To address this challenge, we propose a simple yet effective approach for incorporating side information in the form of distributional constraints over the generated responses. We propose two constraints that help generate more content rich responses that are based on a model of syntax and topics (Griffiths et al., 2005) and semantic similarity (Arora et al., 2016). We evaluate our approach against a variety of competitive baselines, using both automatic metrics and human judgments, showing that our proposed approach generates responses that are much less generic without sacrificing plausibility. A working demo of our code can be found at https://github.com/abaheti95/DC-NeuralConversation.

2017

pdf bib
Neural Net Models of Open-domain Discourse Coherence
Jiwei Li | Dan Jurafsky
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Discourse coherence is strongly associated with text quality, making it important to natural language generation and understanding. Yet existing models of coherence focus on measuring individual aspects of coherence (lexical overlap, rhetorical structure, entity centering) in narrow domains. In this paper, we describe domain-independent neural models of discourse coherence that are capable of measuring multiple aspects of coherence in existing sentences and can maintain coherence while generating new sentences. We study both discriminative models that learn to distinguish coherent from incoherent discourse, and generative models that produce coherent text, including a novel neural latent-variable Markovian generative model that captures the latent discourse dependencies between sentences in a text. Our work achieves state-of-the-art performance on multiple coherence evaluations, and marks an initial step in generating coherent texts given discourse contexts.

pdf bib
Adversarial Learning for Neural Dialogue Generation
Jiwei Li | Will Monroe | Tianlin Shi | Sébastien Jean | Alan Ritter | Dan Jurafsky
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We apply adversarial training to open-domain dialogue generation, training a system to produce sequences that are indistinguishable from human-generated dialogue utterances. We cast the task as a reinforcement learning problem where we jointly train two systems: a generative model to produce response sequences, and a discriminator—analagous to the human evaluator in the Turing test— to distinguish between the human-generated dialogues and the machine-generated ones. In this generative adversarial network approach, the outputs from the discriminator are used to encourage the system towards more human-like dialogue. Further, we investigate models for adversarial evaluation that uses success in fooling an adversary as a dialogue evaluation metric, while avoiding a number of potential pitfalls. Experimental results on several metrics, including adversarial evaluation, demonstrate that the adversarially-trained system generates higher-quality responses than previous baselines

2016

pdf bib
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li | Will Monroe | Alan Ritter | Dan Jurafsky | Michel Galley | Jianfeng Gao
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Persona-Based Neural Conversation Model
Jiwei Li | Michel Galley | Chris Brockett | Georgios Spithourakis | Jianfeng Gao | Bill Dolan
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
A Diversity-Promoting Objective Function for Neural Conversation Models
Jiwei Li | Michel Galley | Chris Brockett | Jianfeng Gao | Bill Dolan
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Visualizing and Understanding Neural Models in NLP
Jiwei Li | Xinlei Chen | Eduard Hovy | Dan Jurafsky
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

pdf bib
Do Multi-Sense Embeddings Improve Natural Language Understanding?
Jiwei Li | Dan Jurafsky
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
When Are Tree Structures Necessary for Deep Learning of Representations?
Jiwei Li | Thang Luong | Dan Jurafsky | Eduard Hovy
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Hierarchical Neural Autoencoder for Paragraphs and Documents
Jiwei Li | Thang Luong | Dan Jurafsky
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

pdf bib
Sentiment Analysis on the People’s Daily
Jiwei Li | Eduard Hovy
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Major Life Event Extraction from Twitter based on Congratulations/Condolences Speech Acts
Jiwei Li | Alan Ritter | Claire Cardie | Eduard Hovy
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
A Model of Coherence Based on Distributed Sentence Representation
Jiwei Li | Eduard Hovy
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Recursive Deep Models for Discourse Parsing
Jiwei Li | Rumeng Li | Eduard Hovy
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Weakly Supervised User Profile Extraction from Twitter
Jiwei Li | Alan Ritter | Eduard Hovy
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Towards a General Rule for Identifying Deceptive Opinion Spam
Jiwei Li | Myle Ott | Claire Cardie | Eduard Hovy
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

pdf bib
Identifying Manipulated Offerings on Review Portals
Jiwei Li | Myle Ott | Claire Cardie
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Novel Feature-based Bayesian Model for Query Focused Multi-document Summarization
Jiwei Li | Sujian Li
Transactions of the Association for Computational Linguistics, Volume 1

Supervised learning methods and LDA based topic model have been successfully applied in the field of multi-document summarization. In this paper, we propose a novel supervised approach that can incorporate rich sentence features into Bayesian topic models in a principled way, thus taking advantages of both topic model and feature based supervised learning methods. Experimental results on DUC2007, TAC2008 and TAC2009 demonstrate the effectiveness of our approach.

pdf bib
TopicSpam: a Topic-Model based approach for spam detection
Jiwei Li | Claire Cardie | Sujian Li
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Evolutionary Hierarchical Dirichlet Process for Timeline Summarization
Jiwei Li | Sujian Li
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf bib
Update Summarization using a Multi-level Hierarchical Dirichlet Process Model
Jiwei Li | Sujian Li | Xun Wang | Ye Tian | Baobao Chang
Proceedings of COLING 2012

pdf bib
Implicit Discourse Relation Recognition by Selecting Typical Training Examples
Xun Wang | Sujian Li | Jiwei Li | Wenjie Li
Proceedings of COLING 2012

pdf bib
Fine-Grained Classification of Named Entities by Fusing Multi-Features
Wenjie Li | Jiwei Li | Ye Tian | Zhifang Sui
Proceedings of COLING 2012: Posters