Zhifang Sui


2022

Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues
Qingxiu Dong | Ziwei Qin | Heming Xia | Tian Feng | Shoujie Tong | Haoran Meng | Lin Xu | Zhongyu Wei | Weidong Zhan | Baobao Chang | Sujian Li | Tianyu Liu | Zhifang Sui
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

It is a common practice for recent works in vision language cross-modal reasoning to adopt a binary or multi-choice classification formulation taking as input a set of source image(s) and textual query. In this work, we take a sober look at such an “unconditional” formulation in the sense that no prior knowledge is specified with respect to the source image(s). Inspired by the designs of both visual commonsense reasoning and natural language inference tasks, we propose a new task termed “Premise-based Multi-modal Reasoning” (PMR) where a textual premise is the background presumption on each source image. The PMR dataset contains 15,360 manually annotated samples which are created by a multi-phase crowd-sourcing process. With selected high-quality movie screenshots and human-curated premise templates from 6 pre-defined categories, we ask crowd-source workers to write one true hypothesis and three distractors (4 choices) given the premise and image through a cross-check procedure.

A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
Tianyu Liu | Yizhe Zhang | Chris Brockett | Yi Mao | Zhifang Sui | Weizhu Chen | Bill Dolan
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large pretrained generative models like GPT-3 often suffer from hallucinating non-existent or incorrect content, which undermines their potential merits in real applications. Existing work usually attempts to detect these hallucinations based on a corresponding oracle reference at a sentence or document level. However, ground-truth references may not be readily available for many free-form text generation applications, and sentence- or document-level detection may fail to provide the fine-grained signals that would prevent fallacious content in real time. As a first step to addressing these issues, we propose a novel token-level, reference-free hallucination detection task and an associated annotated dataset named HaDeS (HAllucination DEtection dataSet). To create this dataset, we first perturb a large number of text segments extracted from English language Wikipedia, and then verify these with crowd-sourced annotations. To mitigate label imbalance during annotation, we utilize an iterative model-in-loop strategy. We conduct comprehensive data analyses and create multiple baseline models.

StableMoE: Stable Routing Strategy for Mixture of Experts
Damai Dai | Li Dong | Shuming Ma | Bo Zheng | Zhifang Sui | Baobao Chang | Furu Wei
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The Mixture-of-Experts (MoE) technique can scale up the model size of Transformers with an affordable computational overhead. We point out that existing learning-to-route MoE methods suffer from the routing fluctuation issue, i.e., the target expert of the same input may change along with training, but only one expert will be activated for the input during inference. The routing fluctuation tends to harm sample efficiency because the same input updates different experts but only one is finally used. In this paper, we propose StableMoE with two training stages to address the routing fluctuation problem. In the first training stage, we learn a balanced and cohesive routing strategy and distill it into a lightweight router decoupled from the backbone model. In the second training stage, we utilize the distilled router to determine the token-to-expert assignment and freeze it for a stable routing strategy. We validate our method on language modeling and multilingual machine translation. The results show that StableMoE outperforms existing MoE methods in terms of both convergence speed and performance.

CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
Ningyu Zhang | Mosha Chen | Zhen Bi | Xiaozhuan Liang | Lei Li | Xin Shang | Kangping Yin | Chuanqi Tan | Jian Xu | Fei Huang | Luo Si | Yuan Ni | Guotong Xie | Zhifang Sui | Baobao Chang | Hui Zong | Zheng Yuan | Linfeng Li | Jun Yan | Hongying Zan | Kunli Zhang | Buzhou Tang | Qingcai Chen
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Artificial Intelligence (AI), along with the recent progress in biomedical language understanding, is gradually offering great promise for medical practice. With the development of biomedical language understanding benchmarks, AI applications are widely used in the medical field. However, most benchmarks are limited to English, which makes it challenging to replicate many of the successes in English for other languages. To facilitate research in this direction, we collect real-world biomedical data and present the first Chinese Biomedical Language Understanding Evaluation (CBLUE) benchmark: a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification, and an associated online platform for model evaluation, comparison, and analysis. To establish evaluation on these tasks, we report empirical results with the current 11 pre-trained Chinese models, and experimental results show that state-of-the-art neural models perform by far worse than the human ceiling.

Knowledge Neurons in Pretrained Transformers
Damai Dai | Li Dong | Yaru Hao | Zhifang Sui | Baobao Chang | Furu Wei
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large-scale pretrained language models are surprisingly good at recalling factual knowledge presented in the training corpus. In this paper, we present preliminary studies on how factual knowledge is stored in pretrained Transformers by introducing the concept of knowledge neurons. Specifically, we examine the fill-in-the-blank cloze task for BERT. Given a relational fact, we propose a knowledge attribution method to identify the neurons that express the fact. We find that the activation of such knowledge neurons is positively correlated to the expression of their corresponding facts. In our case studies, we attempt to leverage knowledge neurons to edit (such as update, and erase) specific factual knowledge without fine-tuning. Our results shed light on understanding the storage of knowledge within pretrained Transformers.

Hierarchical Curriculum Learning for AMR Parsing
Peiyi Wang | Liang Chen | Tianyu Liu | Damai Dai | Yunbo Cao | Baobao Chang | Zhifang Sui
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Abstract Meaning Representation (AMR) parsing aims to translate sentences to semantic representation with a hierarchical structure, and is recently empowered by pretrained sequence-to-sequence models. However, there exists a gap between their flat training objective (i.e., equally treats all output tokens) and the hierarchical AMR structure, which limits the model generalization. To bridge this gap, we propose a Hierarchical Curriculum Learning (HCL) framework with Structure-level (SC) and Instance-level Curricula (IC). SC switches progressively from core to detail AMR semantic elements while IC transits from structure-simple to -complex AMR instances during training. Through these two warming-up processes, HCL reduces the difficulty of learning complex structures, thus the flat model can better adapt to the AMR hierarchy. Extensive experiments on AMR2.0, AMR3.0, structure-complex and out-of-distribution situations verify the effectiveness of HCL.

2021

Inductively Representing Out-of-Knowledge-Graph Entities by Optimal Estimation Under Translational Assumptions
Damai Dai | Hua Zheng | Fuli Luo | Pengcheng Yang | Tianyu Liu | Zhifang Sui | Baobao Chang
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)

Conventional Knowledge Graph Completion (KGC) assumes that all test entities appear during training. However, in real-world scenarios, Knowledge Graphs (KG) evolve fast with out-of-knowledge-graph (OOKG) entities added frequently, and we need to efficiently represent these entities. Most existing Knowledge Graph Embedding (KGE) methods cannot represent OOKG entities without costly retraining on the whole KG. To enhance efficiency, we propose a simple and effective method that inductively represents OOKG entities by their optimal estimation under translational assumptions. Moreover, given pretrained embeddings of the in-knowledge-graph (IKG) entities, our method even needs no additional learning. Experimental results on two KGC tasks with OOKG entities show that our method outperforms the previous methods by a large margin with higher efficiency.
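To illustrate the idea of estimating an OOKG entity from pretrained IKG embeddings without additional learning, here is a minimal sketch. It assumes a TransE-style translational assumption (head + relation ≈ tail) with a squared Euclidean error, under which the best estimate is the mean of the per-triple estimates. The function name `estimate_ookg_embedding` and the dictionary-based embedding tables are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def estimate_ookg_embedding(triples, entity_emb, relation_emb, new_entity):
    """Estimate an embedding for new_entity from its auxiliary triples.

    Assumes head + relation ~= tail (TransE-style) and a squared L2 error,
    so the minimizer is the mean of the per-triple estimates.
    """
    estimates = []
    for head, rel, tail in triples:
        if head == new_entity and tail in entity_emb:
            # new entity is the head: e ~= tail - relation
            estimates.append(entity_emb[tail] - relation_emb[rel])
        elif tail == new_entity and head in entity_emb:
            # new entity is the tail: e ~= head + relation
            estimates.append(entity_emb[head] + relation_emb[rel])
    if not estimates:
        raise ValueError("no auxiliary triple links the new entity to a known entity")
    return np.mean(estimates, axis=0)

# Toy pretrained embeddings (illustrative values only).
entity_emb = {"paris": np.array([1.0, 2.0]), "france": np.array([3.0, 3.0])}
relation_emb = {"located_in": np.array([1.0, 1.0]), "near": np.array([1.0, 2.0])}
aux_triples = [("lyon", "located_in", "france"), ("paris", "near", "lyon")]
lyon = estimate_ookg_embedding(aux_triples, entity_emb, relation_emb, "lyon")
```

No parameters are trained here: the estimate is a closed-form average over the known neighbours, which is why such an approach can avoid retraining on the whole KG.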

Decompose, Fuse and Generate: A Formation-Informed Method for Chinese Definition Generation
Hua Zheng | Damai Dai | Lei Li | Tianyu Liu | Zhifang Sui | Baobao Chang | Yang Liu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In this paper, we tackle the task of Definition Generation (DG) in Chinese, which aims at automatically generating a definition for a word. Most existing methods take the source word as an indecomposable semantic unit. However, in parataxis languages like Chinese, word meanings can be composed using the word formation process, where a word (“桃花”, peach-blossom) is formed by formation components (“桃”, peach; “花”, flower) using a formation rule (Modifier-Head). Inspired by this process, we propose to enhance DG with word formation features. We build a formation-informed dataset, and propose a model DeFT, which Decomposes words into formation features, dynamically Fuses different features through a gating mechanism, and generaTes word definitions. Experimental results show that our method is both effective and robust.

2020

An Anchor-Based Automatic Evaluation Metric for Document Summarization
Kexiang Wang | Tianyu Liu | Baobao Chang | Zhifang Sui
Proceedings of the 28th International Conference on Computational Linguistics

The widespread adoption of reference-based automatic evaluation metrics such as ROUGE has promoted the development of document summarization. In this paper, we consider a new protocol for designing reference-based metrics that require the endorsement of source document(s). Following this protocol, we propose an anchored ROUGE metric that fixes each summary particle on the source document, which bases the computation on more solid ground. Empirical results on benchmark datasets validate that the source document helps to induce a higher correlation with human judgments for the ROUGE metric. Being self-explanatory and easy to implement, the protocol can naturally foster various effective designs of reference-based metrics besides the anchored ROUGE introduced here.

A Spectral Method for Unsupervised Multi-Document Summarization
Kexiang Wang | Baobao Chang | Zhifang Sui
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Multi-document summarization (MDS) aims at producing a good-quality summary for several related documents. In this paper, we propose a spectral-based hypothesis, which states that the goodness of summary candidate is closely linked to its so-called spectral impact. Here spectral impact considers the perturbation to the dominant eigenvalue of affinity matrix when dropping the summary candidate from the document cluster. The hypothesis is validated by three theoretical perspectives: semantic scaling, propagation dynamics and matrix perturbation. According to the hypothesis, we formulate the MDS task as the combinatorial optimization of spectral impact and propose an accelerated greedy solution based on a surrogate of spectral impact. The evaluation results on various datasets demonstrate: (1) The performance of the summary candidate is positively correlated with its spectral impact, which accords with our hypothesis; (2) Our spectral-based method has a competitive result as compared to state-of-the-art MDS systems.
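The notion of spectral impact described above can be sketched in a few lines. This is a minimal illustration, assuming a symmetric non-negative sentence affinity matrix and a brute-force greedy search; the paper's accelerated solution relies on a surrogate of spectral impact that is not reproduced here. The function names are hypothetical.

```python
import numpy as np

def dominant_eigenvalue(affinity):
    """Largest eigenvalue of a symmetric affinity matrix (0.0 if it is empty)."""
    if affinity.shape[0] == 0:
        return 0.0
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order
    return float(np.linalg.eigvalsh(affinity)[-1])

def spectral_impact(affinity, candidate):
    """Drop in the dominant eigenvalue when the candidate sentences are removed."""
    dropped = set(candidate)
    keep = np.array([i for i in range(affinity.shape[0]) if i not in dropped], dtype=int)
    reduced = affinity[np.ix_(keep, keep)]
    return dominant_eigenvalue(affinity) - dominant_eigenvalue(reduced)

def greedy_select(affinity, k):
    """Greedily add the sentence that maximizes the spectral impact of the summary."""
    selected = []
    for _ in range(k):
        remaining = [i for i in range(affinity.shape[0]) if i not in selected]
        best = max(remaining, key=lambda i: spectral_impact(affinity, selected + [i]))
        selected.append(best)
    return selected

# Toy star-shaped affinity graph: sentence 0 is similar to all the others.
star = np.array([
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
])
summary = greedy_select(star, 1)
```

In the toy graph, removing the hub sentence collapses the dominant eigenvalue to zero, so the hub has the largest spectral impact and is selected first.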

Discriminatively-Tuned Generative Classifiers for Robust Natural Language Inference
Xiaoan Ding | Tianyu Liu | Baobao Chang | Zhifang Sui | Kevin Gimpel
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

While discriminative neural network classifiers are generally preferred, recent work has shown advantages of generative classifiers in terms of data efficiency and robustness. In this paper, we focus on natural language inference (NLI). We propose GenNLI, a generative classifier for NLI tasks, and empirically characterize its performance by comparing it to five baselines, including discriminative models and large-scale pretrained language representation models like BERT. We explore training objectives for discriminative fine-tuning of our generative classifiers, showing improvements over log loss fine-tuning from prior work (Lewis and Fan, 2019). In particular, we find strong results with a simple unbounded modification to log loss, which we call the “infinilog loss”. Our experiments show that GenNLI outperforms both discriminative and pretrained baselines across several challenging NLI experimental settings, including small training sets, imbalanced label distributions, and label noise.

An Empirical Study on Model-agnostic Debiasing Strategies for Robust Natural Language Inference
Tianyu Liu | Zheng Xin | Xiaoan Ding | Baobao Chang | Zhifang Sui
Proceedings of the 24th Conference on Computational Natural Language Learning

Prior work on natural language inference (NLI) debiasing mainly targets one or a few known biases while not necessarily making the models more robust. In this paper, we focus on model-agnostic debiasing strategies and explore how to (or whether it is possible to) make NLI models robust to multiple distinct adversarial attacks while keeping or even strengthening the models’ generalization power. We first benchmark prevailing neural NLI models, including pretrained ones, on various adversarial datasets. We then try to combat distinct known biases by modifying a mixture of experts (MoE) ensemble method and show that it’s nontrivial to mitigate multiple NLI biases at the same time, and that the model-level ensemble method outperforms the MoE ensemble method. We also perform data augmentation including text swap, word substitution and paraphrase and prove its efficiency in combating various (though not all) adversarial attacks at the same time. Finally, we investigate several methods to merge heterogeneous training data (1.35M) and perform model ensembling, which are straightforward but effective ways to strengthen NLI models.

面向医学文本处理的医学实体标注规范(Medical Entity Annotation Standard for Medical Text Processing)
Huan Zhang (张欢) | Yuan Zong (宗源) | Baobao Chang (常宝宝) | Zhifang Sui (穗志方) | Hongying Zan (昝红英) | Kunli Zhang (张坤丽)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

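With the spread of smart healthcare, demand for using natural language processing to identify medical information is growing. At present, there is still no shared corpus for medical entities, which has greatly hindered progress on the various tasks of medical text processing. How should different medical entity categories be distinguished? How should the coverage boundaries between entities be defined? These questions have led to a lack of large-scale, consistently annotated medical text data comparable to what exists for general domains. To address these problems, this paper draws on the semantic types defined in UMLS and proposes a medical entity annotation standard for medical text processing, covering 9 types of medical entities such as diseases, clinical manifestations, medical procedures, and medical equipment, and builds a medical entity annotation corpus based on the standard. The paper reviews the standard's description system, classification principles, handling of confusable cases, the corpus annotation process, and baseline experiments on automatic medical entity annotation, in the hope of providing a reference annotation standard for building medical entity corpora and corpus support for medical entity recognition.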

HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference
Tianyu Liu | Zheng Xin | Baobao Chang | Zhifang Sui
Proceedings of the 12th Language Resources and Evaluation Conference

Many recent studies have shown that for models trained on datasets for natural language inference (NLI), it is possible to make correct predictions by merely looking at the hypothesis while completely ignoring the premise. In this work, we manage to derive adversarial examples in terms of the hypothesis-only bias and explore eligible ways to mitigate such bias. Specifically, we extract various phrases from the hypotheses (artificial patterns) in the training sets, and show that they have been strong indicators of the specific labels. We then identify ‘hard’ and ‘easy’ instances from the original test sets whose labels are opposite to or consistent with those indications. We also set up baselines including both pretrained models (BERT, RoBERTa, XLNet) and competitive non-pretrained models (InferSent, DAM, ESIM). Apart from the benchmark and baselines, we also investigate two debiasing approaches which exploit the artificial pattern modeling to mitigate such hypothesis-only bias: down-sampling and adversarial training. We believe those methods can be treated as competitive baselines in NLI debiasing tasks.

2019

Pun-GAN: Generative Adversarial Network for Pun Generation
Fuli Luo | Shunyao Li | Pengcheng Yang | Lei Li | Baobao Chang | Zhifang Sui | Xu Sun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In this paper, we focus on the task of generating a pun sentence given a pair of word senses. A major challenge for pun generation is the lack of large-scale pun corpus to guide supervised learning. To remedy this, we propose an adversarial generative network for pun generation (Pun-GAN). It consists of a generator to produce pun sentences, and a discriminator to distinguish between the generated pun sentences and the real sentences with specific word senses. The output of the discriminator is then used as a reward to train the generator via reinforcement learning, encouraging it to produce pun sentences which can support two word senses simultaneously. Experiments show that the proposed Pun-GAN can generate sentences that are more ambiguous and diverse in both automatic and human evaluation.

Towards Fine-grained Text Sentiment Transfer
Fuli Luo | Peng Li | Pengcheng Yang | Jie Zhou | Yutong Tan | Baobao Chang | Zhifang Sui | Xu Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this paper, we focus on the task of fine-grained text sentiment transfer (FGST). This task aims to revise an input sequence to satisfy a given sentiment intensity, while preserving the original semantic content. Different from the conventional sentiment transfer task that only reverses the sentiment polarity (positive/negative) of text, the FGST task requires more nuanced and fine-grained control of sentiment. To remedy this, we propose a novel Seq2SentiSeq model. Specifically, the numeric sentiment intensity value is incorporated into the decoder via a Gaussian kernel layer to finely control the sentiment intensity of the output. Moreover, to tackle the problem of lacking parallel data, we propose a cycle reinforcement learning algorithm to guide the model training. In this framework, the elaborately designed rewards can balance both sentiment transformation and content preservation, while not requiring any ground truth output. Experimental results show that our approach can outperform existing methods by a large margin in both automatic evaluation and human evaluation.

Towards Comprehensive Description Generation from Factual Attribute-value Tables
Tianyu Liu | Fuli Luo | Pengcheng Yang | Wei Wu | Baobao Chang | Zhifang Sui
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Comprehensive descriptions for factual attribute-value tables, which should be accurate, informative and loyal, can be very helpful for end users to understand the structured data in this form. However, previous neural generators might suffer from missing key attributes and from less informative or groundless content, which impedes the generation of high-quality comprehensive descriptions for tables. To relieve these problems, we first propose a force attention (FA) method to encourage the generator to pay more attention to the uncovered attributes, to avoid potentially missing key attributes. Furthermore, we propose reinforcement learning for information richness to generate more informative as well as more loyal descriptions for tables. In our experiments, we utilize the widely used WIKIBIO dataset as a benchmark. Besides, we create WB-filter based on WIKIBIO to test our model in simulated user-oriented scenarios, in which the generated descriptions should accord with particular user interests. Experimental results show that our model outperforms the state-of-the-art baselines on both automatic and human evaluation.

Learning to Control the Fine-grained Sentiment for Story Ending Generation
Fuli Luo | Damai Dai | Pengcheng Yang | Tianyu Liu | Baobao Chang | Zhifang Sui | Xu Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Automatic story ending generation is an interesting and challenging task in natural language generation. Previous studies are mainly limited to generating coherent, reasonable and diversified story endings, and few works focus on controlling the sentiment of story endings. This paper focuses on generating a story ending which meets the given fine-grained sentiment intensity. There are two major challenges to this task. The first is the lack of a story corpus which has fine-grained sentiment labels. The second is the difficulty of explicitly controlling sentiment intensity when generating endings. Therefore, we propose a generic and novel framework which consists of a sentiment analyzer and a sentimental generator, respectively addressing the two challenges. The sentiment analyzer adopts a series of methods to acquire sentiment intensities of the story dataset. The sentimental generator introduces the sentiment intensity into the decoder via a Gaussian Kernel Layer to control the sentiment of the output. To the best of our knowledge, this is the first endeavor to control the fine-grained sentiment for story ending generation without manually annotating sentiment labels. Experiments show that our proposed framework can generate story endings which are not only more coherent and fluent but also able to meet the given sentiment intensity better.

2018

Incorporating Glosses into Neural Word Sense Disambiguation
Fuli Luo | Tianyu Liu | Qiaolin Xia | Baobao Chang | Zhifang Sui
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Word Sense Disambiguation (WSD) aims to identify the correct meaning of polysemous words in a particular context. Lexical resources like WordNet have proved to be of great help for WSD in knowledge-based methods. However, previous neural networks for WSD always rely on massive labeled data (context), ignoring lexical resources like glosses (sense definitions). In this paper, we integrate the context and glosses of the target word into a unified framework in order to make full use of both labeled data and lexical knowledge. Therefore, we propose GAS: a gloss-augmented WSD neural network which jointly encodes the context and glosses of the target word. GAS models the semantic relationship between the context and the gloss in an improved memory network framework, which breaks the barriers between the previous supervised methods and knowledge-based methods. We further extend the original gloss of a word sense via its semantic relations in WordNet to enrich the gloss information. The experimental results show that our model outperforms the state-of-the-art systems on several English all-words WSD datasets.

EventWiki: A Knowledge Base of Major Events
Tao Ge | Lei Cui | Baobao Chang | Zhifang Sui | Furu Wei | Ming Zhou
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Revisiting Distant Supervision for Relation Extraction
Tingsong Jiang | Jing Liu | Chin-Yew Lin | Zhifang Sui
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Leveraging Gloss Knowledge in Neural Word Sense Disambiguation by Hierarchical Co-Attention
Fuli Luo | Tianyu Liu | Zexue He | Qiaolin Xia | Zhifang Sui | Baobao Chang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

The goal of Word Sense Disambiguation (WSD) is to identify the correct meaning of a word in the particular context. Traditional supervised methods only use labeled data (context), while missing rich lexical knowledge such as the gloss which defines the meaning of a word sense. Recent studies have shown that incorporating glosses into neural networks for WSD has made significant improvement. However, the previous models usually build the context representation and gloss representation separately. In this paper, we find that the learning for the context and gloss representation can benefit from each other. Gloss can help to highlight the important words in the context, thus building a better context representation. Context can also help to locate the key words in the gloss of the correct word sense. Therefore, we introduce a co-attention mechanism to generate co-dependent representations for the context and gloss. Furthermore, in order to capture both word-level and sentence-level information, we extend the attention mechanism in a hierarchical fashion. Experimental results show that our model achieves the state-of-the-art results on several standard English all-words WSD test datasets.

Fine-grained Coordinated Cross-lingual Text Stream Alignment for Endless Language Knowledge Acquisition
Tao Ge | Qing Dou | Heng Ji | Lei Cui | Baobao Chang | Zhifang Sui | Furu Wei | Ming Zhou
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper proposes to study fine-grained coordinated cross-lingual text stream alignment through a novel information network decipherment paradigm. We use Burst Information Networks as media to represent text streams and present a simple yet effective network decipherment algorithm with diverse clues to decipher the networks for accurate text stream alignment. Experiments on Chinese-English news streams show our approach not only outperforms previous approaches on bilingual lexicon extraction from coordinated text streams but also can harvest high-quality alignments from large amounts of streaming data for endless language knowledge mining, which makes it promising to be a new paradigm for automatic language knowledge acquisition.

2017

Affinity-Preserving Random Walk for Multi-Document Summarization
Kexiang Wang | Tianyu Liu | Zhifang Sui | Baobao Chang
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Multi-document summarization provides users with a short text that summarizes the information in a set of related documents. This paper introduces affinity-preserving random walk to the summarization task, which preserves the affinity relations of sentences by an absorbing random walk model. Meanwhile, we put forward adjustable affinity-preserving random walk to enforce the diversity constraint of summarization in the random walk process. The ROUGE evaluations on DUC 2003 topic-focused summarization task and DUC 2004 generic summarization task show the good performance of our method, which has the best ROUGE-2 recall among the graph-based ranking methods.

A Soft-label Method for Noise-tolerant Distantly Supervised Relation Extraction
Tianyu Liu | Kexiang Wang | Baobao Chang | Zhifang Sui
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Distant-supervised relation extraction inevitably suffers from wrong labeling problems because it heuristically labels relational facts with knowledge bases. Previous sentence level denoise models don’t achieve satisfying performances because they use hard labels which are determined by distant supervision and immutable during training. To this end, we introduce an entity-pair level denoise method which exploits semantic information from correctly labeled entity pairs to correct wrong labels dynamically during training. We propose a joint score function which combines the relational scores based on the entity-pair representation and the confidence of the hard label to obtain a new label, namely a soft label, for certain entity pair. During training, soft labels instead of hard labels serve as gold labels. Experiments on the benchmark dataset show that our method dramatically reduces noisy instances and outperforms other state-of-the-art systems.

Proceedings of the 9th SIGHAN Workshop on Chinese Language Processing
Yue Zhang | Zhifang Sui
Proceedings of the 9th SIGHAN Workshop on Chinese Language Processing

A Progressive Learning Approach to Chinese SRL Using Heterogeneous Data
Qiaolin Xia | Lei Sha | Baobao Chang | Zhifang Sui
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Previous studies on Chinese semantic role labeling (SRL) have concentrated on a single semantically annotated corpus. However, the training data of a single corpus is often limited, while the other existing semantically annotated corpora for Chinese SRL are scattered across different annotation frameworks. Data sparsity therefore remains a bottleneck. This situation calls for larger training datasets, or effective approaches which can take advantage of highly heterogeneous data. In this paper, we focus mainly on the latter, that is, to improve Chinese SRL by using heterogeneous corpora together. We propose a novel progressive learning model which augments the Progressive Neural Network with Gated Recurrent Adapters. The model can accommodate heterogeneous inputs and effectively transfer knowledge between them. We also release a new corpus, Chinese SemBank, for Chinese SRL. Experiments on CPB 1.0 show that our model outperforms state-of-the-art methods.

2016

News Stream Summarization using Burst Information Networks
Tao Ge | Lei Cui | Baobao Chang | Sujian Li | Ming Zhou | Zhifang Sui
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Capturing Argument Relationship for Chinese Semantic Role Labeling
Lei Sha | Sujian Li | Baobao Chang | Zhifang Sui | Tingsong Jiang
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Encoding Temporal Information for Time-Aware Link Prediction
Tingsong Jiang | Tianyu Liu | Tao Ge | Lei Sha | Sujian Li | Baobao Chang | Zhifang Sui
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

RBPB: Regularization-Based Pattern Balancing Method for Event Extraction
Lei Sha | Jing Liu | Chin-Yew Lin | Sujian Li | Baobao Chang | Zhifang Sui
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Joint Learning Templates and Slots for Event Schema Induction
Lei Sha | Sujian Li | Baobao Chang | Zhifang Sui
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Towards Time-Aware Knowledge Graph Completion
Tingsong Jiang | Tianyu Liu | Tao Ge | Lei Sha | Baobao Chang | Sujian Li | Zhifang Sui
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Knowledge graph (KG) completion adds new facts to a KG by making inferences from existing facts. Most existing methods ignore the time information and only learn from time-unknown fact triples. In dynamic environments that evolve over time, it is important and challenging for knowledge graph completion models to take into account the temporal aspects of facts. In this paper, we present a novel time-aware knowledge graph completion model that is able to predict links in a KG using both the existing facts and the temporal information of the facts. To incorporate the happening time of facts, we propose a time-aware KG embedding model using temporal order information among facts. To incorporate the valid time of facts, we propose a joint time-aware inference model based on Integer Linear Programming (ILP) using temporal consistency information as constraints. We further integrate two models to make full use of global temporal information. We empirically evaluate our models on time-aware KG completion task. Experimental results show that our time-aware models achieve the state-of-the-art on temporal facts consistently.

pdf bib
Reading and Thinking: Re-read LSTM Unit for Textual Entailment Recognition
Lei Sha | Baobao Chang | Zhifang Sui | Sujian Li
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Recognizing Textual Entailment (RTE) is a fundamentally important task in natural language processing that has many applications. The recently released Stanford Natural Language Inference (SNLI) corpus has made it possible to develop and evaluate deep neural network methods for the RTE task. Previous neural network based methods usually try to encode the two sentences (premise and hypothesis) and send them together into a multi-layer perceptron to get their entailment type, or use LSTM-RNN to link two sentences together while using attention mechanic to enhance the model’s ability. In this paper, we propose to use the re-read mechanic, which means to read the premise again and again while reading the hypothesis. After reading the premise again, the model can get a better understanding of the premise, which can also affect the understanding of the hypothesis. On the contrary, a better understanding of the hypothesis can also affect the understanding of the premise. With the alternative re-read process, the model can “think” of a better decision of entailment type. We designed a new LSTM unit called re-read LSTM (rLSTM) to implement this “thinking” process. Experiments show that we achieve results better than current state-of-the-art equivalents.

pdf bib
Event Detection with Burst Information Networks
Tao Ge | Lei Cui | Baobao Chang | Zhifang Sui | Ming Zhou
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Retrospective event detection is an important task for discovering previously unidentified events in a text stream. In this paper, we propose two fast centroid-aware event detection models based on a novel text stream representation – Burst Information Networks (BINets) for addressing the challenge. The BINets are time-aware, efficient and can be easily analyzed for identifying key information (centroids). These advantages allow the BINet-based approaches to achieve the state-of-the-art performance on multiple datasets, demonstrating the efficacy of BINets for the task of event detection.

2015

pdf bib
Recognizing Textual Entailment Using Probabilistic Inference
Lei Sha | Sujian Li | Baobao Chang | Zhifang Sui | Tingsong Jiang
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Chinese Semantic Role Labeling with Bidirectional Recurrent Neural Networks
Zhen Wang | Tingsong Jiang | Baobao Chang | Zhifang Sui
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
ERSOM: A Structural Ontology Matching Approach Using Automatically Learned Entity Representation
Chuncheng Xiang | Tingsong Jiang | Baobao Chang | Zhifang Sui
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Bring you to the past: Automatic Generation of Topically Relevant Event Chronicles
Tao Ge | Wenzhe Pei | Heng Ji | Sujian Li | Baobao Chang | Zhifang Sui
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
One Tense per Scene: Predicting Tense in Chinese Conversations
Tao Ge | Heng Ji | Baobao Chang | Zhifang Sui
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing
Liang-Chih Yu | Zhifang Sui | Yue Zhang | Vincent Ng
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing

2014

pdf bib
The Construction of language Resource and Knowledge Base for Chinese Language Computing
Zhifang Sui
Proceedings of The Third CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
The CIPS-SIGHAN CLP 2014 Chinese Word Segmentation Bake-off
Huiming Duan | Zhifang Sui | Tao Ge
Proceedings of The Third CIPS-SIGHAN Joint Conference on Chinese Language Processing

2013

pdf bib
Event-Based Time Label Propagation for Automatic Dating of News Articles
Tao Ge | Baobao Chang | Sujian Li | Zhifang Sui
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Towards Accurate Distant Supervision for Relational Facts Extraction
Xingxing Zhang | Jianwen Zhang | Junyu Zeng | Jun Yan | Zheng Chen | Zhifang Sui
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf bib
Fine-Grained Classification of Named Entities by Fusing Multi-Features
Wenjie Li | Jiwei Li | Ye Tian | Zhifang Sui
Proceedings of COLING 2012: Posters

pdf bib
The CIPS-SIGHAN CLP 2012 Chinese Word Segmentation on MicroBlog Corpora Bakeoff
Huiming Duan | Zhifang Sui | Ye Tian | Wenjie Li
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

2009

pdf bib
Prediction of Thematic Rank for Structured Semantic Role Labeling
Weiwei Sun | Zhifang Sui | Meng Wang
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

pdf bib
Chinese Function Tag Labeling
Weiwei Sun | Zhifang Sui
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

pdf bib
Chinese Semantic Role Labeling with Shallow Parsing
Weiwei Sun | Zhifang Sui | Meng Wang | Xin Wang
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

pdf bib
Prediction of Maximal Projection for Semantic Role Labeling
Weiwei Sun | Zhifang Sui | Haifeng Wang
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
The Integration of Dependency Relation Classification and Semantic Role Labeling Using Bilayer Maximum Entropy Markov Models
Weiwei Sun | Hongzhan Li | Zhifang Sui
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

2006

pdf bib
A Study on Terminology Extraction Based on Classified Corpora
Yirong Chen | Qin Lu | Wenjie Li | Zhifang Sui | Luning Ji
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Algorithms for automatic term extraction in a specific domain should consider at least two issues, namely Unithood and Termhood (Kageura, 1996). Unithood refers to the degree of a string to occur as a word or a phrase. Termhood (Chen Yirong, 2005) refers to the degree of a word or a phrase to occur as a domain specific concept. Unlike unithood, study on termhood is not yet widely reported. In classified corpora, the class information provides the cue to the nature of data and can be used in termhood calculation. Three algorithms are provided and evaluated to investigate termhood based on classified corpora. The three algorithms are based on lexicon set computing, term frequency and document frequency, and the strength of the relation between a term and its document class respectively. Our objective is to investigate the effects of these different termhood measurement features. After evaluation, we can find which features are more effective and also, how we can improve these different features to achieve the best performance. Preliminary results show that the first measure can effectively filter out independent terms or terms of general use.

2005

pdf bib
Domain Knowledge Engineering Based on Encyclopedias and the Web Text
Zhifang Sui | Gaoying Cui | Wansong Ding | Qinlong Zhang
Proceedings of the Fifth Workshop on Asian Language Resources (ALR-05) and First Symposium on Asian Language Resources Network (ALRN)

2000

pdf bib
An Information-Theory-Based Feature Type Analysis for the Modeling of Statistical Parsing
Zhifang Sui | Jun Zhao | Dekai Wu
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

1999

pdf bib
An Information-Theoretic Empirical Analysis of Dependency-Based Feature Types for Word Prediction Models
Dekai Wu | Jun Zhao | Zhifang Sui
1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora