Zhiyuan Liu


2022

Going “Deeper”: Structured Sememe Prediction via Transformer with Tree Attention
Yining Ye | Fanchao Qi | Zhiyuan Liu | Maosong Sun
Findings of the Association for Computational Linguistics: ACL 2022

Sememe knowledge bases (SKBs), which annotate words with the smallest semantic units (i.e., sememes), have proven beneficial to many NLP tasks. Building an SKB is very time-consuming and labor-intensive. Therefore, some studies have tried to automate the building process by predicting sememes for unannotated words. However, all existing sememe prediction studies ignore the hierarchical structures of sememes, which are important in the sememe-based semantic description system. In this work, we tackle the structured sememe prediction problem for the first time, which aims to predict a sememe tree with hierarchical structures rather than a set of sememes. We design a sememe tree generation model based on Transformer with an adjusted attention mechanism, which outperforms the baselines in experiments. We also conduct a series of quantitative and qualitative analyses of the effectiveness of our model. All the code and data of this paper are available at https://github.com/thunlp/STG.

Sememe Prediction for BabelNet Synsets using Multilingual and Multimodal Information
Fanchao Qi | Chuancheng Lv | Zhiyuan Liu | Xiaojun Meng | Maosong Sun | Hai-Tao Zheng
Findings of the Association for Computational Linguistics: ACL 2022

In linguistics, a sememe is defined as the minimum semantic unit of languages. Sememe knowledge bases (KBs), which are built by manually annotating words with sememes, have been successfully applied to various NLP tasks. However, existing sememe KBs cover only a few languages, which hinders the wide utilization of sememes. To address this issue, the task of sememe prediction for BabelNet synsets (SPBS) is presented, aiming to build a multilingual sememe KB based on BabelNet, a multilingual encyclopedic dictionary. By automatically predicting sememes for a BabelNet synset, the words in many languages in the synset obtain sememe annotations simultaneously. However, previous SPBS methods have not taken full advantage of the abundant information in BabelNet. In this paper, we utilize the multilingual synonyms, multilingual glosses and images in BabelNet for SPBS. We design a multimodal information fusion model to encode and combine this information for sememe prediction. Experimental results show that our model substantially outperforms previous methods (by about 10 points in MAP and F1). All the code and data of this paper can be obtained at https://github.com/thunlp/MSGI.

LEVEN: A Large-Scale Chinese Legal Event Detection Dataset
Feng Yao | Chaojun Xiao | Xiaozhi Wang | Zhiyuan Liu | Lei Hou | Cunchao Tu | Juanzi Li | Yun Liu | Weixing Shen | Maosong Sun
Findings of the Association for Computational Linguistics: ACL 2022

Recognizing facts is the most fundamental step in making judgments, hence detecting events in legal documents is important to legal case analysis tasks. However, existing Legal Event Detection (LED) datasets cover an incomplete set of event types and have limited annotated data, which restricts the development of LED methods and their downstream applications. To alleviate these issues, we present LEVEN, a large-scale Chinese LEgal eVENt detection dataset, with 8,116 legal documents and 150,977 human-annotated event mentions in 108 event types. In addition to charge-related events, LEVEN also covers general events, which are critical for legal case understanding but neglected in existing LED datasets. To our knowledge, LEVEN is the largest LED dataset and has dozens of times the data scale of others, which shall significantly promote the training and evaluation of LED methods. The results of extensive experiments indicate that LED is challenging and needs further effort. Moreover, we simply utilize legal events as side information to promote downstream applications. This method achieves average improvements of 2.2 points in precision for low-resource judgment prediction and 1.5 points in mean average precision for unsupervised case retrieval, which suggests that LED is fundamental to these applications. The source code and dataset can be obtained from https://github.com/thunlp/LEVEN.

MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang | Yankai Lin | Zhiyuan Liu | Peng Li | Maosong Sun | Jie Zhou
Findings of the Association for Computational Linguistics: ACL 2022

Recent work has shown that feed-forward networks (FFNs) in pre-trained Transformers are a key component, storing various linguistic and factual knowledge. However, the computational patterns of FFNs are still unclear. In this work, we study the computational patterns of FFNs and observe that most inputs activate only a tiny fraction of FFN neurons. This phenomenon is similar to the sparsity of the human brain, which drives research on functional partitions of the human brain. To verify whether functional partitions also emerge in FFNs, we propose to convert a model into its MoE version with the same parameters, namely MoEfication. Specifically, MoEfication consists of two phases: (1) splitting the parameters of FFNs into multiple functional partitions as experts, and (2) building expert routers to decide which experts will be used for each input. Experimental results show that MoEfication can conditionally use 10% to 30% of FFN parameters while maintaining over 95% of the original performance for different models on various downstream tasks. Besides, MoEfication brings two advantages: (1) it significantly reduces inference FLOPs, achieving a 2x speedup with 25% of FFN parameters, and (2) it provides a fine-grained perspective to study the inner mechanism of FFNs. The source code of this paper can be obtained from https://github.com/thunlp/MoEfication.
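The two MoEfication phases can be illustrated with a toy, pure-Python sketch. It is not the authors' implementation: the contiguous neuron split in `split_experts` and the "oracle" router that ranks experts by their true ReLU activation mass are hypothetical simplifications (the abstract only says parameters are split into functional partitions and that routers choose experts). The point it shows is that when the unselected experts' neurons are inactive, evaluating only the chosen experts reproduces the dense FFN output.

```python
from typing import List

Matrix = List[List[float]]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, x) for x in v]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def dense_ffn(w1: Matrix, w2: Matrix, x: List[float]) -> List[float]:
    """Dense FFN y = W2 . relu(W1 . x); w1 is d_ff x d, w2 is d x d_ff."""
    return matvec(w2, relu(matvec(w1, x)))


def split_experts(d_ff: int, num_experts: int) -> List[List[int]]:
    """Hypothetical naive split of the hidden neurons into contiguous groups."""
    size = d_ff // num_experts
    return [list(range(e * size, (e + 1) * size)) for e in range(num_experts)]


def moe_ffn(w1: Matrix, w2: Matrix, x: List[float],
            experts: List[List[int]], top_k: int) -> List[float]:
    """Evaluate only the top_k experts chosen by a hypothetical oracle router."""
    # The oracle inspects true activations, for illustration only; a real
    # router would predict expert scores without computing every neuron.
    h = relu(matvec(w1, x))
    scores = [sum(h[i] for i in idx) for idx in experts]
    chosen = sorted(range(len(experts)), key=lambda e: scores[e], reverse=True)[:top_k]
    y = [0.0] * len(w2)
    for e in chosen:
        for i in experts[e]:
            for r, row in enumerate(w2):
                y[r] += row[i] * h[i]
    return y


w1 = [[1, 0], [0, 1], [-1, 0], [0, -1]]
w2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(dense_ffn(w1, w2, [1, 2]), moe_ffn(w1, w2, [1, 2], split_experts(4, 2), top_k=1))
```

Here the input activates only the first expert's neurons, so `top_k=1` already matches the dense result; with denser activations, dropping experts would trade accuracy for compute, which is the trade-off the reported 10% to 30% parameter usage reflects.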

ELLE: Efficient Lifelong Pre-training for Emerging Data
Yujia Qin | Jiajie Zhang | Yankai Lin | Zhiyuan Liu | Peng Li | Maosong Sun | Jie Zhou
Findings of the Association for Computational Linguistics: ACL 2022

Current pre-trained language models (PLMs) are typically trained with static data, ignoring that in real-world scenarios, streaming data of various sources may continuously grow. This requires PLMs to integrate the information from all the sources in a lifelong manner. Although this goal could be achieved by exhaustive pre-training on all the existing data, such a process is known to be computationally expensive. To this end, we propose ELLE, aiming at efficient lifelong pre-training for emerging data. Specifically, ELLE consists of (1) function preserved model expansion, which flexibly expands an existing PLM's width and depth to improve the efficiency of knowledge acquisition; and (2) pre-trained domain prompts, which disentangle the versatile knowledge learned during pre-training and stimulate the proper knowledge for downstream tasks. We evaluate ELLE with streaming data from 5 domains on BERT and GPT. The results show the superiority of ELLE over various lifelong learning baselines in both pre-training efficiency and downstream performance. The code is publicly available at https://github.com/thunlp/ELLE.

Prompt Tuning for Discriminative Pre-trained Language Models
Yuan Yao | Bowen Dong | Ao Zhang | Zhengyan Zhang | Ruobing Xie | Zhiyuan Liu | Leyu Lin | Maosong Sun | Jianyong Wang
Findings of the Association for Computational Linguistics: ACL 2022

Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks. However, to the best of our knowledge, existing works focus on prompt-tuning generative PLMs that are pre-trained to generate target tokens, such as BERT. It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned. In this work, we present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem. Comprehensive experiments on text classification and question answering show that, compared with vanilla fine-tuning, DPT achieves significantly higher performance and also avoids the instability of tuning large PLMs, in both full-set and low-resource settings.

Do Pre-trained Models Benefit Knowledge Graph Completion? A Reliable Evaluation and a Reasonable Approach
Xin Lv | Yankai Lin | Yixin Cao | Lei Hou | Juanzi Li | Zhiyuan Liu | Peng Li | Jie Zhou
Findings of the Association for Computational Linguistics: ACL 2022

In recent years, pre-trained language models (PLMs) have been shown to capture factual knowledge from massive texts, which encourages the proposal of PLM-based knowledge graph completion (KGC) models. However, these models still lag well behind the SOTA KGC models in terms of performance. In this work, we find two main reasons for the weak performance: (1) Inaccurate evaluation setting. The evaluation setting under the closed-world assumption (CWA) may underestimate the PLM-based KGC models since they introduce more external knowledge; (2) Inappropriate utilization of PLMs. Most PLM-based KGC models simply splice the labels of entities and relations as inputs, leading to incoherent sentences that do not take full advantage of the implicit knowledge in PLMs. To alleviate these problems, we highlight a more accurate evaluation setting under the open-world assumption (OWA), which manually checks the correctness of knowledge that is not in KGs. Moreover, motivated by prompt tuning, we propose a novel PLM-based KGC model named PKGC. The basic idea is to convert each triple and its support information into natural prompt sentences, which are then fed into PLMs for classification. Experimental results on two KGC datasets demonstrate that OWA is more reliable for evaluating KGC, especially on link prediction, and show the effectiveness of our PKGC model in both CWA and OWA settings.

Exploring the Universal Vulnerability of Prompt-based Learning Paradigm
Lei Xu | Yangyi Chen | Ganqu Cui | Hongcheng Gao | Zhiyuan Liu
Findings of the Association for Computational Linguistics: NAACL 2022

The prompt-based learning paradigm bridges the gap between pre-training and fine-tuning, and works effectively under the few-shot setting. However, we find that this learning paradigm inherits the vulnerability from the pre-training stage, where model predictions can be misled by inserting certain triggers into the text. In this paper, we explore this universal vulnerability by either injecting backdoor triggers or searching for adversarial triggers on pre-trained language models using only plain text. In both scenarios, we demonstrate that our triggers can totally control or severely decrease the performance of prompt-based models fine-tuned on arbitrary downstream tasks, reflecting the universal vulnerability of the prompt-based learning paradigm. Further experiments show that adversarial triggers have good transferability among language models. We also find that conventionally fine-tuned models are not vulnerable to adversarial triggers constructed from pre-trained language models. We conclude by proposing a potential solution to mitigate our attack methods. Code and data are publicly available.

Automatic Label Sequence Generation for Prompting Sequence-to-sequence Models
Zichun Yu | Tianyu Gao | Zhengyan Zhang | Yankai Lin | Zhiyuan Liu | Maosong Sun | Jie Zhou
Proceedings of the 29th International Conference on Computational Linguistics

Prompting, which casts downstream applications as language modeling tasks, has been shown to be sample-efficient compared to standard fine-tuning with pre-trained models. However, one pitfall of prompting is the need for manually designed patterns, whose outcome can be unintuitive and which require large validation sets to tune. To tackle the challenge, we propose AutoSeq, a fully automatic prompting method: (1) We adopt natural language prompts on sequence-to-sequence models, enabling free-form generation and a larger label search space; (2) We propose label sequences (phrases of indefinite length that verbalize the labels), which eliminate the need for manual templates and are more expressive than single label words; (3) We use beam search to automatically generate a large number of label sequence candidates and propose contrastive re-ranking to get the best combinations. AutoSeq significantly outperforms other no-manual-design methods, such as soft prompt tuning, adapter tuning, and automatic search on single label words; the generated label sequences are even better than curated manual ones on a variety of tasks. Our method reveals the potential of sequence-to-sequence models in few-shot learning and sheds light on a path to generic and automatic prompting. The source code of this paper can be obtained from https://github.com/thunlp/Seq2Seq-Prompt.

LEGO-ABSA: A Prompt-based Task Assemblable Unified Generative Framework for Multi-task Aspect-based Sentiment Analysis
Tianhao Gao | Jun Fang | Hanyu Liu | Zhiyuan Liu | Chao Liu | Pengzhang Liu | Yongjun Bao | Weipeng Yan
Proceedings of the 29th International Conference on Computational Linguistics

Aspect-based sentiment analysis (ABSA) has received increasing attention recently. ABSA can be divided into multiple tasks according to the different extracted elements. Existing generative methods usually treat the output as a whole string rather than the combination of different elements and only focus on a single task at once. This paper proposes a unified generative multi-task framework that can solve multiple ABSA tasks by controlling the type of task prompts consisting of multiple element prompts. Further, the proposed approach can train on simple tasks and transfer to difficult tasks by assembling task prompts, like assembling Lego bricks. We conduct experiments on six ABSA tasks across multiple benchmarks. Our proposed multi-task approach achieves new state-of-the-art results in almost all tasks and competitive results in task transfer scenarios.

Knowledge Inheritance for Pre-trained Language Models
Yujia Qin | Yankai Lin | Jing Yi | Jiajie Zhang | Xu Han | Zhengyan Zhang | Yusheng Su | Zhiyuan Liu | Peng Li | Maosong Sun | Jie Zhou
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recent explorations of large-scale pre-trained language models (PLMs) have revealed the power of PLMs with huge amounts of parameters, setting off a wave of training ever-larger PLMs. However, training a large-scale PLM requires tremendous computational resources, which may be practically unaffordable. In addition, existing large-scale PLMs are mainly trained from scratch individually, ignoring that many well-trained PLMs are available. To this end, we explore the question of how existing PLMs could benefit the training of large-scale PLMs in the future. Specifically, we introduce a pre-training framework named "knowledge inheritance" (KI) and explore how knowledge distillation could serve as auxiliary supervision during pre-training to efficiently learn larger PLMs. Experimental results demonstrate the superiority of KI in training efficiency. We also conduct empirical analyses to explore the effects of teacher PLMs' pre-training settings, including model architecture, pre-training data, etc. Finally, we show that KI could be applied to domain adaptation and knowledge transfer.

On Transferability of Prompt Tuning for Natural Language Processing
Yusheng Su | Xiaozhi Wang | Yujia Qin | Chi-Min Chan | Yankai Lin | Huadong Wang | Kaiyue Wen | Zhiyuan Liu | Peng Li | Juanzi Li | Lei Hou | Maosong Sun | Jie Zhou
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Prompt tuning (PT) is a promising parameter-efficient method to utilize extremely large pre-trained language models (PLMs), which can achieve comparable performance to full-parameter fine-tuning by only tuning a few soft prompts. However, PT requires much more training time than fine-tuning. Intuitively, knowledge transfer can help to improve the efficiency. To explore whether we can improve PT via prompt transfer, we empirically investigate the transferability of soft prompts across different downstream tasks and PLMs in this work. We find that (1) in the zero-shot setting, trained soft prompts can effectively transfer to similar tasks on the same PLM and also to other PLMs with a cross-model projector trained on similar tasks; (2) when used as initialization, trained soft prompts of similar tasks and projected prompts of other PLMs can significantly accelerate training and also improve the performance of PT. Moreover, to explore what decides prompt transferability, we investigate various transferability indicators and find that the overlapping rate of activated neurons strongly reflects the transferability, which suggests that how the prompts stimulate PLMs is essential. Our findings show that prompt transfer is promising for improving PT, and further research shall focus more on prompts' stimulation of PLMs. The source code can be obtained from https://github.com/thunlp/Prompt-Transferability.
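To make the "overlapping rate of activated neurons" indicator concrete, here is a minimal sketch under stated assumptions: a neuron counts as activated when its value exceeds a threshold (0 by default), and the overlap is measured as the intersection-over-union (Jaccard) of the two activated sets. Both the threshold and the Jaccard formula are illustrative choices, not necessarily the exact definition used in the paper; `activated` and `overlap_rate` are hypothetical helper names.

```python
from typing import Sequence, Set


def activated(acts: Sequence[float], threshold: float = 0.0) -> Set[int]:
    """Indices of neurons whose activation exceeds the threshold."""
    return {i for i, a in enumerate(acts) if a > threshold}


def overlap_rate(acts_a: Sequence[float], acts_b: Sequence[float],
                 threshold: float = 0.0) -> float:
    """Jaccard overlap of two activated-neuron sets (an assumed metric)."""
    a, b = activated(acts_a, threshold), activated(acts_b, threshold)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Activations of the same layer when the PLM is driven by two different prompts.
prompt_a = [0.5, 0.0, 1.2, -0.3]
prompt_b = [0.1, 0.7, 0.9, 0.0]
print(overlap_rate(prompt_a, prompt_b))  # neurons {0, 2} vs {0, 1, 2}
```

Under the abstract's finding, a higher overlap between two prompts' activation patterns would suggest that one prompt is a better candidate for transfer to the other's task.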

ProQA: Structural Prompt-based Pre-training for Unified Question Answering
Wanjun Zhong | Yifan Gao | Ning Ding | Yujia Qin | Zhiyuan Liu | Ming Zhou | Jiahai Wang | Jian Yin | Nan Duan
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Question Answering (QA) is a longstanding challenge in natural language processing. Existing QA works mostly focus on specific question types, knowledge domains, or reasoning skills. This specialization in QA research hinders systems from modeling commonalities between tasks and generalizing to wider applications. To address this issue, we present ProQA, a unified QA paradigm that solves various tasks through a single model. ProQA takes a unified structural prompt as the bridge and improves the QA-centric ability by structural prompt-based pre-training. Through a structurally designed prompt-based input schema, ProQA concurrently models the knowledge generalization for all QA tasks while keeping the knowledge customization for every specific QA task. Furthermore, ProQA is pre-trained with a structural prompt-formatted large-scale synthesized corpus, which empowers the model with the commonly required QA ability. Experimental results on 11 QA benchmarks demonstrate that ProQA consistently boosts performance in full-data fine-tuning, few-shot learning, and zero-shot testing scenarios. In addition, ProQA exhibits strong ability in both continual learning and transfer learning by taking advantage of the structural prompt.

QuoteR: A Benchmark of Quote Recommendation for Writing
Fanchao Qi | Yanhui Yang | Jing Yi | Zhili Cheng | Zhiyuan Liu | Maosong Sun
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

It is very common to use quotations (quotes) to make our writing more elegant or convincing. To help people find appropriate quotes efficiently, the task of quote recommendation is presented, aiming to recommend quotes that fit the current context of writing. There have been various quote recommendation approaches, but they are evaluated on different unpublished datasets. To facilitate the research on this task, we build a large and fully open quote recommendation dataset called QuoteR, which comprises three parts: English, standard Chinese and classical Chinese. Each part is larger than its previous unpublished counterparts. We conduct an extensive evaluation of existing quote recommendation methods on QuoteR. Furthermore, we propose a new quote recommendation model that significantly outperforms previous methods on all three parts of QuoteR. All the code and data of this paper can be obtained at https://github.com/thunlp/QuoteR.

bert2BERT: Towards Reusable Pretrained Language Models
Cheng Chen | Yichun Yin | Lifeng Shang | Xin Jiang | Yujia Qin | Fengyu Wang | Zhi Wang | Xiao Chen | Zhiyuan Liu | Qun Liu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In recent years, researchers have tended to pre-train ever-larger language models to explore the upper limit of deep models. However, large language model pre-training costs intensive computational resources, and most of the models are trained from scratch without reusing the existing pre-trained models, which is wasteful. In this paper, we propose bert2BERT, which can effectively transfer the knowledge of an existing smaller pre-trained model to a large model through parameter initialization and significantly improve the pre-training efficiency of the large model. Specifically, we extend the previous function-preserving method proposed in computer vision to the Transformer-based language model, and further improve it by proposing a novel method, advanced knowledge for large model's initialization. In addition, a two-stage learning method is proposed to further accelerate the pre-training. We conduct extensive experiments on representative PLMs (e.g., BERT and GPT) and demonstrate that (1) our method can save a significant amount of training cost compared with baselines including learning from scratch, StackBERT and MSLT; (2) our method is generic and applicable to different types of pre-trained models. In particular, bert2BERT saves about 45% and 47% of the computational cost of pre-training BERT_BASE and GPT_BASE, respectively, by reusing models of almost half their size.
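The function-preserving idea that bert2BERT builds on can be shown on a two-layer MLP. The sketch below uses a Net2WiderNet-style expansion (named here as the classic computer-vision technique, not as bert2BERT's exact procedure, and without its "advanced knowledge" initialization): each new hidden unit copies an existing unit's incoming weights, and every copied unit's outgoing weights are divided by its number of copies, so the wider network computes exactly the same function. The helper names `forward` and `widen` are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def forward(w1: Matrix, w2: Matrix, x: List[float]) -> List[float]:
    """Two-layer MLP y = W2 . relu(W1 . x); w1 is hidden x d_in, w2 is d_out x hidden."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * hi for w, hi in zip(row, h)) for row in w2]


def widen(w1: Matrix, w2: Matrix, new_width: int, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Grow the hidden layer to new_width while preserving the function."""
    old_width = len(w1)
    if new_width < old_width:
        raise ValueError("new_width must be >= the current hidden width")
    rng = random.Random(seed)
    # mapping[j] is the original unit that new unit j copies.
    mapping = list(range(old_width)) + [rng.randrange(old_width)
                                        for _ in range(new_width - old_width)]
    counts = [mapping.count(i) for i in range(old_width)]
    # Copy incoming weights, so each copy has the same activation as its source.
    new_w1 = [list(w1[src]) for src in mapping]
    # Split outgoing weights evenly across copies, so their sum is unchanged.
    new_w2 = [[row[src] / counts[src] for src in mapping] for row in w2]
    return new_w1, new_w2


w1 = [[1.0, -1.0], [0.5, 2.0]]
w2 = [[1.0, 2.0], [3.0, -1.0]]
big_w1, big_w2 = widen(w1, w2, new_width=5)
print(forward(w1, w2, [2.0, 1.0]), forward(big_w1, big_w2, [2.0, 1.0]))
```

The two printed outputs agree up to floating-point rounding, which is what allows the larger model to start pre-training from the smaller model's knowledge rather than from scratch.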

Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification
Shengding Hu | Ning Ding | Huadong Wang | Zhiyuan Liu | Jingang Wang | Juanzi Li | Wei Wu | Maosong Sun
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Tuning pre-trained language models (PLMs) with task-specific prompts has been a promising approach for text classification. Particularly, previous studies suggest that prompt-tuning has remarkable superiority in the low-data scenario over generic fine-tuning methods with extra classifiers. The core idea of prompt-tuning is to insert text pieces, i.e., a template, into the input and transform a classification problem into a masked language modeling problem, where a crucial step is to construct a projection, i.e., a verbalizer, between a label space and a label word space. A verbalizer is usually handcrafted or searched by gradient descent, which may lack coverage and bring considerable bias and high variance to the results. In this work, we focus on incorporating external knowledge into the verbalizer, forming a knowledgeable prompt-tuning (KPT), to improve and stabilize prompt-tuning. Specifically, we expand the label word space of the verbalizer using external knowledge bases (KBs) and refine the expanded label word space with the PLM itself before predicting with the expanded label word space. Extensive experiments on zero-shot and few-shot text classification tasks demonstrate the effectiveness of knowledgeable prompt-tuning.
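A rough sketch of how an expanded verbalizer can turn [MASK] predictions into class scores. Everything here is an illustrative assumption rather than KPT's exact method: the label words are given as a plain dict, the refinement step is approximated by dropping words whose prior probability (the PLM's average [MASK] probability over unlabeled text, supplied as `prior`) falls below `min_prior`, and each remaining word's probability is divided by its prior before averaging per label. The function name `kpt_scores` and all example words and numbers are hypothetical.

```python
from typing import Dict, List


def kpt_scores(mask_probs: Dict[str, float],
               prior: Dict[str, float],
               verbalizer: Dict[str, List[str]],
               min_prior: float = 1e-4) -> Dict[str, float]:
    """Score each label from [MASK] probabilities of its expanded label words."""
    scores = {}
    for label, words in verbalizer.items():
        # Assumed refinement: drop label words the PLM almost never predicts.
        kept = [w for w in words if prior.get(w, 0.0) >= min_prior]
        if not kept:
            scores[label] = 0.0
            continue
        # Assumed calibration: divide by the word's prior, then average.
        scores[label] = sum(mask_probs.get(w, 0.0) / prior[w] for w in kept) / len(kept)
    return scores


verbalizer = {"sports": ["football", "tennis", "curling"],
              "science": ["physics", "biology"]}
mask_probs = {"football": 0.2, "tennis": 0.1, "curling": 0.0001,
              "physics": 0.05, "biology": 0.05}
prior = {"football": 0.1, "tennis": 0.05, "curling": 0.00001,
         "physics": 0.05, "biology": 0.05}
scores = kpt_scores(mask_probs, prior, verbalizer)
print(max(scores, key=scores.get), scores)
```

Averaging over many label words is what gives the expanded verbalizer its broader coverage; a single rare word ("curling" here) is filtered out instead of adding noise to the class score.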

Cross-Lingual Contrastive Learning for Fine-Grained Entity Typing for Low-Resource Languages
Xu Han | Yuqi Luo | Weize Chen | Zhiyuan Liu | Maosong Sun | Zhou Botong | Hao Fei | Suncong Zheng
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Fine-grained entity typing (FGET) aims to classify named entity mentions into fine-grained entity types, which is meaningful for entity-related NLP tasks. For FGET, a key challenge is the low-resource problem: the complex entity type hierarchy makes it difficult to manually label data. Especially for languages other than English, human-labeled data is extremely scarce. In this paper, we propose a cross-lingual contrastive learning framework to learn FGET models for low-resource languages. Specifically, we use multilingual pre-trained language models (PLMs) as the backbone to transfer the typing knowledge from high-resource languages (such as English) to low-resource languages (such as Chinese). Furthermore, we introduce entity-pair-oriented heuristic rules as well as machine translation to obtain cross-lingual distantly supervised data, and apply cross-lingual contrastive learning on the distantly supervised data to enhance the backbone PLMs. Experimental results show that by applying our framework, we can easily learn effective FGET models for low-resource languages, even without any language-specific human-labeled data. Our code is also available at https://github.com/thunlp/CrossET.

Pass off Fish Eyes for Pearls: Attacking Model Selection of Pre-trained Models
Biru Zhu | Yujia Qin | Fanchao Qi | Yangdong Deng | Zhiyuan Liu | Maosong Sun | Ming Gu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Selecting an appropriate pre-trained model (PTM) for a specific downstream task typically requires significant fine-tuning effort. To accelerate this process, researchers have proposed feature-based model selection (FMS) methods, which quickly assess PTMs' transferability to a specific task without fine-tuning. In this work, we argue that current FMS methods are vulnerable, as the assessment mainly relies on the static features extracted from PTMs. However, such features are derived without training PTMs on downstream tasks, and are not necessarily reliable indicators of the PTM's transferability. To validate our viewpoints, we design two methods to evaluate the robustness of FMS: (1) model disguise attack, which post-trains an inferior PTM with a contrastive objective, and (2) evaluation data selection, which selects a subset of the data points for FMS evaluation based on K-means clustering. Experimental results show that both methods can successfully make FMS mistakenly judge the transferability of PTMs. Moreover, we find that these two methods can further be combined with the backdoor attack to misguide FMS into selecting poisoned models. To the best of our knowledge, this is the first work to demonstrate the defects of current FMS algorithms and evaluate their potential security risks. By identifying previously unseen risks of FMS, our study indicates new directions for improving the robustness of FMS.

Fully Hyperbolic Neural Networks
Weize Chen | Xu Han | Yankai Lin | Hexu Zhao | Zhiyuan Liu | Peng Li | Maosong Sun | Jie Zhou
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Hyperbolic neural networks have shown great potential for modeling complex data. However, existing hyperbolic networks are not completely hyperbolic, as they encode features in the hyperbolic space yet formalize most of their operations in the tangent space (a Euclidean subspace) at the origin of the hyperbolic model. This hybrid method greatly limits the modeling ability of networks. In this paper, we propose a fully hyperbolic framework to build hyperbolic networks based on the Lorentz model by adapting the Lorentz transformations (including boost and rotation) to formalize essential operations of neural networks. Moreover, we also prove that linear transformation in tangent spaces used by existing hyperbolic networks is a relaxation of the Lorentz rotation and does not include the boost, implicitly limiting the capabilities of existing hyperbolic networks. The experimental results on four NLP tasks show that our method has better performance for building both shallow and deep networks. Our code will be released to facilitate follow-up research.

Prototypical Verbalizer for Prompt-based Few-shot Tuning
Ganqu Cui | Shengding Hu | Ning Ding | Longtao Huang | Zhiyuan Liu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Prompt-based tuning for pre-trained language models (PLMs) has shown its effectiveness in few-shot learning. Typically, prompt-based tuning wraps the input text into a cloze question. To make predictions, the model maps the output words to labels via a verbalizer, which is either manually designed or automatically built. However, manual verbalizers heavily depend on domain-specific prior knowledge and human efforts, while finding appropriate label words automatically still remains challenging. In this work, we propose the prototypical verbalizer (ProtoVerb), which is built directly from training data. Specifically, ProtoVerb learns prototype vectors as verbalizers by contrastive learning. In this way, the prototypes summarize training instances and are able to enclose rich class-level semantics. We conduct experiments on both topic classification and entity typing tasks, and the results demonstrate that ProtoVerb significantly outperforms current automatic verbalizers, especially when training data is extremely scarce. More surprisingly, ProtoVerb consistently boosts prompt-based tuning even on untuned PLMs, indicating an elegant non-tuning way to utilize PLMs. Our code is available at https://github.com/thunlp/OpenPrompt.
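The way a prototypical verbalizer replaces label words can be sketched as nearest-prototype classification on the [MASK] hidden vector. This is a simplified stand-in: the abstract says ProtoVerb learns prototypes by contrastive learning, whereas this sketch simply takes each class prototype as the mean of its training [MASK] embeddings (the prototypical-networks recipe) and predicts by cosine similarity. The toy 2-d vectors and the helper names `mean_prototypes` and `predict` are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_prototypes(examples: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """One prototype per class: the mean of its training [MASK] embeddings."""
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in examples.items()}


def predict(mask_vec: Vector, prototypes: Dict[str, Vector]) -> str:
    """Pick the class whose prototype is most cosine-similar to the [MASK] vector."""
    return max(prototypes, key=lambda label: cosine(mask_vec, prototypes[label]))


examples = {"sports": [[1.0, 0.1], [0.9, 0.0]],
            "science": [[0.0, 1.0], [0.1, 0.8]]}
protos = mean_prototypes(examples)
print(predict([0.8, 0.2], protos), predict([0.1, 0.9], protos))
```

Because prediction needs only the PLM's [MASK] representation and the prototypes, no label words have to be chosen by hand, which is the property the abstract emphasizes.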

Program Transfer for Answering Complex Questions over Knowledge Bases
Shulin Cao | Jiaxin Shi | Zijun Yao | Xin Lv | Jifan Yu | Lei Hou | Juanzi Li | Zhiyuan Liu | Jinghui Xiao
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Program induction for answering complex questions over knowledge bases (KBs) aims to decompose a question into a multi-step program, whose execution against the KB produces the final answer. Learning to induce programs relies on a large number of parallel question-program pairs for the given KB. However, for most KBs, gold program annotations are usually lacking, making learning difficult. In this paper, we propose the approach of program transfer, which aims to leverage the valuable program annotations on rich-resourced KBs as external supervision signals to aid program induction for low-resourced KBs that lack program annotations. For program transfer, we design a novel two-stage parsing framework with an efficient ontology-guided pruning strategy. First, a sketch parser translates the question into a high-level program sketch, which is a composition of functions. Second, given the question and sketch, an argument parser searches the KB for the detailed arguments of the functions. During the search, we incorporate the KB ontology to prune the search space. Experiments on ComplexWebQuestions and WebQuestionsSP show that our method significantly outperforms SOTA methods, demonstrating the effectiveness of program transfer and our framework. Our code and datasets can be obtained from https://github.com/THU-KEG/ProgramTransfer.

PPT: Pre-trained Prompt Tuning for Few-shot Learning
Yuxian Gu | Xu Han | Zhiyuan Liu | Minlie Huang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Prompts for pre-trained language models (PLMs) have shown remarkable performance by bridging the gap between pre-training tasks and various downstream tasks. Among these methods, prompt tuning, which freezes PLMs and only tunes soft prompts, provides an efficient and effective solution for adapting large-scale PLMs to downstream tasks. However, prompt tuning is yet to be fully explored. In our pilot experiments, we find that prompt tuning performs comparably with conventional full-model tuning when downstream data are sufficient, whereas it is much worse under few-shot learning settings, which may hinder the application of prompt tuning. We attribute this low performance to the manner of initializing soft prompts. Therefore, in this work, we propose to pre-train prompts by adding soft prompts into the pre-training stage to obtain a better initialization. We name this Pre-trained Prompt Tuning framework “PPT”. To ensure the generalization of PPT, we formulate similar classification tasks into a unified task form and pre-train soft prompts for this unified task. Extensive experiments show that tuning pre-trained prompts for downstream tasks can reach or even outperform full-model fine-tuning under both full-data and few-shot settings. Our approach is effective and efficient for using large-scale PLMs in practice.

pdf bib
A Simple but Effective Pluggable Entity Lookup Table for Pre-trained Language Models
Deming Ye | Yankai Lin | Peng Li | Maosong Sun | Zhiyuan Liu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Pre-trained language models (PLMs) struggle to recall the rich factual knowledge about entities exhibited in large-scale corpora, especially for rare entities. In this paper, we propose to build a simple but effective Pluggable Entity Lookup Table (PELT) on demand by aggregating the entity’s output representations from its multiple occurrences in the corpora. PELT can be plugged into PLMs as compatible inputs to infuse supplemental entity knowledge. Compared to previous knowledge-enhanced PLMs, PELT requires only 0.2%-5% of the pre-computation and can acquire knowledge from out-of-domain corpora in domain adaptation scenarios. Experiments on knowledge-related tasks demonstrate that PELT can flexibly and effectively transfer entity knowledge from related corpora into PLMs with different architectures. Our code and models are publicly available at https://github.com/thunlp/PELT.
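The aggregation step behind the lookup table can be sketched in a few lines. This is a minimal illustration under stated assumptions: the per-occurrence vectors stand in for the PLM outputs at each entity mention (obtaining them is out of scope), mean pooling is assumed as the aggregation (the paper may normalize differently), and `build_lookup_table` and the entity key are hypothetical, not part of the released PELT code.

```python
from typing import Dict, List

Vector = List[float]


def aggregate_entity_embedding(occurrence_outputs: List[Vector]) -> Vector:
    """Mean-pool the output vectors collected at each occurrence of one entity."""
    if not occurrence_outputs:
        raise ValueError("entity has no occurrences in the corpus")
    dim = len(occurrence_outputs[0])
    total = [0.0] * dim
    for vec in occurrence_outputs:
        if len(vec) != dim:
            raise ValueError("all occurrence vectors must share one dimension")
        for i, value in enumerate(vec):
            total[i] += value
    return [value / len(occurrence_outputs) for value in total]


def build_lookup_table(occurrences: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Hypothetical helper: build an entity -> embedding table on demand."""
    return {entity: aggregate_entity_embedding(vecs) for entity, vecs in occurrences.items()}


table = build_lookup_table({"entity_a": [[1.0, 2.0], [3.0, 4.0]]})
print(table["entity_a"])  # [2.0, 3.0]
```

In the paper's setting, a table entry is then supplied to the PLM as an extra input embedding; this sketch stops at building the table.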

pdf bib
OpenPrompt: An Open-source Framework for Prompt-learning
Ning Ding | Shengding Hu | Weilin Zhao | Yulin Chen | Zhiyuan Liu | Haitao Zheng | Maosong Sun
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Prompt-learning has become a new paradigm in modern natural language processing, which directly adapts pre-trained language models (PLMs) to cloze-style prediction, autoregressive modeling, or sequence-to-sequence generation, resulting in promising performance on various tasks. However, no standard implementation framework for prompt-learning has been proposed yet, and most existing prompt-learning codebases, often unregulated, only provide limited implementations for specific scenarios. Since many details, such as the templating, initialization, and verbalizing strategies, need to be considered in prompt-learning, practitioners face obstacles in quickly adapting the desired prompt-learning methods to their applications. In this paper, we present OpenPrompt, a unified, easy-to-use toolkit for conducting prompt-learning over PLMs. OpenPrompt is a research-friendly framework equipped with efficiency, modularity, and extensibility, and its combinability allows the freedom to combine different PLMs, task formats, and prompting modules in a unified paradigm. Users can readily deploy prompt-learning frameworks and evaluate their generalization on different NLP tasks without constraints.
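Two central pieces of prompt-learning, a template and a verbalizer, can be illustrated without any library. This sketch deliberately does not use OpenPrompt's API: the function names, the `{text}`/`{mask}` placeholder syntax, and the label-word probabilities (which would come from a PLM's prediction at the mask position) are all hypothetical.

```python
from typing import Dict, List


def wrap_with_template(template: str, text: str, mask_token: str = "[MASK]") -> str:
    """Turn an input into a cloze-style prompt by filling the template's slots."""
    return template.replace("{text}", text).replace("{mask}", mask_token)


def verbalize(mask_word_probs: Dict[str, float], verbalizer: Dict[str, List[str]]) -> str:
    """Map PLM predictions at the mask position to a class label.

    Each class score is the mean probability of its label words; label words
    the model did not score count as 0.0.
    """
    def class_score(label_words: List[str]) -> float:
        return sum(mask_word_probs.get(w, 0.0) for w in label_words) / len(label_words)

    return max(verbalizer, key=lambda label: class_score(verbalizer[label]))


prompt = wrap_with_template("{text} It was {mask}.", "The movie was great.")
print(prompt)  # The movie was great. It was [MASK].

# Hypothetical PLM probabilities for candidate words at [MASK].
probs = {"great": 0.5, "good": 0.2, "terrible": 0.05, "bad": 0.05}
verbalizer = {"positive": ["great", "good"], "negative": ["terrible", "bad"]}
print(verbalize(probs, verbalizer))  # positive
```

Keeping the template and verbalizer as separate pieces mirrors the modularity the abstract describes: either can be swapped without touching the other.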

pdf bib
BMInf: An Efficient Toolkit for Big Model Inference and Tuning
Xu Han | Guoyang Zeng | Weilin Zhao | Zhiyuan Liu | Zhengyan Zhang | Jie Zhou | Jun Zhang | Jia Chao | Maosong Sun
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

In recent years, large-scale pre-trained language models (PLMs) containing billions of parameters have achieved promising results on various NLP tasks. Although these big models can be pre-trained by stacking computing clusters regardless of cost, it is impractical to use such huge computing resources to apply big models to each downstream task. To address the computation bottleneck encountered in deploying big models in real-world scenarios, we introduce an open-source toolkit for big model inference and tuning (BMInf), which supports big model inference and tuning at extremely low computation cost. More specifically, at the algorithm level, we introduce model quantization and parameter-efficient tuning for efficient model inference and tuning. At the implementation level, we apply model offloading, model checkpointing, and CPU-GPU scheduling optimization to further reduce the computation and memory cost of big models. Based on the above efforts, we can efficiently perform big model inference and tuning with a single GPU (even a consumer-level GPU such as the GTX 1060) instead of computing clusters, which is difficult for existing distributed learning toolkits for PLMs. BMInf is publicly released at https://github.com/OpenBMB/BMInf.
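Model quantization, one of the algorithm-level techniques listed above, can be sketched as symmetric per-tensor int8 quantization. The abstract does not specify BMInf's exact scheme, so this is a generic illustration with hypothetical function names; real toolkits operate on GPU tensors rather than Python lists.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes and the shared scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.27]
codes, scale = quantize_int8(weights)
print(codes)  # [50, -127, 0, 127]
print(dequantize_int8(codes, scale))
```

Each weight is then stored in one byte instead of four (for float32), and the per-weight reconstruction error is at most half the scale.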

2021

pdf bib
CodRED: A Cross-Document Relation Extraction Dataset for Acquiring Knowledge in the Wild
Yuan Yao | Jiaju Du | Yankai Lin | Peng Li | Zhiyuan Liu | Jie Zhou | Maosong Sun
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Existing relation extraction (RE) methods typically focus on extracting relational facts between entity pairs within single sentences or documents. However, in practice, a large quantity of relational facts in knowledge bases can only be inferred across documents. In this work, we present the problem of cross-document RE, making an initial step towards knowledge acquisition in the wild. To facilitate the research, we construct the first human-annotated cross-document RE dataset, CodRED. Compared to existing RE datasets, CodRED presents two key challenges: given two entities, (1) it requires finding the relevant documents that can provide clues for identifying their relations; (2) it requires reasoning over multiple documents to extract the relational facts. We conduct comprehensive experiments to show that CodRED is challenging for existing RE methods, including strong BERT-based models.

pdf bib
Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer
Fanchao Qi | Yangyi Chen | Xurui Zhang | Mukai Li | Zhiyuan Liu | Maosong Sun
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Adversarial attacks and backdoor attacks are two common security threats that hang over deep learning. Both of them exploit task-irrelevant features of data in their implementation. Text style is a feature that is naturally irrelevant to most NLP tasks, and thus suitable for adversarial and backdoor attacks. In this paper, we make the first attempt to conduct adversarial and backdoor attacks based on text style transfer, which aims to alter the style of a sentence while preserving its meaning. We design an adversarial attack method and a backdoor attack method, and conduct extensive experiments to evaluate them. Experimental results show that popular NLP models are vulnerable to both adversarial and backdoor attacks based on text style transfer: the attack success rates can exceed 90% without much effort. This reveals a limited ability of NLP models to handle text style features, a weakness that has not been widely recognized. In addition, the style transfer-based adversarial and backdoor attack methods outperform baselines in many aspects. All the code and data of this paper can be obtained at https://github.com/thunlp/StyleAttack.

pdf bib
Is Multi-Hop Reasoning Really Explainable? Towards Benchmarking Reasoning Interpretability
Xin Lv | Yixin Cao | Lei Hou | Juanzi Li | Zhiyuan Liu | Yichi Zhang | Zelin Dai
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Multi-hop reasoning has been widely studied in recent years to obtain more interpretable link prediction. However, we find in experiments that many paths given by these models are actually unreasonable, while little work has been done on evaluating their interpretability. In this paper, we propose a unified framework to quantitatively evaluate the interpretability of multi-hop reasoning models so as to advance their development. Specifically, we define three metrics, namely path recall, local interpretability, and global interpretability, and design an approximate strategy to calculate these metrics using the interpretability scores of rules. We manually annotate all possible rules and establish a benchmark. In experiments, we verify the effectiveness of our benchmark. In addition, we run nine representative baselines on our benchmark, and the experimental results show that the interpretability of current multi-hop reasoning models is still unsatisfactory and is 51.7% lower than the upper bound given by our benchmark. Moreover, rule-based models outperform multi-hop reasoning models in terms of both performance and interpretability, which points to a direction for future research, i.e., how to better incorporate rule information into multi-hop reasoning models. We will publish our code and datasets upon acceptance.

pdf bib
ONION: A Simple and Effective Defense Against Textual Backdoor Attacks
Fanchao Qi | Yangyi Chen | Mukai Li | Yuan Yao | Zhiyuan Liu | Maosong Sun
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Backdoor attacks are an emerging training-time threat to deep neural networks (DNNs). They can manipulate the output of DNNs and are highly insidious. In the field of natural language processing, several attack methods have been proposed and achieve very high attack success rates on multiple popular models. Nevertheless, there are few studies on defending against textual backdoor attacks. In this paper, we propose a simple and effective textual backdoor defense named ONION, which is based on outlier word detection and, to the best of our knowledge, is the first method that can handle all textual backdoor attack situations. Experiments demonstrate the effectiveness of our model in defending BiLSTM and BERT against five different backdoor attacks. All the code and data of this paper can be obtained at https://github.com/thunlp/ONION.

pdf bib
Manual Evaluation Matters: Reviewing Test Protocols of Distantly Supervised Relation Extraction
Tianyu Gao | Xu Han | Yuzhuo Bai | Keyue Qiu | Zhiyu Xie | Yankai Lin | Zhiyuan Liu | Peng Li | Maosong Sun | Jie Zhou
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Better Robustness by More Coverage: Adversarial and Mixup Data Augmentation for Robust Finetuning
Chenglei Si | Zhengyan Zhang | Fanchao Qi | Zhiyuan Liu | Yasheng Wang | Qun Liu | Maosong Sun
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
KACC: A Multi-task Benchmark for Knowledge Abstraction, Concretization and Completion
Jie Zhou | Shengding Hu | Xin Lv | Cheng Yang | Zhiyuan Liu | Wei Xu | Jie Jiang | Juanzi Li | Maosong Sun
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Automatic Construction of Sememe Knowledge Bases via Dictionaries
Fanchao Qi | Yangyi Chen | Fengyu Wang | Zhiyuan Liu | Xiao Chen | Maosong Sun
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
TIAGE: A Benchmark for Topic-Shift Aware Dialog Modeling
Huiyuan Xie | Zhenghao Liu | Chenyan Xiong | Zhiyuan Liu | Ann Copestake
Findings of the Association for Computational Linguistics: EMNLP 2021

Human conversations naturally evolve around different topics and move fluently between them. In research on dialog systems, the ability to actively and smoothly transition to new topics is often ignored. In this paper, we introduce TIAGE, a new topic-shift aware dialog benchmark constructed from human annotations of topic shifts. Based on TIAGE, we introduce three tasks to investigate different scenarios of topic-shift modeling in dialog settings: topic-shift detection, topic-shift triggered response generation, and topic-aware dialog generation. Experiments on these tasks show that the topic-shift signals in TIAGE are useful for topic-shift response generation. On the other hand, dialog systems still struggle to decide when to change topic. This indicates that further research is needed in topic-shift aware dialog modeling.

pdf bib
KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation
Xiaozhi Wang | Tianyu Gao | Zhaocheng Zhu | Zhengyan Zhang | Zhiyuan Liu | Juanzi Li | Jian Tang
Transactions of the Association for Computational Linguistics, Volume 9

Pre-trained language representation models (PLMs) cannot capture factual knowledge from text well. In contrast, knowledge embedding (KE) methods can effectively represent the relational facts in knowledge graphs (KGs) with informative entity embeddings, but conventional KE models cannot take full advantage of the abundant textual information. In this paper, we propose a unified model for Knowledge Embedding and Pre-trained LanguagE Representation (KEPLER), which can not only better integrate factual knowledge into PLMs but also produce effective text-enhanced KE with the strong PLMs. In KEPLER, we encode textual entity descriptions with a PLM as their embeddings, and then jointly optimize the KE and language modeling objectives. Experimental results show that KEPLER achieves state-of-the-art performance on various NLP tasks, and also works remarkably well as an inductive KE model on KG link prediction. Furthermore, for pre-training and evaluating KEPLER, we construct Wikidata5M, a large-scale KG dataset with aligned entity descriptions, and benchmark state-of-the-art KE methods on it. It can serve as a new KE benchmark and facilitate research on large KGs, inductive KE, and KGs with text. The source code can be obtained from https://github.com/THU-KEG/KEPLER.
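The KE objective can be sketched with TransE-style scoring, where the entity vectors would be PLM encodings of the entities' textual descriptions. As assumptions for this sketch: the vectors are given directly as lists (no PLM is run), the L1 distance and the negative-sampling loss form below are illustrative choices, and the function names are hypothetical rather than taken from the KEPLER code.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def transe_distance(head: Vector, relation: Vector, tail: Vector) -> float:
    """L1 distance ||h + r - t||; smaller means the triple is more plausible."""
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def ke_loss(head: Vector, relation: Vector, tail: Vector,
            negatives: List[Tuple[Vector, Vector]], margin: float) -> float:
    """Push the true triple's distance below `margin` and corrupted ones above it."""
    loss = -log_sigmoid(margin - transe_distance(head, relation, tail))
    if negatives:
        loss -= sum(log_sigmoid(transe_distance(h, relation, t) - margin)
                    for h, t in negatives) / len(negatives)
    return loss


# Hypothetical description encodings for a true triple and one corrupted triple.
h, r, t = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
print(transe_distance(h, r, t))  # 0.0
print(round(ke_loss(h, r, t, [([0.0, 0.0], [3.0, 3.0])], margin=2.0), 4))  # 0.1755
```

Because the entity vectors come from encoding descriptions rather than from a per-entity lookup, an unseen entity with a description can still be scored, which is what makes the inductive setting possible.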

pdf bib
Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger
Fanchao Qi | Mukai Li | Yangyi Chen | Zhengyan Zhang | Zhiyuan Liu | Yasheng Wang | Maosong Sun
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Backdoor attacks are an insidious security threat against machine learning models. After a backdoor is injected during training, the victim model produces adversary-specified outputs on inputs embedded with predesigned triggers but behaves properly on normal inputs during inference. As an emerging kind of attack, backdoor attacks in natural language processing (NLP) are insufficiently investigated. As far as we know, almost all existing textual backdoor attack methods insert additional content into normal samples as triggers, which allows the trigger-embedded samples to be detected and the backdoor attacks to be blocked without much effort. In this paper, we propose to use the syntactic structure as the trigger in textual backdoor attacks. We conduct extensive experiments to demonstrate that the syntactic trigger-based attack method can achieve attack performance comparable to the insertion-based methods (almost 100% success rate) but possesses much higher invisibility and stronger resistance to defenses. These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks. All the code and data of this paper can be obtained at https://github.com/thunlp/HiddenKiller.

pdf bib
Few-NERD: A Few-shot Named Entity Recognition Dataset
Ning Ding | Guangwei Xu | Yulin Chen | Xiaobin Wang | Xu Han | Pengjun Xie | Haitao Zheng | Zhiyuan Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Recently, considerable literature has grown up around the theme of few-shot named entity recognition (NER), but little benchmark data specifically focused on this practical and challenging task has been published. Current approaches collect existing supervised NER datasets and reorganize them into the few-shot setting for empirical study. These strategies conventionally aim to recognize coarse-grained entity types with few examples, while in practice, most unseen entity types are fine-grained. In this paper, we present Few-NERD, a large-scale human-annotated few-shot NER dataset with a hierarchy of 8 coarse-grained and 66 fine-grained entity types. Few-NERD consists of 188,238 sentences from Wikipedia comprising 4,601,160 words, each annotated as context or as part of a two-level entity type. To the best of our knowledge, this is the first few-shot NER dataset and the largest human-crafted NER dataset. We construct benchmark tasks with different emphases to comprehensively assess the generalization capability of models. Extensive empirical results and analysis show that Few-NERD is challenging and the problem requires further research. The Few-NERD dataset and the baselines will be publicly available to facilitate research on this problem.

pdf bib
ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning
Yujia Qin | Yankai Lin | Ryuichi Takanobu | Zhiyuan Liu | Peng Li | Heng Ji | Minlie Huang | Maosong Sun | Jie Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Pre-trained Language Models (PLMs) have shown superior performance on various downstream Natural Language Processing (NLP) tasks. However, conventional pre-training objectives do not explicitly model relational facts in text, which are crucial for textual understanding. To address this issue, we propose a novel contrastive learning framework, ERICA, to obtain a deep understanding of the entities and their relations in text. Specifically, we define two novel pre-training tasks to better understand entities and relations: (1) the entity discrimination task, which distinguishes which tail entity can be inferred from the given head entity and relation; (2) the relation discrimination task, which distinguishes whether two relations are semantically close, involving complex relational reasoning. Experimental results demonstrate that ERICA can improve typical PLMs (BERT and RoBERTa) on several language understanding tasks, including relation extraction, entity typing, and question answering, especially under low-resource settings.
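Both discrimination tasks are contrastive: a representation should score higher with its positive counterpart than with negatives. A generic InfoNCE-style loss over cosine similarities illustrates the idea. The exact loss, temperature, and similarity function used by ERICA are not given in the abstract, so everything below, including the names `info_nce` and `cosine`, is an illustrative assumption.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector], temperature: float = 0.1) -> float:
    """Cross-entropy of picking the positive among {positive} + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)  # log-sum-exp trick for numerical stability
    log_partition = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_partition - logits[0]


# Hypothetical representations: the anchor should match the positive, not the negative.
well_aligned = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1]])
print(well_aligned < misaligned)  # True
```

Minimizing this loss pulls matching representations together and pushes non-matching ones apart, which is the shared mechanism behind both tasks in the abstract.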

pdf bib
Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution
Fanchao Qi | Yuan Yao | Sophia Xu | Zhiyuan Liu | Maosong Sun
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks. Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated, presenting serious security threats to real-world applications. Since existing textual backdoor attacks pay little attention to the invisibility of backdoors, they can be easily detected and blocked. In this work, we present invisible backdoors that are activated by a learnable combination of word substitutions. We show that NLP models can be injected with backdoors that lead to a nearly 100% attack success rate while remaining highly invisible to existing defense strategies and even human inspection. The results raise a serious alarm about the security of NLP models, which requires further research to resolve. All the data and code of this paper are released at https://github.com/thunlp/BkdAtk-LWS.

pdf bib
Few-Shot Text Ranking with Meta Adapted Synthetic Weak Supervision
Si Sun | Yingzhuo Qian | Zhenghao Liu | Chenyan Xiong | Kaitao Zhang | Jie Bao | Zhiyuan Liu | Paul Bennett
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The effectiveness of Neural Information Retrieval (Neu-IR) often depends on large-scale in-domain relevance training signals, which are not always available in real-world ranking scenarios. To democratize the benefits of Neu-IR, this paper presents MetaAdaptRank, a domain adaptive learning method that generalizes Neu-IR models from label-rich source domains to few-shot target domains. Drawing on massive source-domain relevance supervision, MetaAdaptRank contrastively synthesizes a large number of weak supervision signals for target domains and meta-learns to reweight these synthetic “weak” data based on their benefits to the target-domain ranking accuracy of Neu-IR models. Experiments on three TREC benchmarks in the web, news, and biomedical domains show that MetaAdaptRank significantly improves the few-shot ranking accuracy of Neu-IR models. Further analyses indicate that MetaAdaptRank benefits from both its contrastive weak data synthesis and its meta-reweighted data selection. The code and data of this paper can be obtained from https://github.com/thunlp/MetaAdaptRank.

pdf bib
CLEVE: Contrastive Pre-training for Event Extraction
Ziqi Wang | Xiaozhi Wang | Xu Han | Yankai Lin | Lei Hou | Zhiyuan Liu | Peng Li | Juanzi Li | Jie Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Event extraction (EE) has considerably benefited from fine-tuning pre-trained language models (PLMs). However, existing pre-training methods do not model event characteristics, so the resulting EE models cannot take full advantage of large-scale unsupervised data. To this end, we propose CLEVE, a contrastive pre-training framework for EE that better learns event knowledge from large unsupervised data and their semantic structures (e.g., AMR) obtained with automatic parsers. CLEVE contains a text encoder to learn event semantics and a graph encoder to learn event structures. Specifically, the text encoder learns event semantic representations by self-supervised contrastive learning, representing words of the same event closer to each other than to unrelated words; the graph encoder learns event structure representations by graph contrastive pre-training on parsed event-related semantic structures. The two complementary representations then work together to improve both conventional supervised EE and unsupervised “liberal” EE, which requires jointly extracting events and discovering event schemata without any annotated data. Experiments on the ACE 2005 and MAVEN datasets show that CLEVE achieves significant improvements, especially in the challenging unsupervised setting. The source code and pre-trained checkpoints can be obtained from https://github.com/THU-KEG/CLEVE.

pdf bib
OpenAttack: An Open-source Textual Adversarial Attack Toolkit
Guoyang Zeng | Fanchao Qi | Qianrui Zhou | Tingji Zhang | Zixian Ma | Bairu Hou | Yuan Zang | Zhiyuan Liu | Maosong Sun
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

Textual adversarial attacking has received wide and increasing attention in recent years. Various attack models have been proposed, which differ greatly from one another and are implemented with different programming frameworks and settings. These facts hinder quick utilization and fair comparison of attack models. In this paper, we present an open-source textual adversarial attack toolkit named OpenAttack to solve these issues. Compared with other existing textual adversarial attack toolkits, OpenAttack has unique strengths in its support for all attack types, multilinguality, and parallel processing. Currently, OpenAttack includes 15 typical attack models that cover all attack types. Its highly inclusive modular design not only supports quick utilization of existing attack models, but also enables great flexibility and extensibility. OpenAttack has broad uses, including comparing and evaluating attack models, measuring the robustness of a model, assisting in developing new attack models, and adversarial training. Source code and documentation can be obtained at https://github.com/thunlp/OpenAttack.

pdf bib
Open Hierarchical Relation Extraction
Kai Zhang | Yuan Yao | Ruobing Xie | Xu Han | Zhiyuan Liu | Fen Lin | Leyu Lin | Maosong Sun
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Open relation extraction (OpenRE) aims to extract novel relation types from open-domain corpora, which plays an important role in completing the relation schemas of knowledge bases (KBs). Most OpenRE methods treat different relation types in isolation without considering their hierarchical dependency. We argue that OpenRE is inherently closely connected with relation hierarchies. To establish bidirectional connections between OpenRE and relation hierarchies, we propose the task of open hierarchical relation extraction and present a novel OHRE framework for the task. We propose a dynamic hierarchical triplet objective and a hierarchical curriculum training paradigm to effectively integrate hierarchy information into relation representations for better novel relation extraction. We also present a top-down hierarchy expansion algorithm to add the extracted relations into existing hierarchies with reasonable interpretability. Comprehensive experiments show that OHRE outperforms state-of-the-art models by a large margin on both relation clustering and hierarchy expansion.

2020

pdf bib
Expertise Style Transfer: A New Task Towards Better Communication between Experts and Laymen
Yixin Cao | Ruihao Shui | Liangming Pan | Min-Yen Kan | Zhiyuan Liu | Tat-Seng Chua
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The curse of knowledge can impede communication between experts and laymen. We propose a new task of expertise style transfer and contribute a manually annotated dataset with the goal of alleviating such cognitive biases. Solving this task not only simplifies the professional language, but also improves the accuracy and expertise level of laymen descriptions using simple words. This is a challenging task, unaddressed in previous work, as it requires the models to have expert intelligence in order to modify text with a deep understanding of domain knowledge and structures. We establish the benchmark performance of five state-of-the-art models for style transfer and text simplification. The results demonstrate a significant gap between machine and human performance. We also discuss the challenges of automatic evaluation, to provide insights into future research directions. The dataset is publicly available at https://srhthu.github.io/expertise-style-transfer/.

pdf bib
Grounded Conversation Generation as Guided Traverses in Commonsense Knowledge Graphs
Houyu Zhang | Zhenghao Liu | Chenyan Xiong | Zhiyuan Liu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Human conversations naturally evolve around related concepts and hop to distant concepts. This paper presents a new conversation generation model, ConceptFlow, which leverages commonsense knowledge graphs to explicitly model conversation flows. By grounding conversations to the concept space, ConceptFlow represents the potential conversation flow as traverses in the concept space along commonsense relations. The traverse is guided by graph attentions in the concept graph, moving towards more meaningful directions in the concept space, in order to generate more semantic and informative responses. Experiments on Reddit conversations demonstrate ConceptFlow’s effectiveness over previous knowledge-aware conversation models and GPT-2 based models while using 70% fewer parameters, confirming the advantage of explicitly modeling conversation structures. All source code of this work is available at https://github.com/thunlp/ConceptFlow.

pdf bib
MOOCCube: A Large-scale Data Repository for NLP Applications in MOOCs
Jifan Yu | Gan Luo | Tong Xiao | Qingyang Zhong | Yuquan Wang | Wenzheng Feng | Junyi Luo | Chenyu Wang | Lei Hou | Juanzi Li | Zhiyuan Liu | Jie Tang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The prosperity of Massive Open Online Courses (MOOCs) provides fodder for much NLP and AI research on education applications, e.g., course concept extraction and prerequisite relation discovery. However, publicly available MOOC datasets are limited in size and contain few types of data, which hinders advanced models and novel attempts in related topics. Therefore, we present MOOCCube, a large-scale data repository of over 700 MOOC courses, 100k concepts, and 8 million student behaviors, together with an external resource. Moreover, we conduct a prerequisite discovery task as an example application to show the potential of MOOCCube in facilitating relevant research. The data repository is now available at http://moocdata.cn/data/MOOCCube.

pdf bib
How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence
Haoxi Zhong | Chaojun Xiao | Cunchao Tu | Tianyang Zhang | Zhiyuan Liu | Maosong Sun
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Legal Artificial Intelligence (LegalAI) focuses on applying artificial intelligence technology, especially natural language processing, to benefit tasks in the legal domain. In recent years, LegalAI has rapidly drawn increasing attention from both AI researchers and legal professionals, as it can liberate legal professionals from a maze of paperwork. Legal professionals tend to approach tasks with rule-based and symbol-based methods, while NLP researchers concentrate more on data-driven and embedding methods. In this paper, we introduce the history, the current state, and the future directions of research in LegalAI. We illustrate the tasks from the perspectives of legal professionals and NLP researchers and show several representative applications in LegalAI. We conduct experiments and provide an in-depth analysis of the advantages and disadvantages of existing works to explore possible future directions. You can find the implementation of our work at https://github.com/thunlp/CLAIM.

pdf bib
Word-level Textual Adversarial Attacking as Combinatorial Optimization
Yuan Zang | Fanchao Qi | Chenghao Yang | Zhiyuan Liu | Meng Zhang | Qun Liu | Maosong Sun
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Adversarial attacks are carried out to reveal the vulnerability of deep neural networks. Textual adversarial attacking is challenging because text is discrete and a small perturbation can bring significant change to the original input. Word-level attacking, which can be regarded as a combinatorial optimization problem, is a well-studied class of textual attack methods. However, existing word-level attack models are far from perfect, largely because they employ unsuitable search space reduction methods and inefficient optimization algorithms. In this paper, we propose a novel attack model that incorporates a sememe-based word substitution method and a particle swarm optimization-based search algorithm to solve these two problems separately. We conduct exhaustive experiments to evaluate our attack model by attacking BiLSTM and BERT on three benchmark datasets. Experimental results demonstrate that our model consistently achieves much higher attack success rates and crafts higher-quality adversarial examples than baseline methods. Further experiments also show that our model has higher transferability and can bring more robustness enhancement to victim models through adversarial training. All the code and data of this paper can be obtained at https://github.com/thunlp/SememePSO-Attack.

pdf bib
Continual Relation Learning via Episodic Memory Activation and Reconsolidation
Xu Han | Yi Dai | Tianyu Gao | Yankai Lin | Zhiyuan Liu | Peng Li | Maosong Sun | Jie Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Continual relation learning aims to continually train a model on new data to learn incessantly emerging novel relations while avoiding catastrophic forgetting of old relations. Some pioneering work has shown that storing a handful of historical relation examples in episodic memory and replaying them in subsequent training is an effective solution to this challenging problem. However, these memory-based methods usually overfit the few memorized examples of old relations, which may gradually cause confusion among existing relations. Inspired by the mechanism of human long-term memory formation, we introduce episodic memory activation and reconsolidation (EMAR) to continual relation learning. Every time neural models are activated to learn both new and memorized data, EMAR uses relation prototypes for memory reconsolidation exercises to keep a stable understanding of old relations. The experimental results show that EMAR avoids catastrophically forgetting old relations and outperforms state-of-the-art continual learning models.
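Relation prototypes can be sketched as the mean embedding of the memorized examples of each relation, with a query assigned to the most similar prototype. This is a minimal sketch under assumptions: the embeddings are given, dot-product similarity is a choice made here, and the relation names and helper functions are hypothetical rather than EMAR's code.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def compute_prototypes(memory: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Prototype of a relation = mean embedding of its memorized examples."""
    prototypes = {}
    for relation, examples in memory.items():
        if not examples:
            raise ValueError(f"no memorized examples for relation {relation!r}")
        dim = len(examples[0])
        prototypes[relation] = [sum(ex[i] for ex in examples) / len(examples) for i in range(dim)]
    return prototypes


def nearest_prototype(query: Vector, prototypes: Dict[str, List[float]]) -> str:
    """Assign a query embedding to the relation whose prototype has the largest dot product."""
    return max(prototypes, key=lambda rel: sum(q * p for q, p in zip(query, prototypes[rel])))


memory = {
    "founded_by": [[1.0, 0.0], [0.8, 0.2]],
    "born_in": [[0.0, 1.0], [0.2, 0.8]],
}
protos = compute_prototypes(memory)
print(nearest_prototype([0.7, 0.1], protos))  # founded_by
```

Because a prototype summarizes all memorized examples of a relation, matching against it is less sensitive to any single memorized example than replaying examples alone, which is the overfitting concern the abstract raises.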

pdf bib
Fine-grained Fact Verification with Kernel Graph Attention Network
Zhenghao Liu | Chenyan Xiong | Maosong Sun | Zhiyuan Liu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Fact verification requires fine-grained natural language inference capability that finds subtle clues to identify syntactically and semantically correct but not well-supported claims. This paper presents Kernel Graph Attention Network (KGAT), which conducts more fine-grained fact verification with kernel-based attentions. Given a claim and a set of potential evidence sentences that form an evidence graph, KGAT introduces node kernels, which better measure the importance of the evidence node, and edge kernels, which conduct fine-grained evidence propagation in the graph, into Graph Attention Networks for more accurate fact verification. KGAT achieves a 70.38% FEVER score and significantly outperforms existing fact verification models on FEVER, a large-scale benchmark for fact verification. Our analyses illustrate that, compared to dot-product attentions, the kernel-based attention concentrates more on relevant evidence sentences and meaningful clues in the evidence graph, which is the main source of KGAT’s effectiveness. All source code of this work is available at https://github.com/thunlp/KernelGAT.
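
The kernels in KGAT build on Gaussian kernel pooling over a token-level similarity matrix. The sketch below assumes the standard K-NRM-style formulation: for each kernel with mean `mu`, sum over claim tokens the log of a "soft match count" over evidence tokens. The similarity matrix is taken as given, and the learned layers that turn these features into attention weights are omitted; this is an illustration, not KGAT's code.

```python
import math
from typing import List, Sequence


def kernel_pooling(sim: Sequence[Sequence[float]], mus: Sequence[float], sigma: float) -> List[float]:
    """Gaussian kernel pooling over a (claim_len x evidence_len) similarity matrix.

    Feature k = sum over claim tokens i of
        log( sum over evidence tokens j of exp(-(sim[i][j] - mu_k)^2 / (2 * sigma^2)) ).
    """
    features = []
    for mu in mus:
        total = 0.0
        for row in sim:
            soft_tf = sum(math.exp(-((s - mu) ** 2) / (2 * sigma ** 2)) for s in row)
            total += math.log(max(soft_tf, 1e-10))  # clamp to avoid log(0)
        features.append(total)
    return features
```

A kernel centred near a similarity of 1.0 fires strongly when a claim token has an exact match in the evidence, while kernels centred elsewhere capture softer matches; the downstream model learns how to weight each kernel.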

pdf bib
Enhancing Transformer with Sememe Knowledge
Yuhui Zhang | Chenghao Yang | Zhengping Zhou | Zhiyuan Liu
Proceedings of the 5th Workshop on Representation Learning for NLP

While large-scale pretraining has achieved great success in many NLP tasks, it has not been fully studied whether external linguistic knowledge can improve data-driven models. In this work, we introduce sememe knowledge into Transformer and propose three sememe-enhanced Transformer models. Sememes, by linguistic definition, are the minimum semantic units of language, which can well represent implicit semantic meanings behind words. Our experiments demonstrate that introducing sememe knowledge into Transformer can consistently improve language modeling and downstream tasks. The adversarial test further demonstrates that sememe knowledge can substantially improve model robustness.

pdf bib
Meta-Information Guided Meta-Learning for Few-Shot Relation Classification
Bowen Dong | Yuan Yao | Ruobing Xie | Tianyu Gao | Xu Han | Zhiyuan Liu | Fen Lin | Leyu Lin | Maosong Sun
Proceedings of the 28th International Conference on Computational Linguistics

Few-shot classification requires classifiers to adapt to new classes with only a few training instances. State-of-the-art meta-learning approaches such as MAML learn how to initialize and quickly adapt parameters from limited instances, and have shown promising results in few-shot classification. However, existing meta-learning models rely solely on implicit instance-based statistics, and thus suffer from instance unreliability and weak interpretability. To solve this problem, we propose a novel meta-information guided meta-learning (MIML) framework, where semantic concepts of classes provide strong guidance for meta-learning in both initialization and adaptation. In effect, our model can establish connections between instance-based information and semantic-based information, which enables more effective initialization and faster adaptation. Comprehensive experimental results on few-shot relation classification demonstrate the effectiveness of the proposed framework. Notably, MIML achieves comparable or superior performance to humans with only one shot on FewRel evaluation.

pdf bib
Try to Substitute: An Unsupervised Chinese Word Sense Disambiguation Method Based on HowNet
Bairu Hou | Fanchao Qi | Yuan Zang | Xurui Zhang | Zhiyuan Liu | Maosong Sun
Proceedings of the 28th International Conference on Computational Linguistics

Word sense disambiguation (WSD) is a fundamental natural language processing task. Unsupervised knowledge-based WSD relies only on a lexical knowledge base as the sense inventory and has wider practical use than supervised WSD, which requires a large amount of sense-annotated data. HowNet is the most widely used lexical knowledge base in Chinese WSD. Because of its uniqueness, however, most existing unsupervised WSD methods cannot work for HowNet-based WSD, and the tailor-made methods have not obtained satisfying results. In this paper, we propose a new unsupervised method for HowNet-based Chinese WSD, which exploits the masked language model task of pre-trained language models. In experiments, considering that the existing evaluation dataset is small and out-of-date, we build a new and larger HowNet-based WSD dataset. Experimental results demonstrate that our model achieves significantly better performance than all the baseline methods. All the code and data of this paper are available at https://github.com/thunlp/SememeWSD.

pdf bib
Adapting Open Domain Fact Extraction and Verification to COVID-FACT through In-Domain Language Modeling
Zhenghao Liu | Chenyan Xiong | Zhuyun Dai | Si Sun | Maosong Sun | Zhiyuan Liu
Findings of the Association for Computational Linguistics: EMNLP 2020

With the COVID-19 epidemic, verifying scientifically false online information, such as fake news and maliciously fabricated statements, has become crucial. However, the lack of training data in the scientific domain limits the performance of fact verification models. This paper proposes an in-domain language modeling method for fact extraction and verification systems. We come up with SciKGAT to combine the advantages of open-domain literature search, state-of-the-art fact verification systems and in-domain medical knowledge through language modeling. Our experiments on SCIFACT, a dataset of expert-written scientific fact verification, show that SciKGAT achieves a 30% absolute improvement in precision. Our analyses show that this improvement stems from our in-domain language model picking up more related evidence pieces and producing more accurate fact verification. Our code and data are released on GitHub.

pdf bib
MAVEN: A Massive General Domain Event Detection Dataset
Xiaozhi Wang | Ziqi Wang | Xu Han | Wangyi Jiang | Rong Han | Zhiyuan Liu | Juanzi Li | Peng Li | Yankai Lin | Jie Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Event detection (ED), which means identifying event trigger words and classifying event types, is the first and most fundamental step for extracting event knowledge from plain text. Most existing datasets exhibit the following issues that limit further development of ED: (1) Data scarcity. Existing small-scale datasets are not sufficient for training and stably benchmarking increasingly sophisticated modern neural methods. (2) Low coverage. The limited event types of existing datasets cannot adequately cover general-domain events, which restricts the applications of ED models. To alleviate these problems, we present a MAssive eVENt detection dataset (MAVEN), which contains 4,480 Wikipedia documents, 118,732 event mention instances, and 168 event types. MAVEN alleviates the data scarcity problem and covers much more general event types. We reproduce the recent state-of-the-art ED models and conduct a thorough evaluation on MAVEN. The experimental results show that existing ED methods cannot achieve results on MAVEN as promising as those on the small datasets, which suggests that ED in the real world remains a challenging task and requires further research efforts. We also discuss further directions for general domain ED with empirical analyses. The source code and dataset can be obtained from https://github.com/THU-KEG/MAVEN-dataset.

pdf bib
Learning from Context or Names? An Empirical Study on Neural Relation Extraction
Hao Peng | Tianyu Gao | Xu Han | Yankai Lin | Peng Li | Zhiyuan Liu | Maosong Sun | Jie Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Neural models have achieved remarkable success on relation extraction (RE) benchmarks. However, there is no clear understanding of what information in text drives the decisions of existing RE models and how to further improve the performance of these models. To this end, we empirically study the effect of two main information sources in text: textual context and entity mentions (names). We find that (i) while context is the main source to support the predictions, RE models also heavily rely on the information from entity mentions, most of which is type information, and (ii) existing datasets may leak shallow heuristics via entity mentions and thus contribute to the high performance on RE benchmarks. Based on the analyses, we propose an entity-masked contrastive pre-training framework for RE to gain a deeper understanding of both textual context and type information while avoiding rote memorization of entities or use of superficial cues in mentions. We carry out extensive experiments to support our views, and show that our framework can improve the effectiveness and robustness of neural models in different RE scenarios. All the code and datasets are released at https://github.com/thunlp/RE-Context-or-Names.

pdf bib
Denoising Relation Extraction from Document-level Distant Supervision
Chaojun Xiao | Yuan Yao | Ruobing Xie | Xu Han | Zhiyuan Liu | Maosong Sun | Fen Lin | Leyu Lin
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Distant supervision (DS) has been widely adopted to generate auto-labeled data for sentence-level relation extraction (RE) and has achieved great success. However, the existing success of DS cannot be directly transferred to the more challenging document-level relation extraction (DocRE), as the inevitable noise caused by DS may be even multiplied in documents and significantly harm the performance of RE. To alleviate this issue, we propose a novel pre-trained model for DocRE, which de-emphasizes noisy DS data via multiple pre-training tasks. The experimental results on the large-scale DocRE benchmark show that our model can capture useful information from noisy data and achieve promising results.

pdf bib
Dynamic Anticipation and Completion for Multi-Hop Reasoning over Sparse Knowledge Graph
Xin Lv | Xu Han | Lei Hou | Juanzi Li | Zhiyuan Liu | Wei Zhang | Yichi Zhang | Hao Kong | Suhui Wu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Multi-hop reasoning has been widely studied in recent years to seek an effective and interpretable method for knowledge graph (KG) completion. Most previous reasoning methods are designed for dense KGs with enough paths between entities, but cannot work well on sparse KGs that contain only sparse paths for reasoning. On the one hand, sparse KGs contain less information, which makes it difficult for the model to choose correct paths. On the other hand, the lack of evidential paths to target entities also makes the reasoning process difficult. To solve these problems, we propose a multi-hop reasoning model over sparse KGs, by applying novel dynamic anticipation and completion strategies: (1) The anticipation strategy utilizes the latent prediction of embedding-based models to help our model search more potential paths over sparse KGs. (2) Based on the anticipation information, the completion strategy dynamically adds edges as additional actions during the path search, which further alleviates the sparseness problem of KGs. The experimental results on five datasets sampled from Freebase, NELL and Wikidata show that our method outperforms state-of-the-art baselines. Our code and datasets can be obtained from https://github.com/THU-KEG/DacKGR.

pdf bib
Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment
Zhiyuan Liu | Yixin Cao | Liangming Pan | Juanzi Li | Zhiyuan Liu | Tat-Seng Chua
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Entity alignment (EA) aims at building a unified Knowledge Graph (KG) of rich content by linking the equivalent entities from various KGs. GNN-based EA methods present promising performance by modeling the KG structure defined by relation triples. However, attribute triples can also provide crucial alignment signal but have not been well explored yet. In this paper, we propose to utilize an attributed value encoder and partition the KG into subgraphs to model the various types of attribute triples efficiently. Besides, the performances of current EA methods are overestimated because of the name-bias of existing EA datasets. To make an objective evaluation, we propose a hard experimental setting where we select equivalent entity pairs with very different names as the test set. Under both the regular and hard settings, our method achieves significant improvements (5.10% on average Hits@1 in DBP15k) over 12 baselines in cross-lingual and monolingual datasets. Ablation studies on different subgraphs and a case study about attribute types further demonstrate the effectiveness of our method. Source code and data can be found at https://github.com/thunlp/explore-and-evaluate.
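
The reported results use Hits@1. As a reference for how such a metric is typically computed, here is a small sketch assuming a square similarity matrix in which the gold counterpart of source entity i is target entity i (a layout assumption for illustration, not taken from the paper's code). Hits@k is the fraction of test entities whose gold counterpart ranks within the top k.

```python
from typing import Sequence


def hits_at_k(sim: Sequence[Sequence[float]], k: int = 1) -> float:
    """sim[i][j]: similarity of source entity i to target entity j; gold target of i is j == i."""
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the gold target = 1 + number of candidates scoring strictly higher
        # (ties are therefore counted in favour of the gold target).
        rank = 1 + sum(1 for s in row if s > row[i])
        if rank <= k:
            hits += 1
    return hits / len(sim)
```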

pdf bib
Train No Evil: Selective Masking for Task-Guided Pre-Training
Yuxian Gu | Zhengyan Zhang | Xiaozhi Wang | Zhiyuan Liu | Maosong Sun
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Recently, pre-trained language models mostly follow the pre-train-then-fine-tune paradigm and have achieved great performance on various downstream tasks. However, since the pre-training stage is typically task-agnostic and the fine-tuning stage usually suffers from insufficient supervised data, the models cannot always capture domain-specific and task-specific patterns well. In this paper, we propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning. In this stage, the model is trained by masked language modeling on in-domain unsupervised data to learn domain-specific patterns, and we propose a novel selective masking strategy to learn task-specific patterns. Specifically, we design a method to measure the importance of each token in sequences and selectively mask the important tokens. Experimental results on two sentiment analysis tasks show that our method can achieve comparable or even better performance with less than 50% of the computation cost, which indicates that our method is both effective and efficient. The source code of this paper can be obtained from https://github.com/thunlp/SelectiveMasking.
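
The masking step itself (as opposed to how token importance is scored) can be sketched as follows. The per-token `importance` values are treated as given inputs; the paper's actual importance measure is not reproduced here, and `mask_ratio` and `mask_token` are hypothetical parameters chosen for illustration. The function masks the highest-importance tokens instead of a random subset.

```python
from typing import List, Sequence


def selective_mask(
    tokens: Sequence[str],
    importance: Sequence[float],  # assumed given: one importance score per token
    mask_ratio: float = 0.15,  # hypothetical fraction of tokens to mask
    mask_token: str = "[MASK]",
) -> List[str]:
    """Mask the most important tokens rather than a random subset."""
    n_mask = max(1, round(len(tokens) * mask_ratio)) if tokens else 0
    # Indices of the n_mask highest-importance tokens (sorted() is stable, so ties keep order).
    chosen = set(sorted(range(len(tokens)), key=lambda i: -importance[i])[:n_mask])
    return [mask_token if i in chosen else t for i, t in enumerate(tokens)]
```

For a sentiment example such as `["the", "food", "was", "delicious"]` with importance `[0.1, 0.4, 0.05, 0.9]` and `mask_ratio=0.5`, the sentiment-bearing tokens "food" and "delicious" are the ones masked.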

pdf bib
Coreferential Reasoning Learning for Language Representation
Deming Ye | Yankai Lin | Jiaju Du | Zhenghao Liu | Peng Li | Maosong Sun | Zhiyuan Liu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Language representation models such as BERT can effectively capture contextual semantic information from plain text, and have been shown to achieve promising results on many downstream NLP tasks with appropriate fine-tuning. However, most existing language representation models cannot explicitly handle coreference, which is essential to the coherent understanding of the whole discourse. To address this issue, we present CorefBERT, a novel language representation model that can capture the coreferential relations in context. The experimental results show that, compared with existing baseline models, CorefBERT consistently achieves significant improvements on various downstream NLP tasks that require coreferential reasoning, while maintaining comparable performance to previous models on other common NLP tasks. The source code and experiment details of this paper can be obtained from https://github.com/thunlp/CorefBERT.

pdf bib
Partially-Aligned Data-to-Text Generation with Distant Supervision
Zihao Fu | Bei Shi | Wai Lam | Lidong Bing | Zhiyuan Liu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The Data-to-Text task aims to generate human-readable text describing given structured data, enabling more interpretability. However, the typical generation task is confined to a few particular domains since it requires well-aligned data, which is difficult and expensive to obtain. Using partially-aligned data is an alternative way of solving the dataset scarcity problem. This kind of data is much easier to obtain since it can be produced automatically. However, using this kind of data induces the over-generation problem, in which existing models tend to add unrelated excerpts during generation. In order to effectively utilize automatically annotated partially-aligned datasets, we extend the traditional generation task to a refined task called Partially-Aligned Data-to-Text Generation (PADTG), which is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains. To tackle this new task, we propose a novel distant supervision generation framework. It first estimates the input data’s supportiveness for each target word with an estimator and then applies a supportiveness adaptor and a rebalanced beam search to address the over-generation problem in the training and generation phases, respectively. We also contribute a partially-aligned dataset by sampling sentences from Wikipedia and automatically extracting corresponding KB triples for each sentence from Wikidata (the data and source code of this paper can be obtained from https://github.com/fuzihaofzh/distant_supervision_nlg). The experimental results show that our framework outperforms all baseline models and verify the feasibility of utilizing partially-aligned data.

pdf bib
WantWords: An Open-source Online Reverse Dictionary System
Fanchao Qi | Lei Zhang | Yanhui Yang | Zhiyuan Liu | Maosong Sun
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

A reverse dictionary takes descriptions of words as input and outputs words semantically matching the input descriptions. Reverse dictionaries have great practical value, such as solving the tip-of-the-tongue problem and helping new language learners. There have been some online reverse dictionary systems, but they support English reverse dictionary queries only and their performance is far from perfect. In this paper, we present a new open-source online reverse dictionary system named WantWords (https://wantwords.thunlp.org/). It not only significantly outperforms other reverse dictionary systems on English reverse dictionary performance, but also supports Chinese and English-Chinese as well as Chinese-English cross-lingual reverse dictionary queries for the first time. Moreover, it has a user-friendly front-end design which can help users find the words they need quickly and easily. All the code and data are available at https://github.com/thunlp/WantWords.

pdf bib
IsOBS: An Information System for Oracle Bone Script
Xu Han | Yuzhuo Bai | Keyue Qiu | Zhiyuan Liu | Maosong Sun
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Oracle bone script (OBS) is the earliest known ancient Chinese writing system and the ancestor of modern Chinese. As the Chinese writing system is the oldest continuously-used system in the world, the study of OBS plays an important role in both linguistic and historical research. In order to utilize advanced machine learning methods to automatically process OBS, we construct an information system for OBS (IsOBS) to symbolize, serialize, and store OBS data at the character level, based on efficient databases and retrieval modules. Moreover, we also apply few-shot learning methods to build an effective OBS character recognition module, which can recognize a large number of OBS characters (especially those characters with only a handful of examples) and make the system easy to use. The demo system of IsOBS can be found at http://isobs.thunlp.org/. In the future, we will add more OBS data to the system, and hopefully our IsOBS can support further efforts in automatically processing OBS and advance the scientific progress in this field.

pdf bib
Neural Gibbs Sampling for Joint Event Argument Extraction
Xiaozhi Wang | Shengyu Jia | Xu Han | Zhiyuan Liu | Juanzi Li | Peng Li | Jie Zhou
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Event Argument Extraction (EAE) aims at predicting event argument roles of entities in text, which is a crucial subtask and bottleneck of event extraction. Existing EAE methods extract each event argument role either independently or sequentially, which cannot adequately model the joint probability distribution among event arguments and their roles. In this paper, we propose a Bayesian model named Neural Gibbs Sampling (NGS) to jointly extract event arguments. Specifically, we train two neural networks to model the prior distribution and conditional distribution over event arguments respectively, and then use Gibbs sampling to approximate the joint distribution with the learned distributions. To overcome the high complexity of the original Gibbs sampling algorithm, we further apply simulated annealing to efficiently estimate the joint probability distribution over event arguments and make predictions. We conduct experiments on the two widely-used benchmark datasets ACE 2005 and TAC KBP 2016. The experimental results show that our NGS model can achieve comparable results to existing state-of-the-art EAE methods. The source code can be obtained from https://github.com/THU-KEG/NGS.
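
The combination of Gibbs sampling and simulated annealing can be illustrated on a toy problem. In this sketch, a hypothetical `log_score` function returning an unnormalized joint log-probability of a full role assignment stands in for the learned prior and conditional networks of NGS. Each sweep resamples one argument's role from its temperature-sharpened conditional given the others, the temperature cools geometrically, and the best assignment seen is returned as the prediction.

```python
import math
import random
from typing import Callable, List, Sequence


def annealed_gibbs(
    n_args: int,
    roles: Sequence[str],
    log_score: Callable[[List[str]], float],  # hypothetical unnormalized joint log-probability
    n_sweeps: int = 50,
    t_start: float = 2.0,
    t_end: float = 0.05,
    seed: int = 0,
) -> List[str]:
    """Gibbs sampling over argument roles with a decreasing temperature (simulated annealing)."""
    rng = random.Random(seed)
    assign = [rng.choice(roles) for _ in range(n_args)]
    best, best_score = list(assign), log_score(assign)
    for sweep in range(n_sweeps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (sweep / max(1, n_sweeps - 1))
        for i in range(n_args):
            # Conditional over argument i's role given all other roles, sharpened by 1 / t.
            logits = []
            for r in roles:
                assign[i] = r
                logits.append(log_score(assign) / t)
            m = max(logits)
            weights = [math.exp(l - m) for l in logits]  # subtract max for numerical stability
            assign[i] = rng.choices(roles, weights=weights, k=1)[0]
            s = log_score(assign)
            if s > best_score:
                best, best_score = list(assign), s
    return best
```

High early temperatures let the sampler explore assignments freely; as the temperature drops, sampling concentrates on high-scoring joint assignments, which is why annealing is cheaper than estimating the full joint distribution by plain Gibbs sampling.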

pdf bib
More Data, More Relations, More Context and More Openness: A Review and Outlook for Relation Extraction
Xu Han | Tianyu Gao | Yankai Lin | Hao Peng | Yaoliang Yang | Chaojun Xiao | Zhiyuan Liu | Peng Li | Jie Zhou | Maosong Sun
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Relational facts are an important component of human knowledge, which are hidden in vast amounts of text. In order to extract these facts from text, people have been working on relation extraction (RE) for years. From early pattern matching to current neural networks, existing RE methods have achieved significant progress. Yet with the explosion of Web text and the emergence of new relations, human knowledge is increasing drastically, and we thus require “more” from RE: a more powerful RE system that can robustly utilize more data, efficiently learn more relations, easily handle more complicated context, and flexibly generalize to more open domains. In this paper, we look back at existing RE methods, analyze key challenges we are facing nowadays, and show promising directions towards more powerful RE. We hope our view can advance this field and inspire more efforts in the community.

pdf bib
ExpanRL: Hierarchical Reinforcement Learning for Course Concept Expansion in MOOCs
Jifan Yu | Chenyu Wang | Gan Luo | Lei Hou | Juanzi Li | Jie Tang | Minlie Huang | Zhiyuan Liu
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

With the prosperity of Massive Open Online Courses (MOOCs), education applications that automatically provide extracurricular knowledge for MOOC users have become an emerging research topic. However, the diversity and rapid updates of MOOC courses make it more challenging to find suitable new knowledge for students. In this paper, we present ExpanRL, an end-to-end hierarchical reinforcement learning (HRL) model for concept expansion in MOOCs. Employing a two-level HRL mechanism of seed selection and concept expansion, ExpanRL can more flexibly adjust its expansion strategy to find new concepts based on students’ feedback on expansion results. Our experiments on nine novel datasets from real MOOCs show that ExpanRL achieves significant improvements over existing methods and maintains competitive performance under different settings.

2019

pdf bib
Open Relation Extraction: Relational Knowledge Transfer from Supervised Data to Unsupervised Data
Ruidong Wu | Yuan Yao | Xu Han | Ruobing Xie | Zhiyuan Liu | Fen Lin | Leyu Lin | Maosong Sun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Open relation extraction (OpenRE) aims to extract relational facts from the open-domain corpus. To this end, it discovers relation patterns between named entities and then clusters those semantically equivalent patterns into a unified relation cluster. Most OpenRE methods confine themselves to unsupervised paradigms, without taking advantage of existing relational facts in knowledge bases (KBs) and their high-quality labeled instances. To address this issue, we propose Relational Siamese Networks (RSNs) to learn similarity metrics of relations from labeled data of pre-defined relations, and then transfer the relational knowledge to identify novel relations in unlabeled data. Experimental results on two real-world datasets show that our framework can achieve significant improvements as compared with other state-of-the-art methods. Our code is available at https://github.com/thunlp/RSN.

pdf bib
Low-Resource Name Tagging Learned with Weakly Labeled Data
Yixin Cao | Zikun Hu | Tat-seng Chua | Zhiyuan Liu | Heng Ji
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Name tagging in low-resource languages or domains suffers from inadequate training data. Existing work relies heavily on additional information, while leaving unexplored the noisy annotations that exist extensively on the web. In this paper, we propose a novel neural model for name tagging based solely on weakly labeled (WL) data, so that it can be applied in any low-resource setting. To take the best advantage of all WL sentences, we split them into high-quality and noisy portions for two modules, respectively: (1) a classification module focusing on the large portion of noisy data can efficiently and robustly pretrain the tag classifier by capturing textual context semantics; and (2) a costly sequence labeling module focusing on high-quality data utilizes Partial-CRFs with non-entity sampling to achieve global optimum. The two modules are combined via shared parameters. Extensive experiments involving five low-resource languages and the fine-grained food domain demonstrate our superior performance (6% and 7.8% F1 gains on average) as well as efficiency.

pdf bib
Event Detection with Trigger-Aware Lattice Neural Network
Ning Ding | Ziran Li | Zhiyuan Liu | Haitao Zheng | Zibo Lin
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Event detection (ED) aims to locate trigger words in raw text and then classify them into correct event types. In this task, neural network based models have become mainstream in recent years. However, two problems arise when it comes to languages without natural delimiters, such as Chinese. First, word-based models suffer severely from the problem of word-trigger mismatch, limiting the performance of the methods. In addition, even if trigger words could be accurately located, the ambiguity of polysemous triggers could still affect the trigger classification stage. To address the two issues simultaneously, we propose the Trigger-aware Lattice Neural Network (TLNN). (1) The framework dynamically incorporates word and character information so that the trigger-word mismatch issue can be avoided. (2) Moreover, for polysemous characters and words, we model all their senses with the help of an external linguistic knowledge base, so as to alleviate the problem of ambiguous triggers. Experiments on two benchmark datasets show that our model can effectively tackle the two issues and significantly outperforms previous state-of-the-art methods, giving the best results. The source code of this paper can be obtained from https://github.com/thunlp/TLNN.

pdf bib
NumNet: Machine Reading Comprehension with Numerical Reasoning
Qiu Ran | Yankai Lin | Peng Li | Jie Zhou | Zhiyuan Liu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Numerical reasoning, such as addition, subtraction, sorting and counting, is a critical skill in human reading comprehension, which has not been well considered in existing machine reading comprehension (MRC) systems. To address this issue, we propose a numerical MRC model named NumNet, which utilizes a numerically-aware graph neural network to consider comparison information and perform numerical reasoning over numbers in the question and passage. Our system achieves an EM score of 64.56% on the DROP dataset, outperforming all existing machine reading comprehension models by considering the numerical relations among numbers.
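
The "comparison information" can be made concrete with a small graph-construction sketch. It assumes, as an illustration rather than NumNet's exact code, that every number in the question and passage becomes a node and that each ordered pair of distinct nodes gets a typed directed edge: ">" if the source number is greater, otherwise "<=". The simple regex extractor is a hypothetical helper (it ignores thousands separators and number words), and the graph neural network that runs over these edges is omitted.

```python
import re
from typing import List, Tuple


def extract_numbers(text: str) -> List[float]:
    """Hypothetical helper: pull plain integers/decimals out of text."""
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", text)]


def build_number_graph(numbers: List[float]) -> List[Tuple[int, int, str]]:
    """Typed directed edges (src, dst, relation) between every ordered pair of number nodes."""
    edges = []
    for i, a in enumerate(numbers):
        for j, b in enumerate(numbers):
            if i != j:
                edges.append((i, j, ">" if a > b else "<="))
    return edges
```

For "scored 3 goals in 7 games, then 3 more", the nodes are `[3.0, 7.0, 3.0]`; node 1 (7.0) has ">" edges to both other nodes, while the two 3.0 nodes are linked to each other by "<=" edges in both directions.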

pdf bib
Adapting Meta Knowledge Graph Information for Multi-Hop Reasoning over Few-Shot Relations
Xin Lv | Yuxian Gu | Xu Han | Lei Hou | Juanzi Li | Zhiyuan Liu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Multi-hop knowledge graph (KG) reasoning is an effective and explainable method for predicting the target entity via reasoning paths in query answering (QA) tasks. Most previous methods assume that every relation in KGs has enough triples for training, disregarding few-shot relations that cannot provide sufficient triples for training robust reasoning models. In fact, the performance of existing multi-hop reasoning methods drops significantly on few-shot relations. In this paper, we propose a meta-based multi-hop reasoning method (Meta-KGR), which adopts meta-learning to learn effective meta parameters from high-frequency relations that can quickly adapt to few-shot relations. We evaluate Meta-KGR on two public datasets sampled from Freebase and NELL, and the experimental results show that Meta-KGR outperforms state-of-the-art methods in few-shot scenarios. Our code and datasets will be made available in the future to provide more details.

pdf bib
HMEAE: Hierarchical Modular Event Argument Extraction
Xiaozhi Wang | Ziqi Wang | Xu Han | Zhiyuan Liu | Juanzi Li | Peng Li | Maosong Sun | Jie Zhou | Xiang Ren
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Existing event extraction methods classify each argument role independently, ignoring the conceptual correlations between different argument roles. In this paper, we propose a Hierarchical Modular Event Argument Extraction (HMEAE) model, to provide effective inductive bias from the concept hierarchy of event argument roles. Specifically, we design a neural module network for each basic unit of the concept hierarchy, and then hierarchically compose relevant unit modules with logical operations into a role-oriented modular network to classify a specific argument role. As many argument roles share the same high-level unit module, their correlation can be utilized to extract specific event arguments better. Experiments on real-world datasets show that HMEAE can effectively leverage useful knowledge from the concept hierarchy and significantly outperform the state-of-the-art baselines. The source code can be obtained from https://github.com/thunlp/HMEAE.

pdf bib
FewRel 2.0: Towards More Challenging Few-Shot Relation Classification
Tianyu Gao | Xu Han | Hao Zhu | Zhiyuan Liu | Peng Li | Maosong Sun | Jie Zhou
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We present FewRel 2.0, a more challenging task to investigate two aspects of few-shot relation classification models: (1) Can they adapt to a new domain with only a handful of instances? (2) Can they detect none-of-the-above (NOTA) relations? To construct FewRel 2.0, we build upon the FewRel dataset by adding a new test set in a quite different domain, and a NOTA relation choice. With the new dataset and extensive experimental analysis, we find that (1) state-of-the-art few-shot relation classification models struggle on these two aspects, and (2) the commonly used techniques for domain adaptation and NOTA detection still cannot handle the two challenges well. Our research calls for more attention and further effort on these two real-world issues. All details and resources about the dataset and baselines are released at https://github.com/thunlp/fewrel.

pdf bib
OpenNRE: An Open and Extensible Toolkit for Neural Relation Extraction
Xu Han | Tianyu Gao | Yuan Yao | Deming Ye | Zhiyuan Liu | Maosong Sun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

OpenNRE is an open-source and extensible toolkit that provides a unified framework to implement neural models for relation extraction (RE). Specifically, by implementing typical RE methods, OpenNRE not only allows developers to train custom models to extract structured relational facts from plain text but also supports quick model validation for researchers. In addition, OpenNRE provides various functional RE modules based on both TensorFlow and PyTorch to maintain sufficient modularity and extensibility, making it easy to incorporate new models into the framework. Besides the toolkit, we also release an online system that supports real-time extraction without any training or deployment. The online system can extract facts in various scenarios and align the extracted facts to Wikidata, which may benefit various downstream knowledge-driven applications (e.g., information retrieval and question answering). More details of the toolkit and online system can be obtained from http://github.com/thunlp/OpenNRE.

pdf bib
Adversarial Training for Weakly Supervised Event Detection
Xiaozhi Wang | Xu Han | Zhiyuan Liu | Maosong Sun | Peng Li
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Modern weakly supervised methods for event detection (ED) avoid time-consuming human annotation and achieve promising results by learning from auto-labeled data. However, these methods typically rely on sophisticated pre-defined rules as well as existing instances in knowledge bases for automatic annotation and thus suffer from low coverage, topic bias, and data noise. To address these issues, we build a large event-related candidate set with good coverage and then apply an adversarial training mechanism to iteratively identify informative instances in the candidate set and filter out noisy ones. Experiments on two real-world datasets show that our candidate selection and adversarial training work together to obtain more diverse and accurate training data for ED and significantly outperform state-of-the-art methods in various weakly supervised scenarios. The datasets and source code can be obtained from https://github.com/thunlp/Adv-ED.

pdf bib
Fact Discovery from Knowledge Base via Facet Decomposition
Zihao Fu | Yankai Lin | Zhiyuan Liu | Wai Lam
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

During the past few decades, knowledge bases (KBs) have experienced rapid growth. Nevertheless, most KBs still suffer from serious incompleteness. Researchers have proposed many tasks, such as knowledge base completion and relation prediction, to help build the representation of KBs. However, some issues in enriching KBs remain unsettled. Knowledge base completion and relation prediction assume that two elements of a fact triple are known and predict the missing one. This assumption is too restrictive in practice and prevents these tasks from discovering new facts directly. To address this issue, we propose a new task, namely, fact discovery from knowledge base. This task only requires that the head entity is known, and the goal is to discover facts associated with it. To tackle this new problem, we propose a novel framework that decomposes the discovery problem into several facet discovery components. We also propose a novel auto-encoder-based facet component to estimate some facets of the fact. Besides, we propose a feedback learning component to share information between facets. We evaluate our framework using a benchmark dataset, and the experimental results show that our framework achieves promising results. We also conduct an extensive analysis of our framework in discovering different kinds of facts. The source code of this paper can be obtained from https://github.com/thunlp/FFD.

pdf bib
DocRED: A Large-Scale Document-Level Relation Extraction Dataset
Yuan Yao | Deming Ye | Peng Li | Xu Han | Yankai Lin | Zhenghao Liu | Zhiyuan Liu | Lixin Huang | Jie Zhou | Maosong Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Multiple entities in a document generally exhibit complex inter-sentence relations, and cannot be well handled by existing relation extraction (RE) methods that typically focus on extracting intra-sentence relations for single entity pairs. In order to accelerate the research on document-level RE, we introduce DocRED, a new dataset constructed from Wikipedia and Wikidata with three features: (1) DocRED annotates both named entities and relations, and is the largest human-annotated dataset for document-level RE from plain text; (2) DocRED requires reading multiple sentences in a document to extract entities and infer their relations by synthesizing all information of the document; (3) along with the human-annotated data, we also offer large-scale distantly supervised data, which enables DocRED to be adopted for both supervised and weakly supervised scenarios. In order to verify the challenges of document-level RE, we implement recent state-of-the-art methods for RE and conduct a thorough evaluation of these methods on DocRED. Empirical results show that DocRED is challenging for existing RE methods, which indicates that document-level RE remains an open problem and requires further efforts. Based on a detailed analysis of the experiments, we discuss multiple promising directions for future research. We make DocRED and the code for our baselines publicly available at https://github.com/thunlp/DocRED.

pdf bib
GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification
Jie Zhou | Xu Han | Cheng Yang | Zhiyuan Liu | Lifeng Wang | Changcheng Li | Maosong Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Fact verification (FV) is a challenging task that requires retrieving relevant evidence from plain text and using the evidence to verify given claims. Many claims require simultaneously integrating and reasoning over several pieces of evidence for verification. However, previous work employs simple models to extract information from evidence without letting the pieces of evidence communicate with each other, e.g., merely concatenating the evidence for processing. Therefore, these methods are unable to grasp sufficient relational and logical information among the evidence. To alleviate this issue, we propose a graph-based evidence aggregating and reasoning (GEAR) framework, which enables information to transfer on a fully-connected evidence graph and then utilizes different aggregators to collect multi-evidence information. We further employ BERT, an effective pre-trained language representation model, to improve the performance. Experimental results on the large-scale benchmark dataset FEVER demonstrate that GEAR can leverage multi-evidence information for FV and achieves a promising test FEVER score of 67.10%. Our code is available at https://github.com/thunlp/GEAR.
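To make "information transfer on a fully-connected evidence graph" followed by aggregation more concrete, here is a minimal plain-Python sketch of one propagation step and two simple aggregators (mean and max). The dot-product attention, the function names, and the toy 2-dimensional evidence vectors are all assumptions for illustration; this is not GEAR's implementation, which works on BERT-based evidence representations.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def propagate(nodes: List[Vector]) -> List[Vector]:
    """One step on a fully connected evidence graph (self-loops included):
    each node becomes an attention-weighted average of all node vectors."""
    updated = []
    for query in nodes:
        # Assumed dot-product attention between this node and every node.
        scores = [sum(q * k for q, k in zip(query, key)) for key in nodes]
        weights = softmax(scores)
        updated.append(
            [sum(w * node[d] for w, node in zip(weights, nodes)) for d in range(len(query))]
        )
    return updated


def mean_aggregate(nodes: List[Vector]) -> Vector:
    return [sum(col) / len(nodes) for col in zip(*nodes)]


def max_aggregate(nodes: List[Vector]) -> Vector:
    return [max(col) for col in zip(*nodes)]


evidence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy evidence representations
hidden = propagate(evidence)
claim_vector = mean_aggregate(hidden)  # or max_aggregate(hidden)
print(claim_vector)
```

Because every node attends to every other node, each updated evidence vector already mixes information from all pieces of evidence before the aggregator pools them.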

pdf bib
Graph Neural Networks with Generated Parameters for Relation Extraction
Hao Zhu | Yankai Lin | Zhiyuan Liu | Jie Fu | Tat-Seng Chua | Maosong Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this paper, we propose a novel graph neural network with generated parameters (GP-GNNs). The parameters in the propagation module, i.e., the transition matrices used in the message-passing procedure, are produced by a generator that takes natural language sentences as input. We verify GP-GNNs on relation extraction from text in both bag-level and instance-level settings. Experimental results on a human-annotated dataset and two distantly supervised datasets show that the multi-hop reasoning mechanism yields significant improvements. We also perform a qualitative analysis to demonstrate that our model can discover more accurate relations through multi-hop relational reasoning.

pdf bib
DIAG-NRE: A Neural Pattern Diagnosis Framework for Distantly Supervised Neural Relation Extraction
Shun Zheng | Xu Han | Yankai Lin | Peilin Yu | Lu Chen | Ling Huang | Zhiyuan Liu | Wei Xu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Pattern-based labeling methods have achieved promising results in alleviating the inevitable labeling noise of distantly supervised neural relation extraction. However, these methods require significant expert labor to write relation-specific patterns, which makes them hard to generalize quickly. To ease the labor-intensive workload of pattern writing and enable quick generalization to new relation types, we propose a neural pattern diagnosis framework, DIAG-NRE, that can automatically summarize and refine high-quality relational patterns from noisy data with human experts in the loop. To demonstrate the effectiveness of DIAG-NRE, we apply it to two real-world datasets and present both significant and interpretable improvements over state-of-the-art methods.

pdf bib
ERNIE: Enhanced Language Representation with Informative Entities
Zhengyan Zhang | Xu Han | Zhiyuan Liu | Xin Jiang | Maosong Sun | Qun Liu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Neural language representation models such as BERT pre-trained on large-scale corpora can capture rich semantic patterns from plain text and be fine-tuned to consistently improve the performance of various NLP tasks. However, existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results demonstrate that ERNIE achieves significant improvements on various knowledge-driven tasks and is comparable with the state-of-the-art model BERT on other common NLP tasks. The code and datasets will be available in the future.

pdf bib
Multi-Channel Graph Neural Network for Entity Alignment
Yixin Cao | Zhiyuan Liu | Chengjiang Li | Zhiyuan Liu | Juanzi Li | Tat-Seng Chua
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Entity alignment typically suffers from the issues of structural heterogeneity and limited seed alignments. In this paper, we propose a novel Multi-channel Graph Neural Network model (MuGNN) to learn alignment-oriented knowledge graph (KG) embeddings by robustly encoding two KGs via multiple channels. Each channel encodes KGs via different relation weighting schemes with respect to self-attention towards KG completion and cross-KG attention for pruning exclusive entities respectively, which are further combined via pooling techniques. Moreover, we also infer and transfer rule knowledge for completing two KGs consistently. MuGNN is expected to reconcile the structural differences of two KGs, and thus make better use of seed alignments. Extensive experiments on five publicly available datasets demonstrate the superior performance of MuGNN (5% higher Hits@1 on average). Source code and data used in the experiments can be accessed at https://github.com/thunlp/MuGNN.

pdf bib
XQA: A Cross-lingual Open-domain Question Answering Dataset
Jiahua Liu | Yankai Lin | Zhiyuan Liu | Maosong Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Open-domain question answering (OpenQA) aims to answer questions through text retrieval and reading comprehension. Recently, many neural network-based models have been proposed and achieved promising results in OpenQA. However, the success of these models relies on a massive volume of training data (usually in English), which is not available in many other languages, especially low-resource ones. Therefore, it is essential to investigate cross-lingual OpenQA. In this paper, we construct a novel dataset, XQA, for cross-lingual OpenQA research. It consists of a training set in English as well as development and test sets in eight other languages. In addition, we provide several baseline systems for cross-lingual OpenQA, including two machine translation-based methods and one zero-shot cross-lingual method (multilingual BERT). Experimental results show that the multilingual BERT model achieves the best results in almost all target languages, while the performance of cross-lingual OpenQA is still much lower than that of English. Our analysis indicates that the performance of cross-lingual OpenQA is related not only to how similar the target language and English are, but also to how difficult the question set of the target language is. The XQA dataset is publicly available at http://github.com/thunlp/XQA.

pdf bib
Quantifying Similarity between Relations with Fact Distribution
Weize Chen | Hao Zhu | Xu Han | Zhiyuan Liu | Maosong Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We introduce a conceptually simple and effective method to quantify the similarity between relations in knowledge bases. Specifically, our approach is based on the divergence between the conditional probability distributions over entity pairs. In this paper, these distributions are parameterized by a very simple neural network. Although computing the exact similarity is intractable, we provide a sampling-based method to get a good approximation. We empirically show that the outputs of our approach significantly correlate with human judgments. By applying our method to various tasks, we also find that (1) our approach can effectively detect redundant relations extracted by open information extraction (Open IE) models, (2) even the most competitive models for relational classification still make mistakes among very similar relations, and (3) our approach can be incorporated into negative sampling and softmax classification to alleviate these mistakes.
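The sampling-based approximation can be illustrated with a Monte Carlo estimate of a divergence between two conditional distributions over entity pairs. This sketch assumes KL divergence as the divergence and replaces the paper's neural parameterization with small hand-written probability tables; the entity pairs and function names are made up for illustration.

```python
import math
import random
from typing import Dict, Hashable

Dist = Dict[Hashable, float]


def exact_kl(p: Dist, q: Dist) -> float:
    """Exact KL(p || q) over a finite support; q must be positive wherever p is."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def sampled_kl(p: Dist, q: Dist, n_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of KL(p || q): mean of log p(x) - log q(x) for x ~ p."""
    support = list(p)
    draws = rng.choices(support, weights=[p[x] for x in support], k=n_samples)
    return sum(math.log(p[x]) - math.log(q[x]) for x in draws) / n_samples


# Toy conditional distributions P(head, tail | relation) over three entity pairs.
p_r1 = {("Paris", "France"): 0.5, ("Berlin", "Germany"): 0.3, ("Rome", "Italy"): 0.2}
p_r2 = {("Paris", "France"): 0.4, ("Berlin", "Germany"): 0.4, ("Rome", "Italy"): 0.2}

exact = exact_kl(p_r1, p_r2)
est = sampled_kl(p_r1, p_r2, n_samples=100_000, rng=random.Random(0))
print(f"exact={exact:.4f} sampled={est:.4f}")
```

With a tiny support the exact value is cheap to compute, but over all entity pairs in a real knowledge base the sum becomes intractable, which is where sampling from the first distribution helps.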

pdf bib
Course Concept Expansion in MOOCs with External Knowledge and Interactive Game
Jifan Yu | Chenyu Wang | Gan Luo | Lei Hou | Juanzi Li | Zhiyuan Liu | Jie Tang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

As Massive Open Online Courses (MOOCs) become increasingly popular, it is promising to automatically provide extracurricular knowledge for MOOC users. Suffering from semantic drift and a lack of knowledge guidance, existing methods cannot effectively expand course concepts in complex MOOC environments. In this paper, we first build a novel boundary for searching new concepts via an external knowledge base and then utilize heterogeneous features to verify the high-quality results. In addition, to involve human efforts in our model, we design an interactive optimization mechanism based on a game. Our experiments on four datasets from Coursera and XuetangX show that the proposed method achieves significant improvements (+0.19 MAP) over existing methods.

pdf bib
Chinese Relation Extraction with Multi-Grained Information and External Linguistic Knowledge
Ziran Li | Ning Ding | Zhiyuan Liu | Haitao Zheng | Ying Shen
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Chinese relation extraction is conducted using neural networks with either character-based or word-based inputs, and most existing methods typically suffer from segmentation errors and ambiguity of polysemy. To address the issues, we propose a multi-grained lattice framework (MG lattice) for Chinese relation extraction to take advantage of multi-grained language information and external linguistic knowledge. In this framework, (1) we incorporate word-level information into character sequence inputs so that segmentation errors can be avoided. (2) We also model multiple senses of polysemous words with the help of external linguistic knowledge, so as to alleviate polysemy ambiguity. Experiments on three real-world datasets in distinct domains show consistent and significant superiority and robustness of our model, as compared with other baselines. We will release the source code of this paper in the future.

pdf bib
Modeling Semantic Compositionality with Sememe Knowledge
Fanchao Qi | Junjie Huang | Chenghao Yang | Zhiyuan Liu | Xiao Chen | Qun Liu | Maosong Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Semantic compositionality (SC) refers to the phenomenon that the meaning of a complex linguistic unit can be composed of the meanings of its constituents. Most related works focus on using complicated compositionality functions to model SC while few works consider external knowledge in models. In this paper, we verify the effectiveness of sememes, the minimum semantic units of human languages, in modeling SC by a confirmatory experiment. Furthermore, we make the first attempt to incorporate sememe knowledge into SC models, and employ the sememe-incorporated models in learning representations of multiword expressions, a typical task of SC. In experiments, we implement our models by incorporating knowledge from the well-known sememe knowledge base HowNet and perform both intrinsic and extrinsic evaluations. Experimental results show that our models achieve significant performance boost as compared to the baseline methods without considering sememe knowledge. We further conduct quantitative analysis and case studies to demonstrate the effectiveness of applying sememe knowledge in modeling SC. All the code and data of this paper can be obtained from https://github.com/thunlp/Sememe-SC.

2018

pdf bib
Joint Representation Learning of Cross-lingual Words and Entities via Attentive Distant Supervision
Yixin Cao | Lei Hou | Juanzi Li | Zhiyuan Liu | Chengjiang Li | Xu Chen | Tiansi Dong
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Joint representation learning of words and entities benefits many NLP tasks but has not been well explored in cross-lingual settings. In this paper, we propose a novel method for joint representation learning of cross-lingual words and entities. It captures mutually complementary knowledge, and enables cross-lingual inferences among knowledge bases and texts. Our method does not require a parallel corpus and automatically generates comparable data via distant supervision using multilingual knowledge bases. We utilize two types of regularizers to align cross-lingual words and entities, and design knowledge attention and cross-lingual attention to further reduce noise. We conducted a series of experiments on three tasks: word translation, entity relatedness, and cross-lingual entity linking. The results, both qualitative and quantitative, demonstrate the significance of our method.

pdf bib
Cross-lingual Lexical Sememe Prediction
Fanchao Qi | Yankai Lin | Maosong Sun | Hao Zhu | Ruobing Xie | Zhiyuan Liu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Sememes are defined as the minimum semantic units of human languages. As important knowledge sources, sememe-based linguistic knowledge bases have been widely used in many NLP tasks. However, most languages still do not have sememe-based linguistic knowledge bases. Thus, we present a task of cross-lingual lexical sememe prediction, aiming to automatically predict sememes for words in other languages. We propose a novel framework to model correlations between sememes and multilingual words in a low-dimensional semantic space for sememe prediction. Experimental results on real-world datasets show that our proposed model achieves consistent and significant improvements compared to baseline methods in cross-lingual sememe prediction. The code and data of this paper are available at https://github.com/thunlp/CL-SP.

pdf bib
Put It Back: Entity Typing with Language Model Enhancement
Ji Xin | Hao Zhu | Xu Han | Zhiyuan Liu | Maosong Sun
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Entity typing aims to classify semantic types of an entity mention in a specific context. Most existing models obtain training data using distant supervision and inevitably suffer from the problem of noisy labels. To address this issue, we propose entity typing with language model enhancement. It utilizes a language model to measure the compatibility between context sentences and labels, and thereby automatically focuses more on context-dependent labels. Experiments on benchmark datasets demonstrate that our method is capable of enhancing the entity typing model with information from the language model and significantly outperforms the state-of-the-art baseline. Code and data for this paper can be found at https://github.com/thunlp/LME.

pdf bib
Differentiating Concepts and Instances for Knowledge Graph Embedding
Xin Lv | Lei Hou | Juanzi Li | Zhiyuan Liu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Concepts, which represent a group of different instances sharing common properties, are essential information in knowledge representation. Most conventional knowledge embedding methods encode both entities (concepts and instances) and relations as vectors in a low-dimensional semantic space equally, ignoring the difference between concepts and instances. In this paper, we propose a novel knowledge graph embedding model named TransC by differentiating concepts and instances. Specifically, TransC encodes each concept in a knowledge graph as a sphere and each instance as a vector in the same semantic space. We use the relative positions to model the relations between concepts and instances (i.e., instanceOf) and the relations between concepts and sub-concepts (i.e., subClassOf). We evaluate our model on both link prediction and triple classification tasks on a dataset based on YAGO. Experimental results show that TransC outperforms state-of-the-art methods and captures the semantic transitivity of the instanceOf and subClassOf relations. Our code and datasets can be obtained from https://github.com/davidlvxin/TransC.
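The sphere-and-vector geometry described above can be sketched as two containment checks: an instance belongs to a concept if its vector lies inside the concept's sphere, and one concept is a sub-concept of another if its sphere lies inside the other's. This is a simplified illustration with made-up 2-dimensional values; TransC's training losses and relation vectors are omitted, and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Concept:
    center: Tuple[float, ...]  # sphere center
    radius: float              # sphere radius


def instance_of(instance: Sequence[float], concept: Concept) -> bool:
    """instanceOf holds if the instance vector lies inside the concept sphere."""
    return math.dist(instance, concept.center) <= concept.radius


def sub_class_of(sub: Concept, sup: Concept) -> bool:
    """subClassOf holds if the sub-concept sphere lies entirely inside the super-concept sphere."""
    return math.dist(sub.center, sup.center) + sub.radius <= sup.radius


animal = Concept(center=(0.0, 0.0), radius=5.0)
dog = Concept(center=(1.0, 1.0), radius=1.0)
rex = (1.5, 1.2)  # an instance vector

print(instance_of(rex, dog), sub_class_of(dog, animal), instance_of(rex, animal))
# True True True
```

By the triangle inequality, a point inside a sphere that is itself contained in a larger sphere is also inside the larger one, which is how this geometry captures the transitivity between instanceOf and subClassOf.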

pdf bib
Hierarchical Relation Extraction with Coarse-to-Fine Grained Attention
Xu Han | Pengfei Yu | Zhiyuan Liu | Maosong Sun | Peng Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Distantly supervised relation extraction employs existing knowledge graphs to automatically collect training data. While distant supervision is effective for scaling relation extraction up to large-scale corpora, it inevitably suffers from the wrong labeling problem. Many efforts have been devoted to identifying valid instances from noisy data. However, most existing methods handle each relation in isolation, disregarding the rich semantic correlations in relation hierarchies. In this paper, we aim to incorporate the hierarchical information of relations for distantly supervised relation extraction and propose a novel hierarchical attention scheme. The multiple layers of our hierarchical attention scheme provide coarse-to-fine granularity to better identify valid instances, which is especially effective for extracting long-tail relations. The experimental results on a large-scale benchmark dataset demonstrate that our models are capable of modeling the hierarchical information of relations and significantly outperform other baselines. The source code of this paper can be obtained from https://github.com/thunlp/HNRE.

pdf bib
Legal Judgment Prediction via Topological Learning
Haoxi Zhong | Zhipeng Guo | Cunchao Tu | Chaojun Xiao | Zhiyuan Liu | Maosong Sun
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Legal Judgment Prediction (LJP) aims to predict the judgment result based on the facts of a case and has become a promising application of artificial intelligence techniques in the legal field. In real-world scenarios, legal judgment usually consists of multiple subtasks, such as the decisions of applicable law articles, charges, fines, and the term of penalty. Moreover, there exist topological dependencies among these subtasks. While most existing works only focus on a specific subtask of judgment prediction and ignore the dependencies among subtasks, we formalize the dependencies among subtasks as a Directed Acyclic Graph (DAG) and propose a topological multi-task learning framework, TopJudge, which incorporates multiple subtasks and DAG dependencies into judgment prediction. We conduct experiments on several real-world large-scale datasets of criminal cases in the civil law system. Experimental results show that our model achieves consistent and significant improvements over baselines on all judgment prediction tasks. The source code can be obtained from https://github.com/thunlp/TopJudge.
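Formalizing subtask dependencies as a DAG implies that subtasks can be solved in a topological order, each one conditioned on the outputs of its predecessors. The sketch below assumes, purely for illustration, that charges depend on law articles and that the term of penalty depends on both; TopJudge's neural components are replaced by a hypothetical `toy_predict` function that only records its inputs. It requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set

# Assumed subtask DAG: each subtask maps to the subtasks it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "law_article": set(),
    "charge": {"law_article"},
    "term_of_penalty": {"law_article", "charge"},
}


def toy_predict(subtask: str, upstream: Dict[str, str]) -> str:
    """Hypothetical stand-in for a per-subtask classifier; it just records its inputs."""
    inputs = ",".join(sorted(upstream))
    return f"{subtask}({inputs})"


order = list(TopologicalSorter(DEPENDENCIES).static_order())
predictions: Dict[str, str] = {}
for task in order:
    # Each subtask sees the predictions of all its DAG predecessors.
    upstream = {dep: predictions[dep] for dep in DEPENDENCIES[task]}
    predictions[task] = toy_predict(task, upstream)

print(order)  # ['law_article', 'charge', 'term_of_penalty']
print(predictions["term_of_penalty"])  # term_of_penalty(charge,law_article)
```

Adding a subtask such as fines would only require a new entry in the dependency map; the topological sort keeps the prediction order valid.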

pdf bib
Language Modeling with Sparse Product of Sememe Experts
Yihong Gu | Jun Yan | Hao Zhu | Zhiyuan Liu | Ruobing Xie | Maosong Sun | Fen Lin | Leyu Lin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Most language modeling methods rely on large-scale data to statistically learn the sequential patterns of words. In this paper, we argue that words are atomic language units but not necessarily atomic semantic units. Inspired by HowNet, we use sememes, the minimum semantic units in human languages, to represent the implicit semantics behind words for language modeling, and name the resulting model the Sememe-Driven Language Model (SDLM). More specifically, to predict the next word, SDLM first estimates the sememe distribution given the textual context. Afterwards, it regards each sememe as a distinct semantic expert, and these experts jointly identify the most probable senses and the corresponding word. In this way, SDLM enables language models to work beyond word-level manipulation to fine-grained sememe-level semantics, and offers us more powerful tools to fine-tune language models and improve the interpretability as well as the robustness of language models. Experiments on language modeling and the downstream application of headline generation demonstrate the significant effectiveness of SDLM.

pdf bib
FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation
Xu Han | Hao Zhu | Pengfei Yu | Ziyun Wang | Yuan Yao | Zhiyuan Liu | Maosong Sun
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We present a Few-Shot Relation Classification Dataset (FewRel), consisting of 70,000 sentences on 100 relations derived from Wikipedia and annotated by crowdworkers. The relation of each sentence is first recognized by distant supervision methods and then filtered by crowdworkers. We adapt the most recent state-of-the-art few-shot learning methods for relation classification and conduct a thorough evaluation of these methods. Empirical results show that even the most competitive few-shot learning models struggle on this task, especially compared with humans. We also show that a range of different reasoning skills are needed to solve our task. These results indicate that few-shot relation classification remains an open problem and still requires further research. Our detailed analysis points to multiple directions for future research.
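Few-shot relation classification is commonly evaluated in N-way K-shot episodes: N relations are drawn, each with K labeled support sentences, and the model must classify held-out query sentences among those N relations. The sketch below samples one such episode. The relation IDs and sentences are placeholders rather than FewRel data, and the function name is hypothetical.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[str, int]  # (sentence, episode-local label)


def sample_episode(
    data: Dict[str, List[str]], n_way: int, k_shot: int, n_query: int, rng: random.Random
) -> Tuple[List[str], List[Example], List[Example]]:
    """Sample one N-way K-shot episode with n_query query sentences per relation."""
    relations = rng.sample(sorted(data), n_way)
    support: List[Example] = []
    query: List[Example] = []
    for label, relation in enumerate(relations):
        # Draw without replacement so support and query sentences never overlap.
        picked = rng.sample(data[relation], k_shot + n_query)
        support += [(s, label) for s in picked[:k_shot]]
        query += [(s, label) for s in picked[k_shot:]]
    return relations, support, query


# Toy stand-in for the dataset: 10 relations with 20 placeholder sentences each.
toy_data = {f"P{r}": [f"P{r}-sentence-{i}" for i in range(20)] for r in range(10)}
relations, support, query = sample_episode(
    toy_data, n_way=5, k_shot=1, n_query=2, rng=random.Random(0)
)
print(relations, len(support), len(query))
```

Labels are local to each episode (0 to N-1), so a model cannot rely on memorizing global relation identities.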

pdf bib
OpenKE: An Open Toolkit for Knowledge Embedding
Xu Han | Shulin Cao | Xin Lv | Yankai Lin | Zhiyuan Liu | Maosong Sun | Juanzi Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We release an open toolkit for knowledge embedding (OpenKE), which provides a unified framework and various fundamental models to embed knowledge graphs into a continuous low-dimensional space. OpenKE prioritizes operational efficiency to support quick model validation and large-scale knowledge representation learning. Meanwhile, OpenKE maintains sufficient modularity and extensibility to easily incorporate new models into the framework. Besides the toolkit, embeddings of some existing large-scale knowledge graphs pre-trained by OpenKE are also available, and they can be used directly in many applications, including information retrieval, personalized recommendation and question answering. The toolkit, documentation, and pre-trained embeddings are all released at http://openke.thunlp.org/.
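As an illustration of what a knowledge embedding model computes, the sketch below implements the translation-based scoring idea of TransE, one of the models OpenKE provides: a triple (h, r, t) is plausible when h + r lies close to t. It is plain Python and does not use OpenKE's API; the entity and relation vectors are hypothetical toy values rather than trained embeddings.

```python
import math

def transe_score(h, r, t, norm=1):
    """TransE-style distance ||h + r - t||; lower means more plausible."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    return math.sqrt(sum(d * d for d in diffs))

# Hypothetical toy embeddings (not pre-trained OpenKE vectors).
ENT = {"Beijing": [1.0, 0.0], "China": [1.0, 1.0], "France": [0.0, 1.0]}
REL = {"capital_of": [0.0, 1.0]}

def rank_tails(head, rel):
    """Rank candidate tail entities by TransE distance (best first)."""
    return sorted(ENT, key=lambda t: transe_score(ENT[head], REL[rel], ENT[t]))
```

Ranking all candidate tails by this distance is the usual link-prediction evaluation for translation-based models.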

Denoising Distantly Supervised Open-Domain Question Answering
Yankai Lin | Haozhe Ji | Zhiyuan Liu | Maosong Sun
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Distantly supervised open-domain question answering (DS-QA) aims to find answers in collections of unlabeled text. Existing DS-QA models usually retrieve related paragraphs from a large-scale corpus and apply reading comprehension techniques to extract answers from the most relevant paragraph, ignoring the rich information contained in other paragraphs. Moreover, distant supervision data inevitably suffers from the wrong labeling problem, and these noisy data substantially degrade the performance of DS-QA. To address these issues, we propose a novel DS-QA model that employs a paragraph selector to filter out noisy paragraphs and a paragraph reader to extract the correct answer from the denoised paragraphs. Experimental results on real-world datasets show that our model can capture useful information from noisy data and achieves significant improvements on DS-QA compared with all baselines.
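One way to let a selector and a reader cooperate is to treat the selector's output as a distribution over paragraphs and marginalize the reader's answer probabilities over it: P(a | q, P) = Σ_i P(p_i | q, P) · P(a | q, p_i). The sketch below assumes that formulation with made-up scores; the selector logits and reader distributions stand in for the neural components rather than reproducing the paper's models.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]

def answer_probability(selector_logits, reader_probs, answer):
    """P(a | q, P) = sum_i P(p_i | q, P) * P(a | q, p_i).

    selector_logits: one relevance logit per paragraph (hypothetical values).
    reader_probs: per paragraph, a dict answer -> P(answer | q, p_i).
    """
    sel = softmax(selector_logits)
    # Noisy paragraphs get low selector weight, so their answers count less.
    return sum(s * rp.get(answer, 0.0) for s, rp in zip(sel, reader_probs))

logits = [2.0, 0.0, -1.0]
reader = [{"Paris": 0.9}, {"Lyon": 0.8, "Paris": 0.1}, {"Lyon": 0.7}]
p_paris = answer_probability(logits, reader, "Paris")
p_lyon = answer_probability(logits, reader, "Lyon")
```

A hard filter (keeping only the top-scoring paragraphs) is the discrete counterpart of this soft weighting.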

Entity-Duet Neural Ranking: Understanding the Role of Knowledge Graph Semantics in Neural Information Retrieval
Zhenghao Liu | Chenyan Xiong | Maosong Sun | Zhiyuan Liu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper presents the Entity-Duet Neural Ranking Model (EDRM), which introduces knowledge graphs to neural search systems. EDRM represents queries and documents by their words and entity annotations. The semantics from knowledge graphs are integrated into the distributed representations of their entities, while the ranking is conducted by interaction-based neural ranking networks. The two components are learned end-to-end, making EDRM a natural combination of entity-oriented search and neural information retrieval. Our experiments on a commercial search log demonstrate the effectiveness of EDRM. Our analyses reveal that knowledge graph semantics significantly improve the generalization ability of neural ranking models.
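Interaction-based neural rankers compare every query term with every document term and summarize the resulting similarity matrix into features. The sketch below assumes K-NRM-style Gaussian kernel pooling over cosine similarities, where the query and document vectors may stand for word or entity embeddings. The kernel means `mus` and width `sigma` are illustrative values, and the learned layer that would turn the features into a ranking score is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def kernel_pooling(query_vecs, doc_vecs, mus=(1.0, 0.5, 0.0), sigma=0.1):
    """K-NRM-style soft-match features from a query-document similarity matrix."""
    features = []
    for mu in mus:
        total = 0.0
        for q in query_vecs:
            # Soft count of document terms whose similarity to q is near mu.
            k = sum(math.exp(-((cosine(q, d) - mu) ** 2) / (2 * sigma ** 2))
                    for d in doc_vecs)
            # Log of the soft count, clipped to avoid log(0).
            total += math.log(max(k, 1e-10))
        features.append(total)
    return features

exact = kernel_pooling([[1.0, 0.0]], [[1.0, 0.0]])
unrelated = kernel_pooling([[1.0, 0.0]], [[0.0, 1.0]])
```

The kernel centered at 1.0 acts as an exact-match detector, while lower means capture progressively softer matches.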

Incorporating Chinese Characters of Words for Lexical Sememe Prediction
Huiming Jin | Hao Zhu | Zhiyuan Liu | Ruobing Xie | Maosong Sun | Fen Lin | Leyu Lin
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Sememes are minimum semantic units of concepts in human languages, such that each word sense is composed of one or multiple sememes. Words are usually manually annotated with their sememes by linguists, forming linguistic common-sense knowledge bases widely used in various NLP tasks. Recently, the lexical sememe prediction task has been introduced. It consists of automatically recommending sememes for words, which is expected to improve annotation efficiency and consistency. However, existing methods of lexical sememe prediction typically rely on the external context of words to represent their meaning, and therefore usually fail to deal with low-frequency and out-of-vocabulary words. To address this issue for Chinese, we propose a novel framework that takes advantage of both the internal character information and the external context information of words. We experiment on HowNet, a Chinese sememe knowledge base, and demonstrate that our framework outperforms state-of-the-art baselines by a large margin and maintains robust performance even for low-frequency words.

Few-Shot Charge Prediction with Discriminative Legal Attributes
Zikun Hu | Xiang Li | Cunchao Tu | Zhiyuan Liu | Maosong Sun
Proceedings of the 27th International Conference on Computational Linguistics

Automatic charge prediction aims to predict the final charges according to the fact descriptions in criminal cases and plays a crucial role in legal assistant systems. Existing works on charge prediction perform adequately on high-frequency charges but are not yet capable of predicting few-shot charges with limited cases. Moreover, there exist many confusing charge pairs whose fact descriptions are fairly similar to each other. To address these issues, we introduce several discriminative attributes of charges as the internal mapping between fact descriptions and charges. These attributes provide additional information for few-shot charges, as well as effective signals for distinguishing confusing charges. More specifically, we propose an attribute-attentive charge prediction model to infer the attributes and charges simultaneously. Experimental results on real-world datasets demonstrate that our proposed model achieves significant and consistent improvements over other state-of-the-art baselines. Specifically, our model outperforms other baselines by more than 50% in the few-shot scenario. Our code and datasets can be obtained from https://github.com/thunlp/attribute_charge.

Neural Collective Entity Linking
Yixin Cao | Lei Hou | Juanzi Li | Zhiyuan Liu
Proceedings of the 27th International Conference on Computational Linguistics

Entity linking aims to link entity mentions in texts to knowledge bases, and neural models have recently achieved success in this task. However, most existing methods rely on local contexts to resolve entities independently, which often fails due to the sparsity of local information. To address this issue, we propose a novel neural model for collective entity linking, named NCEL. NCEL applies a Graph Convolutional Network to integrate both local contextual features and global coherence information for entity linking. To improve computational efficiency, we approximately perform graph convolution on a subgraph of adjacent entity mentions instead of those in the entire text. We further introduce an attention scheme to improve the robustness of NCEL to data noise, and train the model on Wikipedia hyperlinks to avoid overfitting and domain bias. In experiments, we evaluate NCEL on five publicly available datasets to verify its linking performance as well as its generalization ability. We also conduct an extensive analysis of time complexity, the impact of key modules, and qualitative results, which demonstrates the effectiveness and efficiency of our proposed method.
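The graph convolution at the core of this approach can be sketched with a standard GCN layer, ReLU(D^-1/2 (A + I) D^-1/2 H W), applied to a small graph of candidate entities. This is the generic normalized-adjacency formulation, assumed here for illustration; NCEL's exact layer, features, and subgraph construction may differ, and the adjacency matrices and weights below are hypothetical.

```python
import math

def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n adjacency over candidate entities (hypothetical),
    feats: n x f node features, weight: f x o transform.
    """
    n = len(adj)
    # Add self-loops so each node keeps its own (local) features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features (the global coherence signal).
    f = len(feats[0])
    agg = [[sum(norm[i][k] * feats[k][c] for k in range(n)) for c in range(f)]
           for i in range(n)]
    # Linear transform followed by ReLU.
    out_dim = len(weight[0])
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(f)))
             for o in range(out_dim)] for i in range(n)]

isolated = gcn_layer([[0]], [[2.0]], [[1.0]])
linked = gcn_layer([[0, 1], [1, 0]], [[1.0], [0.0]], [[1.0]])
```

In the two-node example, the feature of the first candidate spreads to its neighbor, which is how coherence information propagates between related mentions; restricting the graph to adjacent mentions keeps these matrices small.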

Adversarial Multi-lingual Neural Relation Extraction
Xiaozhi Wang | Xu Han | Yankai Lin | Zhiyuan Liu | Maosong Sun
Proceedings of the 27th International Conference on Computational Linguistics

Multi-lingual relation extraction aims to find unknown relational facts from text in various languages. Existing models cannot adequately capture the consistency and diversity of relation patterns across languages. To address these issues, we propose an adversarial multi-lingual neural relation extraction (AMNRE) model, which builds both consistent and individual representations for each sentence to account for the consistency and diversity among languages. Further, we adopt an adversarial training strategy to ensure that the consistent sentence representations effectively extract language-consistent relation patterns. Experimental results on real-world datasets demonstrate that our AMNRE model significantly outperforms the state-of-the-art models. The source code of this paper can be obtained from https://github.com/thunlp/AMNRE.

2017

On Modeling Sense Relatedness in Multi-prototype Word Embedding
Yixin Cao | Jiaxin Shi | Juanzi Li | Zhiyuan Liu | Chengjiang Li
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

To enhance the expressive power of distributional word representation learning models, many researchers induce word senses through clustering and learn multiple embedding vectors for each word, an approach known as multi-prototype word embedding. However, most related work ignores the relatedness among word senses, which actually plays an important role. In this paper, we propose a novel approach to capture word sense relatedness in multi-prototype word embedding models. In particular, we differentiate the original sense and the extended senses of a word by introducing their global occurrence information, and model their relatedness through local textual context information. Based on the idea of fuzzy clustering, we introduce a random process to integrate these two types of senses and design two non-parametric methods for word sense induction. To make our model more scalable and efficient, we use an online joint learning framework extended from the Skip-gram model. Experimental results demonstrate that our model outperforms both conventional single-prototype embedding models and other multi-prototype embedding models, and achieves more stable performance when trained on smaller data.

Neural Relation Extraction with Multi-lingual Attention
Yankai Lin | Zhiyuan Liu | Maosong Sun
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Relation extraction has been widely used for finding unknown relational facts from plain text. Most existing methods focus on exploiting mono-lingual data for relation extraction, ignoring the massive information in texts in various languages. To address this issue, we introduce a multi-lingual neural relation extraction framework, which employs mono-lingual attention to utilize the information within mono-lingual texts and further proposes cross-lingual attention to consider the information consistency and complementarity among cross-lingual texts. Experimental results on real-world datasets show that our model can take advantage of multi-lingual texts and consistently achieves significant improvements on relation extraction compared with baselines.
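A minimal sketch of sentence-level attention across languages, assuming each language has a relation query vector and each sentence has already been encoded as a vector. Attending a language's sentences with its own query plays the role of mono-lingual attention; attending them with another language's query plays the role of cross-lingual attention. The uniform averaging of all resulting representations and the function names are simplifying assumptions, not necessarily how the paper combines them.

```python
import math

def attend(sentences, query):
    """Softmax attention over sentence vectors, scored by dot product with a relation query."""
    scores = [sum(a * b for a, b in zip(s, query)) for s in sentences]
    m = max(scores)
    w = [math.exp(x - m) for x in scores]
    z = sum(w)
    w = [x / z for x in w]
    dim = len(sentences[0])
    return [sum(w[j] * sentences[j][d] for j in range(len(sentences))) for d in range(dim)]

def multilingual_repr(sents_by_lang, query_by_lang):
    """Combine mono- and cross-lingual attention outputs by uniform averaging (assumption)."""
    reps = []
    for src, sents in sents_by_lang.items():
        for qlang, query in query_by_lang.items():
            # src == qlang: mono-lingual attention; otherwise cross-lingual.
            reps.append(attend(sents, query))
    dim = len(reps[0])
    return [sum(r[d] for r in reps) / len(reps) for d in range(dim)]

focused = attend([[1.0, 0.0], [0.0, 1.0]], [10.0, 0.0])
combined = multilingual_repr({"en": [[1.0, 0.0]], "zh": [[0.0, 1.0]]},
                             {"en": [1.0, 0.0], "zh": [0.0, 1.0]})
```

Attention lets sentences that match the relation query dominate the bag representation, which is also what down-weights noisy, wrongly labeled sentences.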

CANE: Context-Aware Network Embedding for Relation Modeling
Cunchao Tu | Han Liu | Zhiyuan Liu | Maosong Sun
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Network embedding (NE) plays a critical role in network analysis due to its ability to represent vertices with efficient low-dimensional embedding vectors. However, existing NE models learn a fixed, context-free embedding for each vertex and neglect the diverse roles a vertex plays when interacting with other vertices. In this paper, we assume that a vertex usually shows different aspects when interacting with different neighbor vertices, and should therefore have a different embedding for each of them. We thus present Context-Aware Network Embedding (CANE), a novel NE model to address this issue. CANE learns context-aware embeddings for vertices with a mutual attention mechanism and is expected to model the semantic relationships between vertices more precisely. In experiments, we compare our model with existing NE models on three real-world datasets. Experimental results show that CANE achieves significant improvements over state-of-the-art methods on link prediction and comparable performance on vertex classification. The source code and datasets can be obtained from https://github.com/thunlp/CANE.
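The mutual attention step can be sketched as follows, assuming the texts of vertices u and v are already encoded as word vectors: a bilinear correlation matrix F = tanh(P^T A Q) is pooled along rows and columns, and the pooled scores become attention weights over each text. Mean pooling and the fixed identity `A` in the example are assumptions for illustration; in a trained model A is learned.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]

def mutual_attention(P, Q, A):
    """Context-aware text embeddings for an edge (u, v).

    P: m word vectors of u's text, Q: n word vectors of v's text (dim d each),
    A: d x d matrix (learned in practice; fixed here).
    """
    d = len(A)

    def bilinear(p, q):
        return sum(p[a] * A[a][b] * q[b] for a in range(d) for b in range(d))

    # F[i][j] = tanh(p_i^T A q_j): correlation of word i of u with word j of v.
    F = [[math.tanh(bilinear(p, q)) for q in Q] for p in P]
    # Mean-pool rows (for u) and columns (for v), then normalize into weights.
    att_p = softmax([sum(row) / len(row) for row in F])
    att_q = softmax([sum(F[i][j] for i in range(len(P))) / len(P) for j in range(len(Q))])
    u_given_v = [sum(att_p[i] * P[i][k] for i in range(len(P))) for k in range(d)]
    v_given_u = [sum(att_q[j] * Q[j][k] for j in range(len(Q))) for k in range(d)]
    return u_given_v, v_given_u

P = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0], [0.0, 1.0]]
u1, v1 = mutual_attention(P, [[1.0, 0.0]], A)
u2, _ = mutual_attention(P, [[0.0, 1.0]], A)
```

The same vertex u receives a different text embedding (`u1` versus `u2`) depending on which neighbor it is paired with, which is the context-aware behavior the abstract describes.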

Improved Word Representation Learning with Sememes
Yilin Niu | Ruobing Xie | Zhiyuan Liu | Maosong Sun
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Sememes are minimum semantic units of word meanings, and the meaning of each word sense is typically composed of several sememes. Since sememes are not explicit for each word, people manually annotate word sememes and form linguistic common-sense knowledge bases. In this paper, we show that word sememe information can improve word representation learning (WRL), which maps words into a low-dimensional semantic space and serves as a fundamental step for many NLP tasks. The key idea is to utilize word sememes to accurately capture the exact meanings of a word within specific contexts. More specifically, we follow the framework of Skip-gram and present three sememe-encoded models to learn representations of sememes, senses and words, where we apply an attention scheme to detect word senses in various contexts. We conduct experiments on two tasks, word similarity and word analogy, and our models significantly outperform baselines. The results indicate that WRL can benefit from sememes via the attention scheme, and also confirm that our models are capable of correctly modeling sememe information.
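The attention over senses can be sketched as follows, under simplifying assumptions: each sense vector is the average of its sememe vectors, and a context vector (for example, an average of surrounding word embeddings) scores the senses by dot product. The function name and toy vectors are hypothetical, and the paper's three models differ in detail.

```python
import math

def mean(vecs):
    """Element-wise average of a list of equal-length vectors."""
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(len(vecs[0]))]

def word_embedding_in_context(senses, context):
    """senses: one list of sememe vectors per sense; context: context vector."""
    # Each sense is represented by the average of its sememe vectors.
    sense_vecs = [mean(sememes) for sememes in senses]
    # Attention weights: softmax of sense-context dot products.
    scores = [sum(a * b for a, b in zip(s, context)) for s in sense_vecs]
    m = max(scores)
    w = [math.exp(x - m) for x in scores]
    z = sum(w)
    w = [x / z for x in w]
    # The word's contextual embedding is the attention-weighted sum of senses.
    return [sum(w[i] * sense_vecs[i][d] for i in range(len(sense_vecs)))
            for d in range(len(context))]

senses = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0]]]
emb_a = word_embedding_in_context(senses, [10.0, 0.0])
emb_b = word_embedding_in_context(senses, [0.0, 10.0])
```

Different contexts select different senses, so the same word gets a sense-appropriate representation in each.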

Incorporating Relation Paths in Neural Relation Extraction
Wenyuan Zeng | Yankai Lin | Zhiyuan Liu | Maosong Sun
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Distantly supervised relation extraction has been widely used to find novel relational facts from plain text. To predict the relation between a pair of target entities, existing methods rely solely on direct sentences containing both entities. In fact, there are also many sentences containing only one of the target entities, which provide rich, useful information but have not yet been exploited by relation extraction. To address this issue, we build inference chains between two target entities via intermediate entities, and propose a path-based neural relation extraction model to encode the relational semantics from both direct sentences and inference chains. Experimental results on real-world datasets show that our model can make full use of sentences containing only one target entity, and achieves significant and consistent improvements on relation extraction compared with strong baselines. The source code of this paper can be obtained from https://github.com/thunlp/PathNRE.
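The inference chains described above can be found by looking for intermediate entities that co-occur with each target entity in some sentence. The sketch below does that from a list of co-occurring entity pairs and then combines a direct-sentence score with the best path score. The weighting `alpha`, the max over paths, and both function names are assumptions for illustration, not the paper's exact scoring function.

```python
from collections import defaultdict

def build_chains(sentence_pairs, head, tail):
    """Intermediate entities e with sentences mentioning (head, e) and (e, tail)."""
    neighbors = defaultdict(set)
    for a, b in sentence_pairs:
        # Treat co-occurrence as undirected.
        neighbors[a].add(b)
        neighbors[b].add(a)
    return sorted(e for e in neighbors[head] & neighbors[tail] if e not in (head, tail))

def combined_score(direct, path_scores, alpha=0.5):
    """Direct-sentence score plus a weighted best path score (hypothetical rule)."""
    return direct + alpha * max(path_scores, default=0.0)

pairs = [("A", "B"), ("B", "C"), ("A", "D")]
chains = build_chains(pairs, "A", "C")
```

Here A and C never appear in the same sentence, yet the chain A-B-C still yields evidence for their relation.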

2016

Relation Classification via Multi-Level Attention CNNs
Linlin Wang | Zhu Cao | Gerard de Melo | Zhiyuan Liu
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Neural Relation Extraction with Selective Attention over Instances
Yankai Lin | Shiqi Shen | Zhiyuan Liu | Huanbo Luan | Maosong Sun
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Neural Sentiment Classification with User and Product Attention
Huimin Chen | Maosong Sun | Cunchao Tu | Yankai Lin | Zhiyuan Liu
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

Modeling Relation Paths for Representation Learning of Knowledge Bases
Yankai Lin | Zhiyuan Liu | Huanbo Luan | Maosong Sun | Siwei Rao | Song Liu
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Online Learning of Interpretable Word Embeddings
Hongyin Luo | Zhiyuan Liu | Huanbo Luan | Maosong Sun
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Learning Cross-lingual Word Embeddings via Matrix Co-factorization
Tianze Shi | Zhiyuan Liu | Yang Liu | Maosong Sun
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

A Unified Model for Word Sense Representation and Disambiguation
Xinxiong Chen | Zhiyuan Liu | Maosong Sun
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Distant Supervision for Relation Extraction with Matrix Completion
Miao Fan | Deli Zhao | Qiang Zhou | Zhiyuan Liu | Thomas Fang Zheng | Edward Y. Chang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

Topical Word Trigger Model for Keyphrase Extraction
Zhiyuan Liu | Chen Liang | Maosong Sun
Proceedings of COLING 2012

Random Walks on Context-Aware Relation Graphs for Ranking Social Tags
Han Li | Zhiyuan Liu | Maosong Sun
Proceedings of COLING 2012: Posters

Expert Finding for Microblog Misinformation Identification
Chen Liang | Zhiyuan Liu | Maosong Sun
Proceedings of COLING 2012: Posters

Tag Dispatch Model with Social Network Regularization for Microblog User Tag Suggestion
Zhiyuan Liu | Cunchao Tu | Maosong Sun
Proceedings of COLING 2012: Posters

2011

A Simple Word Trigger Method for Social Tag Suggestion
Zhiyuan Liu | Xinxiong Chen | Maosong Sun
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Why Press Backspace? Understanding User Input Behaviors in Chinese Pinyin Input Method
Yabin Zheng | Lixing Xie | Zhiyuan Liu | Maosong Sun | Yang Zhang | Liyun Ru
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Automatic Keyphrase Extraction by Bridging Vocabulary Gap
Zhiyuan Liu | Xinxiong Chen | Yabin Zheng | Maosong Sun
Proceedings of the Fifteenth Conference on Computational Natural Language Learning

2010

Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach
Yabin Zheng | Zhiyuan Liu | Lixing Xie
Proceedings of the ACL 2010 Student Research Workshop

Explore the Structure of Social Tags by Subsumption Relations
Xiance Si | Zhiyuan Liu | Maosong Sun
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Automatic Keyphrase Extraction via Topic Decomposition
Zhiyuan Liu | Wenyi Huang | Yabin Zheng | Maosong Sun
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

Clustering to Find Exemplar Terms for Keyphrase Extraction
Zhiyuan Liu | Peng Li | Yabin Zheng | Maosong Sun
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
