Ruobing Xie


Prompt Tuning for Discriminative Pre-trained Language Models
Yuan Yao | Bowen Dong | Ao Zhang | Zhengyan Zhang | Ruobing Xie | Zhiyuan Liu | Leyu Lin | Maosong Sun | Jianyong Wang
Findings of the Association for Computational Linguistics: ACL 2022

Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks. However, to the best of our knowledge, existing works focus on prompt-tuning generative PLMs that are pre-trained to generate target tokens, such as BERT. It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned. In this work, we present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem. Comprehensive experiments on text classification and question answering show that, compared with vanilla fine-tuning, DPT achieves significantly higher performance and also avoids the instability problem in tuning large PLMs, in both full-set and low-resource settings.
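To illustrate the reformulation described in the abstract above, here is a minimal sketch; it is not the authors' implementation. It assumes a hypothetical `toy_discriminator` that returns, for each token, the probability that the token is original (not replaced), a role that a discriminative PLM such as ELECTRA would play in practice. The template ("it was <label word>"), the label words, and the scoring rule are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a discriminative PLM head: maps a token list to
# per-token probabilities of being "original" (i.e. not replaced).
Discriminator = Callable[[List[str]], List[float]]


def toy_discriminator(tokens: List[str]) -> List[float]:
    """Toy scorer (an assumption, not a real model): a label word looks
    'original' when a matching sentiment cue appears in the same input."""
    cues = {"great": {"loved", "wonderful"}, "terrible": {"hated", "boring"}}
    present = set(tokens)
    scores = []
    for tok in tokens:
        if tok in cues:
            scores.append(0.9 if cues[tok] & present else 0.1)
        else:
            scores.append(0.5)
    return scores


def classify(text: str, label_words: Dict[str, str], disc: Discriminator) -> str:
    """Fill each candidate label word into a template and return the label
    whose word the discriminator rates as most likely to be original."""
    best_label, best_score = "", float("-inf")
    for label, word in label_words.items():
        tokens = text.lower().split() + ["it", "was", word]
        score = disc(tokens)[-1]  # score at the label-word position
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical label-word mapping for a two-class sentiment task.
LABEL_WORDS = {"positive": "great", "negative": "terrible"}
```

Calling `classify("I loved this film", LABEL_WORDS, toy_discriminator)` scores both filled-in templates and keeps the label whose word looks least like a replacement, so classification reuses the pre-training style of objective rather than a new classification head.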


Incorporating Global Information in Local Attention for Knowledge Representation Learning
Yu Zhao | Han Zhou | Ruobing Xie | Fuzhen Zhuang | Qing Li | Ji Liu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Open Hierarchical Relation Extraction
Kai Zhang | Yuan Yao | Ruobing Xie | Xu Han | Zhiyuan Liu | Fen Lin | Leyu Lin | Maosong Sun
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Open relation extraction (OpenRE) aims to extract novel relation types from open-domain corpora, which plays an important role in completing the relation schemes of knowledge bases (KBs). Most OpenRE methods cast different relation types in isolation without considering their hierarchical dependency. We argue that OpenRE is inherently in close connection with relation hierarchies. To establish the bidirectional connections between OpenRE and relation hierarchy, we propose the task of open hierarchical relation extraction and present a novel OHRE framework for the task. We propose a dynamic hierarchical triplet objective and hierarchical curriculum training paradigm, to effectively integrate hierarchy information into relation representations for better novel relation extraction. We also present a top-down hierarchy expansion algorithm to add the extracted relations into existing hierarchies with reasonable interpretability. Comprehensive experiments show that OHRE outperforms state-of-the-art models by a large margin on both relation clustering and hierarchy expansion.
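To make the idea of a hierarchy-aware triplet objective more concrete, here is a minimal sketch. It is not the OHRE objective itself: the margin rule (a base margin scaled by a tree distance between the relation labels of the positive and negative instances) and all function and parameter names are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hierarchical_triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    tree_distance: int,
    base_margin: float = 0.5,
) -> float:
    """Triplet loss whose margin grows with the (assumed) hierarchy distance
    between the anchor's relation and the negative's relation, so relations
    that are far apart in the hierarchy are pushed further apart."""
    margin = base_margin * tree_distance
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)
```

With this rule, a negative drawn from a sibling relation (small `tree_distance`) incurs a smaller required separation than one drawn from a distant branch, which is one simple way to inject hierarchy information into the learned representations.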


Connecting Embeddings for Knowledge Graph Entity Typing
Yu Zhao | Anxiang Zhang | Ruobing Xie | Kang Liu | Xiaojie Wang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Knowledge graph (KG) entity typing aims at inferring possible missing entity type instances in KGs, which is a very significant but still under-explored subtask of knowledge graph completion. In this paper, we propose a novel approach for KG entity typing which is trained by jointly utilizing local typing knowledge from existing entity type assertions and global triple knowledge in KGs. Specifically, we present two distinct, effective knowledge-driven mechanisms of entity type inference. Accordingly, we build two novel embedding models to realize the mechanisms. Afterward, a joint model that connects them is used to infer missing entity type instances, which favors inferences that agree with both entity type instances and triple knowledge in KGs. Experimental results on two real-world datasets (Freebase and YAGO) demonstrate the effectiveness of our proposed mechanisms and models for improving KG entity typing. The source code and data of this paper can be obtained from: .

Meta-Information Guided Meta-Learning for Few-Shot Relation Classification
Bowen Dong | Yuan Yao | Ruobing Xie | Tianyu Gao | Xu Han | Zhiyuan Liu | Fen Lin | Leyu Lin | Maosong Sun
Proceedings of the 28th International Conference on Computational Linguistics

Few-shot classification requires classifiers to adapt to new classes with only a few training instances. State-of-the-art meta-learning approaches such as MAML learn how to initialize and quickly adapt parameters from limited instances, and have shown promising results in few-shot classification. However, existing meta-learning models rely solely on implicit instance-based statistics, and thus suffer from instance unreliability and weak interpretability. To solve this problem, we propose a novel meta-information guided meta-learning (MIML) framework, where semantic concepts of classes provide strong guidance for meta-learning in both initialization and adaptation. In effect, our model can establish connections between instance-based information and semantic-based information, which enables more effective initialization and faster adaptation. Comprehensive experimental results on few-shot relation classification demonstrate the effectiveness of the proposed framework. Notably, MIML achieves comparable or superior performance to humans with only one shot on FewRel evaluation.

Denoising Relation Extraction from Document-level Distant Supervision
Chaojun Xiao | Yuan Yao | Ruobing Xie | Xu Han | Zhiyuan Liu | Maosong Sun | Fen Lin | Leyu Lin
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Distant supervision (DS) has been widely adopted to generate auto-labeled data for sentence-level relation extraction (RE) and has achieved great results. However, the existing success of DS cannot be directly transferred to the more challenging document-level relation extraction (DocRE), as the inevitable noise caused by DS may even be multiplied in documents and significantly harm the performance of RE. To alleviate this issue, we propose a novel pre-trained model for DocRE, which de-emphasizes noisy DS data via multiple pre-training tasks. The experimental results on the large-scale DocRE benchmark show that our model can capture useful information from noisy data and achieve promising results.


Open Relation Extraction: Relational Knowledge Transfer from Supervised Data to Unsupervised Data
Ruidong Wu | Yuan Yao | Xu Han | Ruobing Xie | Zhiyuan Liu | Fen Lin | Leyu Lin | Maosong Sun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Open relation extraction (OpenRE) aims to extract relational facts from open-domain corpora. To this end, it discovers relation patterns between named entities and then clusters semantically equivalent patterns into a single relation cluster. Most OpenRE methods confine themselves to unsupervised paradigms, without taking advantage of existing relational facts in knowledge bases (KBs) and their high-quality labeled instances. To address this issue, we propose Relational Siamese Networks (RSNs) to learn similarity metrics of relations from labeled data of pre-defined relations, and then transfer the relational knowledge to identify novel relations in unlabeled data. Experimental results on two real-world datasets show that our framework achieves significant improvements compared with other state-of-the-art methods. Our code is available at


Cross-lingual Lexical Sememe Prediction
Fanchao Qi | Yankai Lin | Maosong Sun | Hao Zhu | Ruobing Xie | Zhiyuan Liu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Sememes are defined as the minimum semantic units of human languages. As important knowledge sources, sememe-based linguistic knowledge bases have been widely used in many NLP tasks. However, most languages still do not have sememe-based linguistic knowledge bases. Thus we present the task of cross-lingual lexical sememe prediction, which aims to automatically predict sememes for words in other languages. We propose a novel framework that models correlations between sememes and multi-lingual words in a low-dimensional semantic space for sememe prediction. Experimental results on real-world datasets show that our proposed model achieves consistent and significant improvements over baseline methods in cross-lingual sememe prediction. The code and data of this paper are available at

Language Modeling with Sparse Product of Sememe Experts
Yihong Gu | Jun Yan | Hao Zhu | Zhiyuan Liu | Ruobing Xie | Maosong Sun | Fen Lin | Leyu Lin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Most language modeling methods rely on large-scale data to statistically learn the sequential patterns of words. In this paper, we argue that words are atomic language units but not necessarily atomic semantic units. Inspired by HowNet, we propose the Sememe-Driven Language Model (SDLM), which uses sememes, the minimum semantic units in human languages, to represent the implicit semantics behind words for language modeling. More specifically, to predict the next word, SDLM first estimates the sememe distribution given the textual context. Afterwards, it regards each sememe as a distinct semantic expert, and these experts jointly identify the most probable senses and the corresponding word. In this way, SDLM enables language models to work beyond word-level manipulation, at the level of fine-grained sememe semantics, and offers more powerful tools to fine-tune language models and improve their interpretability and robustness. Experiments on language modeling and the downstream application of headline generation demonstrate the effectiveness of SDLM.

Incorporating Chinese Characters of Words for Lexical Sememe Prediction
Huiming Jin | Hao Zhu | Zhiyuan Liu | Ruobing Xie | Maosong Sun | Fen Lin | Leyu Lin
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Sememes are minimum semantic units of concepts in human languages, such that each word sense is composed of one or multiple sememes. Words are usually manually annotated with their sememes by linguists, and form linguistic common-sense knowledge bases widely used in various NLP tasks. Recently, the lexical sememe prediction task has been introduced. It consists of automatically recommending sememes for words, which is expected to improve annotation efficiency and consistency. However, existing methods of lexical sememe prediction typically rely on the external context of words to represent the meaning, which usually fails to deal with low-frequency and out-of-vocabulary words. To address this issue for Chinese, we propose a novel framework to take advantage of both internal character information and external context information of words. We experiment on HowNet, a Chinese sememe knowledge base, and demonstrate that our framework outperforms state-of-the-art baselines by a large margin, and maintains a robust performance even for low-frequency words.


Improved Word Representation Learning with Sememes
Yilin Niu | Ruobing Xie | Zhiyuan Liu | Maosong Sun
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Sememes are minimum semantic units of word meanings, and the meaning of each word sense is typically composed of several sememes. Since sememes are not explicit for each word, people manually annotate word sememes and form linguistic common-sense knowledge bases. In this paper, we show that word sememe information can improve word representation learning (WRL), which maps words into a low-dimensional semantic space and serves as a fundamental step for many NLP tasks. The key idea is to use word sememes to accurately capture the exact meaning of a word within a specific context. More specifically, we follow the framework of Skip-gram and present three sememe-encoded models to learn representations of sememes, senses and words, where we apply an attention scheme to detect word senses in various contexts. We conduct experiments on two tasks, word similarity and word analogy, and our models significantly outperform baselines. The results indicate that WRL can benefit from sememes via the attention scheme, and also confirm that our models are capable of correctly modeling sememe information.
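
The attention scheme over word senses mentioned in the abstract above can be sketched as follows. This is a simplified illustration, not one of the three models from the paper: it assumes each sense already has an embedding (for example, built from its sememes), scores senses by a dot product with a context vector, and forms the word vector as the softmax-weighted sum. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_senses(
    sense_vectors: Sequence[Sequence[float]],
    context_vector: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context-specific word vector).

    Senses whose embeddings align with the context receive larger weights,
    so the resulting word vector leans toward the sense used in context.
    """
    weights = softmax([dot(s, context_vector) for s in sense_vectors])
    dim = len(context_vector)
    word_vector = [
        sum(w * s[i] for w, s in zip(weights, sense_vectors)) for i in range(dim)
    ]
    return weights, word_vector
```

For a word with two senses, a context vector close to the first sense's embedding yields a larger weight for that sense, which acts as a soft form of word sense disambiguation inside the representation learner.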