Yixin Cao


pdf bib
How Knowledge Graph and Attention Help? A Qualitative Analysis into Bag-level Relation Extraction
Zikun Hu | Yixin Cao | Lifu Huang | Tat-Seng Chua
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Knowledge Graph (KG) and attention mechanism have been demonstrated effective in introducing and selecting useful information for weakly supervised methods. However, only qualitative analysis and ablation study are provided as evidence. In this paper, we contribute a dataset and propose a paradigm to quantitatively evaluate the effect of attention and KG on bag-level relation extraction (RE). We find that (1) higher attention accuracy may lead to worse performance as it may harm the model’s ability to extract entity mention features; (2) the performance of attention is largely influenced by various noise distribution patterns, which is closely related to real-world datasets; (3) KG-enhanced attention indeed improves RE performance, while not through enhanced attention but by incorporating entity prior; and (4) attention mechanism may exacerbate the issue of insufficient training data. Based on these findings, we show that a straightforward variant of RE model can achieve significant improvements (6% AUC on average) on two real-world datasets as compared with three state-of-the-art baselines. Our codes and datasets are available at https://github.com/zig-kwin-hu/how-KG-ATT-help.

pdf bib
Learning from Miscellaneous Other-Class Words for Few-shot Named Entity Recognition
Meihan Tong | Shuai Wang | Bin Xu | Yixin Cao | Minghui Liu | Lei Hou | Juanzi Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Few-shot Named Entity Recognition (NER) exploits only a handful of annotations to iden- tify and classify named entity mentions. Pro- totypical network shows superior performance on few-shot NER. However, existing prototyp- ical methods fail to differentiate rich seman- tics in other-class words, which will aggravate overfitting under few shot scenario. To address the issue, we propose a novel model, Mining Undefined Classes from Other-class (MUCO), that can automatically induce different unde- fined classes from the other class to improve few-shot NER. With these extra-labeled unde- fined classes, our method will improve the dis- criminative ability of NER classifier and en- hance the understanding of predefined classes with stand-by semantic knowledge. Experi- mental results demonstrate that our model out- performs five state-of-the-art models in both 1- shot and 5-shots settings on four NER bench- marks. We will release the code upon accep- tance. The source code is released on https: //github.com/shuaiwa16/OtherClassNER.git.

pdf bib
Are Missing Links Predictable? An Inferential Benchmark for Knowledge Graph Completion
Yixin Cao | Xiang Ji | Xin Lv | Juanzi Li | Yonggang Wen | Hanwang Zhang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We present InferWiki, a Knowledge Graph Completion (KGC) dataset that improves upon existing benchmarks in inferential ability, assumptions, and patterns. First, each testing sample is predictable with supportive data in the training set. To ensure it, we propose to utilize rule-guided train/test generation, instead of conventional random split. Second, InferWiki initiates the evaluation following the open-world assumption and improves the inferential difficulty of the closed-world assumption, by providing manually annotated negative and unknown triples. Third, we include various inference patterns (e.g., reasoning path length and types) for comprehensive evaluation. In experiments, we curate two settings of InferWiki varying in sizes and structures, and apply the construction process on CoDEx as comparative datasets. The results and empirical analyses demonstrate the necessity and high-quality of InferWiki. Nevertheless, the performance gap among various inferential assumptions and patterns presents the difficulty and inspires future research direction. Our datasets can be found in https://github.com/TaoMiner/inferwiki.


pdf bib
Expertise Style Transfer: A New Task Towards Better Communication between Experts and Laymen
Yixin Cao | Ruihao Shui | Liangming Pan | Min-Yen Kan | Zhiyuan Liu | Tat-Seng Chua
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The curse of knowledge can impede communication between experts and laymen. We propose a new task of expertise style transfer and contribute a manually annotated dataset with the goal of alleviating such cognitive biases. Solving this task not only simplifies the professional language, but also improves the accuracy and expertise level of laymen descriptions using simple words. This is a challenging task, unaddressed in previous work, as it requires the models to have expert intelligence in order to modify text with a deep understanding of domain knowledge and structures. We establish the benchmark performance of five state-of-the-art models for style transfer and text simplification. The results demonstrate a significant gap between machine and human performance. We also discuss the challenges of automatic evaluation, to provide insights into future research directions. The dataset is publicly available at https://srhthu.github.io/expertise-style-transfer/.

pdf bib
Improving Event Detection via Open-domain Trigger Knowledge
Meihan Tong | Bin Xu | Shuai Wang | Yixin Cao | Lei Hou | Juanzi Li | Jun Xie
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Event Detection (ED) is a fundamental task in automatically structuring texts. Due to the small scale of training data, previous methods perform poorly on unseen/sparsely labeled trigger words and are prone to overfitting densely labeled trigger words. To address the issue, we propose a novel Enrichment Knowledge Distillation (EKD) model to leverage external open-domain trigger knowledge to reduce the in-built biases to frequent trigger words in annotations. Experiments on benchmark ACE2005 show that our model outperforms nine strong baselines, is especially effective for unseen/sparsely labeled trigger words. The source code is released on https://github.com/shuaiwa16/ekd.git.

pdf bib
Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment
Zhiyuan Liu | Yixin Cao | Liangming Pan | Juanzi Li | Zhiyuan Liu | Tat-Seng Chua
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Entity alignment (EA) aims at building a unified Knowledge Graph (KG) of rich content by linking the equivalent entities from various KGs. GNN-based EA methods present promising performance by modeling the KG structure defined by relation triples. However, attribute triples can also provide crucial alignment signal but have not been well explored yet. In this paper, we propose to utilize an attributed value encoder and partition the KG into subgraphs to model the various types of attribute triples efficiently. Besides, the performances of current EA methods are overestimated because of the name-bias of existing EA datasets. To make an objective evaluation, we propose a hard experimental setting where we select equivalent entity pairs with very different names as the test set. Under both the regular and hard settings, our method achieves significant improvements (5.10% on average Hits@1 in DBP15k) over 12 baselines in cross-lingual and monolingual datasets. Ablation studies on different subgraphs and a case study about attribute types further demonstrate the effectiveness of our method. Source code and data can be found at https://github.com/thunlp/explore-and-evaluate.


pdf bib
Low-Resource Name Tagging Learned with Weakly Labeled Data
Yixin Cao | Zikun Hu | Tat-seng Chua | Zhiyuan Liu | Heng Ji
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Name tagging in low-resource languages or domains suffers from inadequate training data. Existing work heavily relies on additional information, while leaving those noisy annotations unexplored that extensively exist on the web. In this paper, we propose a novel neural model for name tagging solely based on weakly labeled (WL) data, so that it can be applied in any low-resource settings. To take the best advantage of all WL sentences, we split them into high-quality and noisy portions for two modules, respectively: (1) a classification module focusing on the large portion of noisy data can efficiently and robustly pretrain the tag classifier by capturing textual context semantics; and (2) a costly sequence labeling module focusing on high-quality data utilizes Partial-CRFs with non-entity sampling to achieve global optimum. Two modules are combined via shared parameters. Extensive experiments involving five low-resource languages and fine-grained food domain demonstrate our superior performance (6% and 7.8% F1 gains on average) as well as efficiency.

pdf bib
Semi-supervised Entity Alignment via Joint Knowledge Embedding Model and Cross-graph Model
Chengjiang Li | Yixin Cao | Lei Hou | Jiaxin Shi | Juanzi Li | Tat-Seng Chua
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Entity alignment aims at integrating complementary knowledge graphs (KGs) from different sources or languages, which may benefit many knowledge-driven applications. It is challenging due to the heterogeneity of KGs and limited seed alignments. In this paper, we propose a semi-supervised entity alignment method by joint Knowledge Embedding model and Cross-Graph model (KECG). It can make better use of seed alignments to propagate over the entire graphs with KG-based constraints. Specifically, as for the knowledge embedding model, we utilize TransE to implicitly complete two KGs towards consistency and learn relational constraints between entities. As for the cross-graph model, we extend Graph Attention Network (GAT) with projection constraint to robustly encode graphs, and two KGs share the same GAT to transfer structural knowledge as well as to ignore unimportant neighbors for alignment via attention mechanism. Results on publicly available datasets as well as further analysis demonstrate the effectiveness of KECG. Our codes can be found in https: //github.com/THU-KEG/KECG.

pdf bib
Multi-Channel Graph Neural Network for Entity Alignment
Yixin Cao | Zhiyuan Liu | Chengjiang Li | Zhiyuan Liu | Juanzi Li | Tat-Seng Chua
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Entity alignment typically suffers from the issues of structural heterogeneity and limited seed alignments. In this paper, we propose a novel Multi-channel Graph Neural Network model (MuGNN) to learn alignment-oriented knowledge graph (KG) embeddings by robustly encoding two KGs via multiple channels. Each channel encodes KGs via different relation weighting schemes with respect to self-attention towards KG completion and cross-KG attention for pruning exclusive entities respectively, which are further combined via pooling techniques. Moreover, we also infer and transfer rule knowledge for completing two KGs consistently. MuGNN is expected to reconcile the structural differences of two KGs, and thus make better use of seed alignments. Extensive experiments on five publicly available datasets demonstrate our superior performance (5% Hits@1 up on average). Source code and data used in the experiments can be accessed at https://github.com/thunlp/MuGNN .


pdf bib
Neural Collective Entity Linking
Yixin Cao | Lei Hou | Juanzi Li | Zhiyuan Liu
Proceedings of the 27th International Conference on Computational Linguistics

Entity Linking aims to link entity mentions in texts to knowledge bases, and neural models have achieved recent success in this task. However, most existing methods rely on local contexts to resolve entities independently, which may usually fail due to the data sparsity of local information. To address this issue, we propose a novel neural model for collective entity linking, named as NCEL. NCEL apply Graph Convolutional Network to integrate both local contextual features and global coherence information for entity linking. To improve the computation efficiency, we approximately perform graph convolution on a subgraph of adjacent entity mentions instead of those in the entire text. We further introduce an attention scheme to improve the robustness of NCEL to data noise and train the model on Wikipedia hyperlinks to avoid overfitting and domain bias. In experiments, we evaluate NCEL on five publicly available datasets to verify the linking performance as well as generalization ability. We also conduct an extensive analysis of time complexity, the impact of key modules, and qualitative results, which demonstrate the effectiveness and efficiency of our proposed method.

pdf bib
Joint Representation Learning of Cross-lingual Words and Entities via Attentive Distant Supervision
Yixin Cao | Lei Hou | Juanzi Li | Zhiyuan Liu | Chengjiang Li | Xu Chen | Tiansi Dong
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Jointly representation learning of words and entities benefits many NLP tasks, but has not been well explored in cross-lingual settings. In this paper, we propose a novel method for joint representation learning of cross-lingual words and entities. It captures mutually complementary knowledge, and enables cross-lingual inferences among knowledge bases and texts. Our method does not require parallel corpus, and automatically generates comparable data via distant supervision using multi-lingual knowledge bases. We utilize two types of regularizers to align cross-lingual words and entities, and design knowledge attention and cross-lingual attention to further reduce noises. We conducted a series of experiments on three tasks: word translation, entity relatedness, and cross-lingual entity linking. The results, both qualitative and quantitative, demonstrate the significance of our method.


pdf bib
Bridge Text and Knowledge by Learning Multi-Prototype Entity Mention Embedding
Yixin Cao | Lifu Huang | Heng Ji | Xu Chen | Juanzi Li
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Integrating text and knowledge into a unified semantic space has attracted significant research interests recently. However, the ambiguity in the common space remains a challenge, namely that the same mention phrase usually refers to various entities. In this paper, to deal with the ambiguity of entity mentions, we propose a novel Multi-Prototype Mention Embedding model, which learns multiple sense embeddings for each mention by jointly modeling words from textual contexts and entities derived from a knowledge base. In addition, we further design an efficient language model based approach to disambiguate each mention to a specific sense. In experiments, both qualitative and quantitative analysis demonstrate the high quality of the word, entity and multi-prototype mention embeddings. Using entity linking as a study case, we apply our disambiguation method as well as the multi-prototype mention embeddings on the benchmark dataset, and achieve state-of-the-art performance.

pdf bib
On Modeling Sense Relatedness in Multi-prototype Word Embedding
Yixin Cao | Jiaxin Shi | Juanzi Li | Zhiyuan Liu | Chengjiang Li
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

To enhance the expression ability of distributional word representation learning model, many researchers tend to induce word senses through clustering, and learn multiple embedding vectors for each word, namely multi-prototype word embedding model. However, most related work ignores the relatedness among word senses which actually plays an important role. In this paper, we propose a novel approach to capture word sense relatedness in multi-prototype word embedding model. Particularly, we differentiate the original sense and extended senses of a word by introducing their global occurrence information and model their relatedness through the local textual context information. Based on the idea of fuzzy clustering, we introduce a random process to integrate these two types of senses and design two non-parametric methods for word sense induction. To make our model more scalable and efficient, we use an online joint learning framework extended from the Skip-gram model. The experimental results demonstrate that our model outperforms both conventional single-prototype embedding models and other multi-prototype embedding models, and achieves more stable performance when trained on smaller data.


pdf bib
Name List Only? Target Entity Disambiguation in Short Texts
Yixin Cao | Juanzi Li | Xiaofei Guo | Shuanhu Bai | Heng Ji | Jie Tang
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing