Xiangnan He


pdf bib
Learning to Imagine: Integrating Counterfactual Thinking in Neural Discrete Reasoning
Moxin Li | Fuli Feng | Hanwang Zhang | Xiangnan He | Fengbin Zhu | Tat-Seng Chua
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Neural discrete reasoning (NDR) has shown remarkable progress in combining deep models with discrete reasoning. However, we find that existing NDR solution suffers from large performance drop on hypothetical questions, e.g. “what the annualized rate of return would be if the revenue in 2020 was doubled”. The key to hypothetical question answering (HQA) is counterfactual thinking, which is a natural ability of human reasoning but difficult for deep models. In this work, we devise a Learning to Imagine (L2I) module, which can be seamlessly incorporated into NDR models to perform the imagination of unseen counterfactual. In particular, we formulate counterfactual thinking into two steps: 1) identifying the fact to intervene, and 2) deriving the counterfactual from the fact and assumption, which are designed as neural networks. Based on TAT-QA, we construct a very challenging HQA dataset with 8,283 hypothetical questions. We apply the proposed L2I to TAGOP, the state-of-the-art solution on TAT-QA, validating the rationality and effectiveness of our approach.


pdf bib
Empowering Language Understanding with Counterfactual Reasoning
Fuli Feng | Jizhi Zhang | Xiangnan He | Hanwang Zhang | Tat-Seng Chua
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021


pdf bib
Graph-based Aspect Representation Learning for Entity Resolution
Zhenqi Zhao | Yuchen Guo | Dingxian Wang | Yufan Huang | Xiangnan He | Bin Gu
Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)

Entity Resolution (ER) identifies records that refer to the same real-world entity. Deep learning approaches improved the generalization ability of entity matching models, but hardly overcame the impact of noisy or incomplete data sources. In real scenes, an entity usually consists of multiple semantic facets, called aspects. In this paper, we focus on entity augmentation, namely retrieving the values of missing aspects. The relationship between aspects is naturally suitable to be represented by a knowledge graph, where entity augmentation can be modeled as a link prediction problem. Our paper proposes a novel graph-based approach to solve entity augmentation. Specifically, we apply a dedicated random walk algorithm, which uses node types to limit the traversal length, and encodes graph structure into low-dimensional embeddings. Thus, the missing aspects could be retrieved by a link prediction model. Furthermore, the augmented aspects with fixed orders are served as the input of a deep Siamese BiLSTM network for entity matching. We compared our method with state-of-the-art methods through extensive experiments on downstream ER tasks. According to the experiment results, our model outperforms other methods on evaluation metrics (accuracy, precision, recall, and f1-score) to a large extent, which demonstrates the effectiveness of our method.


pdf bib
Sequicity: Simplifying Task-oriented Dialogue Systems with Single Sequence-to-Sequence Architectures
Wenqiang Lei | Xisen Jin | Min-Yen Kan | Zhaochun Ren | Xiangnan He | Dawei Yin
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Existing solutions to task-oriented dialogue systems follow pipeline designs which introduces architectural complexity and fragility. We propose a novel, holistic, extendable framework based on a single sequence-to-sequence (seq2seq) model which can be optimized with supervised or reinforcement learning. A key contribution is that we design text spans named belief spans to track dialogue believes, allowing task-oriented dialogue systems to be modeled in a seq2seq way. Based on this, we propose a simplistic Two Stage CopyNet instantiation which emonstrates good scalability: significantly reducing model complexity in terms of number of parameters and training time by a magnitude. It significantly outperforms state-of-the-art pipeline-based methods on large datasets and retains a satisfactory entity match rate on out-of-vocabulary (OOV) cases where pipeline-designed competitors totally fail.

pdf bib
Batch IS NOT Heavy: Learning Word Representations From All Samples
Xin Xin | Fajie Yuan | Xiangnan He | Joemon M. Jose
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Stochastic Gradient Descent (SGD) with negative sampling is the most prevalent approach to learn word representations. However, it is known that sampling methods are biased especially when the sampling distribution deviates from the true data distribution. Besides, SGD suffers from dramatic fluctuation due to the one-sample learning scheme. In this work, we propose AllVec that uses batch gradient learning to generate word representations from all training samples. Remarkably, the time complexity of AllVec remains at the same level as SGD, being determined by the number of positive samples rather than all samples. We evaluate AllVec on several benchmark tasks. Experiments show that AllVec outperforms sampling-based SGD methods with comparable efficiency, especially for small training corpora.


pdf bib
Mining Scientific Terms and their Definitions: A Study of the ACL Anthology
Yiping Jin | Min-Yen Kan | Jun-Ping Ng | Xiangnan He
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing