Zhepei Wei


2022

pdf bib
Learning Semantic Textual Similarity via Topic-informed Discrete Latent Variables
Erxin Yu | Lan Du | Yuan Jin | Zhepei Wei | Yi Chang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Recently, discrete latent variable models have received a surge of interest in both Natural Language Processing (NLP) and Computer Vision (CV), attributed to their comparable performance to the continuous counterparts in representation learning, while being more interpretable in their predictions. In this paper, we develop a topic-informed discrete latent variable model for semantic textual similarity, which learns a shared latent space for sentence-pair representation via vector quantization. Compared with previous models limited to local semantic contexts, our model can explore richer semantic information via topic modeling. We further boost the performance of semantic similarity by injecting the quantized representation into a transformer-based language model with a well-designed semantic-driven attention mechanism. We demonstrate, through extensive experiments across various English language datasets, that our model is able to surpass several strong neural baselines in semantic textual similarity tasks.

pdf bib
Towards Unified Representations of Knowledge Graph and Expert Rules for Machine Learning and Reasoning
Zhepei Wei | Yue Wang | Jinnan Li | Zhining Liu | Erxin Yu | Yuan Tian | Xin Wang | Yi Chang
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

With a knowledge graph and a set of if-then rules, can we reason about the conclusions given a set of observations? In this work, we formalize this question as the cognitive inference problem, and introduce the Cognitive Knowledge Graph (CogKG) that unifies two representations of heterogeneous symbolic knowledge: expert rules and relational facts. We propose a general framework in which the unified knowledge representations can perform both learning and reasoning. Specifically, we implement the above framework in two settings, depending on the availability of labeled data. When no labeled data are available for training, the framework can directly utilize symbolic knowledge as the decision basis and perform reasoning. When labeled data become available, the framework casts symbolic knowledge as a trainable neural architecture and optimizes the connection weights among neurons through gradient descent. Empirical study on two clinical diagnosis benchmarks demonstrates the superiority of the proposed method over time-tested knowledge-driven and data-driven methods, showing the great potential of the proposed method in unifying heterogeneous symbolic knowledge, i.e., expert rules and relational facts, as the substrate of machine learning and reasoning models.

2020

pdf bib
A Novel Cascade Binary Tagging Framework for Relational Triple Extraction
Zhepei Wei | Jianlin Su | Yue Wang | Yuan Tian | Yi Chang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Extracting relational triples from unstructured text is crucial for large-scale knowledge graph construction. However, few existing works excel in solving the overlapping triple problem where multiple relational triples in the same sentence share the same entities. In this work, we introduce a fresh perspective to revisit the relational triple extraction task and propose a novel cascade binary tagging framework (CasRel) derived from a principled problem formulation. Instead of treating relations as discrete labels as in previous works, our new framework models relations as functions that map subjects to objects in a sentence, which naturally handles the overlapping problem. Experiments show that the CasRel framework already outperforms state-of-the-art methods even when its encoder module uses a randomly initialized BERT encoder, showing the power of the new tagging framework. It enjoys further performance boost when employing a pre-trained BERT encoder, outperforming the strongest baseline by 17.5 and 30.2 absolute gain in F1-score on two public datasets NYT and WebNLG, respectively. In-depth analysis on different scenarios of overlapping triples shows that the method delivers consistent performance gain across all these scenarios. The source code and data are released online.