Weihua Wang


2025

Unifying Dual-Space Embedding for Entity Alignment via Contrastive Learning
Cunda Wang | Weihua Wang | Qiuyu Liang | Feilong Bao | Guanglai Gao
Proceedings of the 31st International Conference on Computational Linguistics

Entity alignment (EA) aims to match identical entities across different knowledge graphs (KGs). Graph neural network-based entity alignment methods have achieved promising results in Euclidean space. However, KGs often contain complex local and hierarchical structures, which are hard to represent in a single space. In this paper, we propose a novel method named UniEA, which unifies dual-space embedding to preserve the intrinsic structure of KGs. Specifically, we simultaneously learn graph structure embeddings in both Euclidean and hyperbolic spaces to maximize the consistency between embeddings in the two spaces. Moreover, we employ contrastive learning to mitigate the misalignment issues caused by similar entities, where embeddings of similar neighboring entities become too close. Extensive experiments on benchmark datasets demonstrate that our method achieves state-of-the-art performance in structure-based EA. Our code is available at https://github.com/wonderCS1213/UniEA.
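To make the idea of contrasting two views of the same entities concrete, the following is a minimal sketch of an InfoNCE-style loss. It is not taken from the UniEA code: the function names `cosine` and `info_nce`, the temperature value, and the use of cosine similarity are all assumptions for illustration, and both views are assumed to have already been mapped into a common vector space.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(view_a, view_b, temperature=0.5):
    """InfoNCE-style loss (hypothetical helper, not the paper's exact loss).

    Row i of view_a and row i of view_b describe the same entity and form
    the positive pair; every other row of view_b acts as a negative.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / n

# Two toy "views" of two entities (values are made up).
euclidean_view = [[1.0, 0.0], [0.0, 1.0]]
second_view = [[1.0, 0.1], [0.1, 1.0]]
print(info_nce(euclidean_view, second_view))
```

The loss is small when each entity is closest to its own counterpart in the other view and grows when the counterparts are swapped, which is the behavior a consistency objective of this kind relies on.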

Distance-Adaptive Quaternion Knowledge Graph Embedding with Bidirectional Rotation
Weihua Wang | Qiuyu Liang | Feilong Bao | Guanglai Gao
Proceedings of the 31st International Conference on Computational Linguistics

A quaternion contains one real part and three imaginary parts, which provides a more expressive hypercomplex space for learning knowledge graph representations. Existing quaternion embedding models measure the plausibility of a triplet through either semantic matching or distance scoring functions. However, semantic matching appears to diminish the separability of entities, while the distance scoring function weakens the semantics of entities. To address this issue, we propose a novel quaternion knowledge graph embedding model. Our model combines semantic matching with the geometric distance between entities to better measure the plausibility of triplets. Specifically, in the quaternion space, we perform a right rotation on the head entity and a reverse rotation on the tail entity to learn rich semantic features. Then, we utilize distance-adaptive translations to learn the geometric distance between entities. Furthermore, we provide mathematical proofs to demonstrate that our model can handle complex logical relationships. Extensive experimental results and analyses show that our model significantly outperforms previous models on well-known knowledge graph completion benchmark datasets. Our code is available at https://anonymous.4open.science/r/l2730.
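The rotations mentioned in the abstract rest on the Hamilton product of quaternions. The sketch below illustrates that operation only. It is not the paper's scoring function: the helper names are hypothetical, and reading "right rotation" as right-multiplication by a unit quaternion and "reverse rotation" as right-multiplication by its conjugate is an assumption made for illustration.

```python
import math

def hamilton(p, q):
    """Hamilton product of quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def conjugate(q):
    """Quaternion conjugate; for a unit quaternion this is its inverse."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def norm(q):
    return math.sqrt(sum(x * x for x in q))

def normalize(q):
    n = norm(q)
    return tuple(x / n for x in q)

def rotate_right(entity, relation):
    """Assumed form of a right rotation: entity * unit(relation)."""
    return hamilton(entity, normalize(relation))

def rotate_reverse(entity, relation):
    """Assumed form of a reverse rotation: entity * conj(unit(relation))."""
    return hamilton(entity, conjugate(normalize(relation)))

head = (1.0, 2.0, 3.0, 4.0)        # toy entity embedding (made-up values)
relation = (0.5, -1.0, 2.0, 0.3)   # toy relation embedding (made-up values)
rotated = rotate_right(head, relation)
print(rotated, norm(rotated), norm(head))
```

Because the relation is normalized to unit length, the rotation leaves the norm of the entity unchanged, and applying the reverse rotation to the rotated entity recovers the original.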

2024

L^2GC: Lorentzian Linear Graph Convolutional Networks for Node Classification
Qiuyu Liang | Weihua Wang | Feilong Bao | Guanglai Gao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Linear Graph Convolutional Networks (GCNs) are used to classify nodes in graph data. However, we note that most existing linear GCN models perform neural network operations in Euclidean space, which does not explicitly capture the tree-like hierarchical structure exhibited in real-world datasets that are modeled as graphs. In this paper, we attempt to introduce hyperbolic space into linear GCN and propose a novel framework for Lorentzian linear GCN. Specifically, we map the learned features of graph nodes into hyperbolic space, and then perform a Lorentzian linear feature transformation to capture the underlying tree-like structure of the data. Experimental results on standard citation network datasets with semi-supervised learning show that our approach yields new state-of-the-art accuracy of 74.7% on Citeseer and 81.3% on PubMed. Furthermore, we observe that our approach can be trained up to two orders of magnitude faster than other nonlinear GCN models on the PubMed dataset. Our code is publicly available at https://github.com/llqy123/LLGC-master.
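One common way to map Euclidean features into hyperbolic space is the exponential map at the origin of the Lorentz (hyperboloid) model. The sketch below shows that standard construction for curvature -1. Whether the paper uses exactly this map is an assumption, the helper names are hypothetical, and the Lorentzian linear transformation itself is not reproduced here.

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def exp_map_origin(v):
    """Map a Euclidean (tangent) vector v onto the unit hyperboloid.

    Uses exp_o(v) = (cosh(|v|), sinh(|v|) * v / |v|), curvature -1.
    """
    n = math.sqrt(sum(a * a for a in v))
    if n == 0.0:
        # The zero vector maps to the hyperboloid origin (1, 0, ..., 0).
        return [1.0] + [0.0] * len(v)
    scale = math.sinh(n) / n
    return [math.cosh(n)] + [scale * a for a in v]

def lorentz_distance(x, y):
    """Geodesic distance between two points on the hyperboloid."""
    # Clamp to 1.0 to guard against rounding slightly below acosh's domain.
    return math.acosh(max(1.0, -lorentz_inner(x, y)))

feature = [0.3, -0.4]              # toy node feature (made-up values)
point = exp_map_origin(feature)
origin = exp_map_origin([0.0, 0.0])
print(point, lorentz_inner(point, point), lorentz_distance(origin, point))
```

A mapped point satisfies the hyperboloid constraint (its Lorentzian inner product with itself is -1), and its distance from the origin equals the Euclidean norm of the input feature.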

2020

Mongolian Questions Classification Based on Multi-Head Attention
Guangyi Wang | Feilong Bao | Weihua Wang
Proceedings of the 19th Chinese National Conference on Computational Linguistics

Question classification is a crucial subtask in question answering systems. Mongolian is a low-resource language that lacks a public labeled corpus, and the complex morphological structure of Mongolian vocabulary leads to a data sparsity problem. This paper proposes a classification model that combines a Bi-LSTM model with the Multi-Head Attention mechanism. The Multi-Head Attention mechanism extracts relevant information from different dimensions and representation subspaces. Based on the characteristics of Mongolian word formation, this paper introduces Mongolian morpheme representations in the embedding layer. The morpheme vector focuses on the semantics of the Mongolian word. Character vectors and morpheme vectors are concatenated to form word vectors, which are fed to the Bi-LSTM to obtain context representations. Finally, the Multi-Head Attention obtains global information for classification. The model was evaluated on a Mongolian corpus. Experimental results show that our proposed model significantly outperforms baseline systems.

2016

Mongolian Named Entity Recognition System with Rich Features
Weihua Wang | Feilong Bao | Guanglai Gao
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper, we first build a manually annotated named entity corpus of Mongolian. Then, we propose three morphological processing methods and study comprehensive features, including syllable features, lexical features, context features, morphological features and semantic features, in Mongolian named entity recognition. Moreover, we evaluate the influence of word cluster features on the system and finally combine all features together. The experimental results show that segmenting each suffix into an individual token achieves better results than deleting suffixes or using the suffixes as features. The system based on segmenting suffixes with all proposed features yields a benchmark result of F-measure = 84.65 on this corpus.