Weixin Zeng

Also published as: WeiXin Zeng

2025

Multi-Modal Entities Matter: Benchmarking Multi-Modal Entity Alignment
GuanChen Xiao | WeiXin Zeng | ShiQi Zhang | MingRui Lao | Xiang Zhao
Proceedings of the 31st International Conference on Computational Linguistics

Multi-modal entity alignment (MMEA) is a long-standing task that aims to discover identical entities between different multi-modal knowledge graphs (MMKGs). However, most existing MMEA datasets treat multi-modal data as attributes of textual entities. They neglect the correlations among the multi-modal data and therefore do not fit real-world scenarios well. In response, we establish a novel yet practical MMEA dataset, NMMEA, which models multi-modal data (e.g., images) as entities on an equal footing with textual entities in the MMKG. Because it introduces multi-modal data as entities, NMMEA poses new challenges to existing MMEA solutions, namely heterogeneous structural representation learning and cross-modal alignment inference. We therefore put forward a simple yet effective solution, CrossEA, which learns the structural information of entities by considering both intra-modal and cross-modal relations, and further infers the similarity of different types of entity pairs. Extensive experiments validate the significance of NMMEA and show that CrossEA achieves superior performance compared with competitive methods on the proposed dataset.

2024

SCL: Selective Contrastive Learning for Data-driven Zero-shot Relation Extraction
Ning Pang | Xiang Zhao | Weixin Zeng | Zhen Tan | Weidong Xiao
Transactions of the Association for Computational Linguistics, Volume 12

Relation extraction has evolved from supervised relation extraction to the zero-shot setting due to the continuous emergence of newly generated relations. Some pioneering works handle zero-shot relation extraction by reformulating it into proxy tasks, such as reading comprehension and textual entailment. Nonetheless, the divergence between these proxy task formulations and relation extraction hinders the acquisition of informative semantic representations, leading to subpar performance. Therefore, in this paper, we take a data-driven view and handle zero-shot relation extraction under a three-step paradigm consisting of encoder training, relation clustering, and summarization. Specifically, to train a discriminative relational encoder, we propose a novel selective contrastive learning framework, namely SCL, in which selective importance scores are assigned to distinguish the importance of different negative contrastive instances. During testing, the prompt-based encoder is employed to map test samples into representation vectors, which are then clustered into several groups. Typical samples closest to each cluster centroid are selected for summarization to generate the predicted relation for all samples in that cluster. Moreover, we design a simple non-parametric threshold plugin to reduce false-positive errors in inference on unseen relation representations. Our experiments demonstrate that SCL outperforms the current state-of-the-art method by over 3% across all metrics.

2020

CLEEK: A Chinese Long-text Corpus for Entity Linking
Weixin Zeng | Xiang Zhao | Jiuyang Tang | Zhen Tan | Xuqian Huang
Proceedings of the Twelfth Language Resources and Evaluation Conference

Entity linking, as one of the fundamental tasks in natural language processing, is crucial to knowledge fusion and to knowledge base construction and updating. Nevertheless, in contrast to research on entity linking for English text, which continues to develop, research on its Chinese counterpart is still in its infancy. One prominent issue is that publicly available annotated datasets and evaluation benchmarks are scarce and deficient. Specifically, existing Chinese corpora for entity linking were mainly constructed from noisy short texts, such as microblogs and news headlines, while long texts, which constitute a wider spectrum of real-life scenarios, were largely overlooked. To address this issue, we build CLEEK, a Chinese corpus of multi-domain long text for entity linking, in order to encourage the advancement of entity linking in languages other than English. The corpus consists of 100 documents from diverse domains and is publicly accessible. Moreover, we devise a measure to evaluate the difficulty of documents with respect to entity linking, which is then used to characterize the corpus. Additionally, the results of two baselines and seven state-of-the-art solutions on CLEEK are reported and compared. The empirical results validate the usefulness of CLEEK and the effectiveness of the proposed difficulty measure.
