Fu Zhang

2025

Entity Alignment (EA) is a critical task in Knowledge Graph (KG) integration, aimed at identifying and matching equivalent entities that represent the same real-world objects. While EA methods based on knowledge representation learning have shown strong performance on synthetic benchmark datasets such as DBP15K, their effectiveness declines significantly in real-world scenarios, which often involve data that is highly heterogeneous, incomplete, and domain-specific, as seen in datasets like DOREMUS and AGROLD. Addressing this challenge, we propose DAEA, a novel EA approach with Domain Adaptation that leverages the data characteristics of synthetic benchmarks for improved performance on real-world datasets. DAEA introduces a multi-source KG selection mechanism and a specialized domain adaptive entity alignment loss function to bridge the gap between real-world data and optimal benchmark data, mitigating the challenges posed by aligning entities across highly heterogeneous KGs. Experimental results demonstrate that DAEA outperforms state-of-the-art models on real-world datasets, achieving a 29.94% improvement in Hits@1 on DOREMUS and a 5.64% improvement on AGROLD. Code is available at https://github.com/yangxiaoxiaoly/DAEA.

RRHF-V: Ranking Responses to Mitigate Hallucinations in Multimodal Large Language Models with Human Feedback
Guoqing Chen | Fu Zhang | Jinghao Lin | Chenglong Lu | Jingwei Cheng
Proceedings of the 31st International Conference on Computational Linguistics

Multimodal large language models (MLLMs) demonstrate strong capabilities in multimodal understanding, reasoning, and interaction but still face the fundamental limitation of hallucinations, where they generate erroneous or fabricated information. To mitigate hallucinations, existing methods annotate pair-responses (one non-hallucination vs one hallucination) using manual methods or GPT-4V, and train alignment algorithms to improve the correspondence between images and text. More critically, an image description often involves multiple dimensions (e.g., object attributes, posture, and spatial relationships), making it challenging for the model to comprehensively learn multidimensional information from pair-responses. To this end, in this paper, we propose RRHF-V, the first method to use rank-responses (one non-hallucination vs multiple ranking hallucinations) to mitigate multimodal hallucinations. Instead of using pair-responses to train the model, RRHF-V expands the number of hallucinatory responses, so that the responses with different scores in a rank-response enable the model to learn rich semantic information across various dimensions of the image. Further, we propose a scene graph-based approach to construct rank-responses automatically and cost-effectively. We also design a novel training objective based on rank loss and margin loss to balance the differences between hallucinatory responses within a rank-response, thereby improving the model’s image comprehension. Experiments on two MLLMs of different sizes and four widely used benchmarks demonstrate that RRHF-V is effective in mitigating hallucinations and outperforms the DPO method based on pair-responses.

Re-Cent: A Relation-Centric Framework for Joint Zero-Shot Relation Triplet Extraction
Zehan Li | Fu Zhang | Kailun Lyu | Jingwei Cheng | Tianyue Peng
Proceedings of the 31st International Conference on Computational Linguistics

Zero-shot Relation Triplet Extraction (ZSRTE) aims to extract triplets from the context where the relation patterns are unseen during training. Due to the inherent challenges of the ZSRTE task, existing extractive ZSRTE methods often decompose it into named entity recognition and relation classification, which overlooks the interdependence of the two tasks and may introduce error propagation. Motivated by the intuition that crucial entity attributes might be implicit in the relation labels, we propose a Relation-Centric joint ZSRTE method named Re-Cent. This approach uses minimal information, specifically unseen relation labels, to extract triplets in one go through a unified model. We develop two span-based extractors to identify the subjects and objects corresponding to relation labels, forming span-pairs. Additionally, we introduce a relation-based correction mechanism that further refines the triplets by calculating the relevance between span-pairs and relation labels. Experiments demonstrate that Re-Cent achieves state-of-the-art performance with fewer parameters and does not rely on synthetic data or manual labor.

Exploring the Impacts of Feature Fusion Strategy in Multi-modal Entity Alignment
Chenxiao Li | Jingwei Cheng | Qiang Tong | Fu Zhang
Proceedings of the 31st International Conference on Computational Linguistics

Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs, which consist of structural triples and images associated with entities. Unfortunately, prior works fuse the multi-modal knowledge of all entities using only a single fusion strategy. Therefore, the impact of the fusion strategy on individual entities is largely ignored. To solve this challenge, we propose AMF2SEA, an adaptive multi-modal feature fusion strategy for entity alignment, which dynamically selects the optimal entity-level feature fusion strategy. Additionally, we build a new dataset based on DBP15K, which includes a full set of entity images from multiple inconsistent web sources, making it more representative of the real world. Experimental results demonstrate that our model achieves state-of-the-art (SOTA) performance compared to models using the same modality on DBP15K and its variants with richer image sources and styles. Our code and data are available at https://github.com/ChenxiaoLiJoe/AMFFSEA.

SGMEA: Structure-Guided Multimodal Entity Alignment
Jingwei Cheng | Mingxiao Guo | Fu Zhang
Proceedings of the 31st International Conference on Computational Linguistics

Multimodal Entity Alignment (MMEA) aims to identify equivalent entities across different multimodal knowledge graphs (MMKGs) by integrating structural information, entity attributes, and visual data, thereby promoting knowledge sharing and deep multimodal data integration. However, existing methods often overlook the deeper connections between multimodal data. They primarily focus on the interactions between neighboring entities in the structural modality while neglecting the interactions between entities in the visual and attribute modalities. To address this, we propose a structure-guided multimodal entity alignment method (SGMEA), which prioritizes structural information from knowledge graphs to enhance the visual and attribute modalities. By fusing multimodal representations, SGMEA improves the accuracy of entity alignment. Experimental results demonstrate that SGMEA achieves state-of-the-art performance across multiple datasets, validating its effectiveness and superiority in practical applications.

CE-DA: Custom Embedding and Dynamic Aggregation for Zero-Shot Relation Extraction
Fu Zhang | He Liu | Zehan Li | Jingwei Cheng
Proceedings of the 31st International Conference on Computational Linguistics

Zero-shot Relation Extraction (ZSRE) aims to predict novel relations from sentences with given entity pairs, where the relations have not been encountered during training. Prototype-based methods, which achieve ZSRE by aligning the sentence representation and the relation prototype representation, have shown great potential. However, most existing works focus solely on improving the quality of prototype representations, neglecting sentence representations and lacking interaction between different types of relation side information. In this paper, we propose a novel ZSRE framework named CE-DA, which includes two modules: Custom Embedding and Dynamic Aggregation. We employ a two-stage approach to obtain customized embeddings of sentences. In the first stage, we train a sentence encoder through unsupervised contrastive learning, and in the second stage, we highlight the potential relations between entities in sentences using carefully designed entity emphasis prompts to further enhance sentence representations. Additionally, our dynamic aggregation method assigns different weights to different types of relation side information through a learnable network to enhance the quality of relation prototype representations. In contrast to traditional methods that treat the importance of all side information equally, our dynamic aggregation method further strengthens the interaction between different types of relation side information. Our method demonstrates competitive performance across various metrics on two ZSRE datasets.

2024

Document-level Relation Extraction (DocRE) aims to extract relations between entity pairs in a document and poses many challenges as it involves multiple mentions of entities and cross-sentence inference. However, several aspects that are important for DocRE have not been considered and explored. Existing work ignores bidirectional mention interaction when generating relational features for entity pairs. Also, sophisticated neural networks are typically designed for cross-sentence evidence extraction to further enhance DocRE. More interestingly, we reveal a noteworthy finding: if a model has predicted a relation between an entity and other entities, this relation information may help infer and predict more relations between the entity’s adjacent entities and these other entities. Nonetheless, none of the existing methods leverages secondary reasoning to exploit the results of relation prediction. To this end, we propose a novel Secondary Reasoning Framework (SRF) for DocRE. In SRF, we initially propose a DocRE model that incorporates bidirectional mention fusion and a simple yet effective evidence extraction module (incurring only an additional learnable parameter overhead) for relation prediction. Further, for the first time, we design and propose a novel secondary reasoning method to discover more relations by exploring the results of the first relation prediction. Extensive experiments show that SRF achieves SOTA performance and our secondary reasoning method is both effective and general when integrated into existing models.

ATAP: Automatic Template-Augmented Commonsense Knowledge Graph Completion via Pre-Trained Language Models
Fu Zhang | Yifan Ding | Jingwei Cheng
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

The mission of commonsense knowledge graph completion (CKGC) is to infer missing facts from known commonsense knowledge. CKGC methods can be roughly divided into two categories: triple-based methods and text-based methods. Due to the imbalanced distribution of entities and limited structural information, triple-based methods struggle with long-tail entities. Text-based methods alleviate this issue, but require extensive training and fine-tuning of language models, which reduces efficiency. To alleviate these problems, we propose ATAP, the first CKGC framework that utilizes automatically generated continuous prompt templates combined with pre-trained language models (PLMs). Moreover, ATAP uses a carefully designed new prompt template training strategy, guiding PLMs to generate optimal prompt templates for CKGC tasks. Combining the rich knowledge of PLMs with the template automatic augmentation strategy, ATAP effectively mitigates the long-tail problem and enhances CKGC performance. Results on benchmark datasets show that ATAP achieves state-of-the-art performance overall.

AlignRE: An Encoding and Semantic Alignment Approach for Zero-Shot Relation Extraction
Zehan Li | Fu Zhang | Jingwei Cheng
Findings of the Association for Computational Linguistics: ACL 2024

Zero-shot Relation Extraction (ZSRE) aims to predict unseen relations between entity pairs from input sentences. Existing prototype-based ZSRE methods encode relation descriptions into prototype embeddings and predict by measuring the similarity between sentence embeddings and prototype embeddings. However, these methods often overlook abundant side information of relations and suffer from a significant encoding gap between prototypes and sentences, limiting performance. To this end, we propose a framework named AlignRE, based on two Alignment methods for ZSRE. Specifically, we present a novel perspective centered on encoding schema alignment to enhance prototype-based ZSRE methods. We utilize well-designed prompt-tuning to bridge the encoding gap. To improve prototype quality, we explore and leverage multiple types of side information and propose a prototype aggregation method based on semantic alignment to create comprehensive relation prototype representations. We conduct experiments on FewRel and Wiki-ZSL datasets and consistently outperform state-of-the-art methods. Moreover, our method runs substantially faster and reduces the need for extensive manual labor in prototype construction. Code is available at https://github.com/lizehan1999/AlignRE.
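
As a rough illustration of the generic prototype-based prediction step described in the abstract above (not the AlignRE implementation), the sketch below picks the relation whose prototype embedding is most similar to a sentence embedding. Cosine similarity, the function names, the relation labels, and the toy three-dimensional vectors are all assumptions made for the example; real systems would obtain embeddings from a trained encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_relation(sentence_emb, prototypes):
    """Return the relation label whose prototype is most similar to the sentence embedding."""
    return max(prototypes, key=lambda rel: cosine(sentence_emb, prototypes[rel]))


# Hypothetical prototype embeddings for two unseen relations (toy values).
prototypes = {
    "founded_by": [0.9, 0.1, 0.0],
    "located_in": [0.1, 0.9, 0.2],
}
# Hypothetical sentence embedding (toy values).
sentence_emb = [0.2, 0.8, 0.1]

predicted = predict_relation(sentence_emb, prototypes)
print(predicted)
```

With these toy vectors the sentence embedding lies closest to the `located_in` prototype, so that label is selected.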

Advancing Cross-Lingual Entity Alignment with Large Language Models: Tailored Sample Segmentation and Zero-Shot Prompts
Linyan Yang | Jingwei Cheng | Fu Zhang
Findings of the Association for Computational Linguistics: EMNLP 2024

In recent years, the advent of large language models (LLMs) like GPT and Llama has significantly influenced numerous domains, particularly in advancing natural language processing (NLP) capabilities. LLMs have shown remarkable performance in NLP tasks such as relation extraction (RE) and knowledge graph completion (KGC), enhancing activities related to knowledge graphs. As a result, there is a growing interest in integrating LLMs into the cross-lingual entity alignment (EA) task, which aims to identify equivalent entities across various knowledge graphs, thereby improving the performance of current baselines. However, employing LLMs for entity alignment poses challenges in efficiently handling large-scale data, generating suitable data samples, and adapting prompts for the EA task. To tackle these challenges, we propose Seg-Align, an innovative framework that integrates distance feature extraction, sample **Seg**mentation, and zero-shot prompts. Through extensive experiments on two widely used cross-lingual benchmark datasets, we have not only demonstrated the effectiveness of our proposed sample segmentation algorithm but also highlighted the state-of-the-art performance of Seg-Align. Code is available at https://github.com/yangxiaoxiaoly/Seg-Align.

SALMON: A Structure-Aware Language Model with logicality and densification strategy for Temporal Knowledge Graph Reasoning
Fu Zhang | Jinghao Lin | Jingwei Cheng
Findings of the Association for Computational Linguistics: EMNLP 2024

Temporal knowledge graph reasoning (TKGR) is a crucial task that involves reasoning at known timestamps to complete the future facts and has attracted more and more attention in recent years. The current TKGR models are mainly based on graph neural networks or tensor decomposition techniques. Few works in TKGR focus on pre-trained language models (PLMs) which have powerful sequence modeling capabilities to capture the temporal associations between facts. In this paper, we propose a model SALMON: a Structure-Aware Language Model with logicality and densification strategy. Specifically, we design a PLM-based framework with a structure-aware layer inside to jointly capture the temporal evolving pattern and structural information in TKGs. To further enhance the model’s ability to infer causal associations of facts, we propose a logical judging module, which can guide the model to prioritize learning the most relevant evolving information of logical causal associations in TKGs during the training process. Moreover, we propose a densification strategy based on large language models, through a carefully crafted Chain of Thought prompt, to dig out some knowledge necessary for reasoning about fact associations, thereby making the model perform better. Extensive experimental results demonstrate the superiority of our model over the state-of-the-art baselines.

NALA: an Effective and Interpretable Entity Alignment Method
Chuanhao Xu | Jingwei Cheng | Fu Zhang
Findings of the Association for Computational Linguistics: EMNLP 2024

Entity alignment (EA) aims to find equivalent entities between two Knowledge Graphs. Existing embedding-based EA methods usually encode entities as embeddings, treat triples as constraints on the embeddings, and learn to align the embeddings. However, the details of the underlying logical inference steps in the alignment process are usually omitted, resulting in an inadequate inference process. In this paper, we introduce NALA, an entity alignment method that captures three types of logical inference paths with Non-Axiomatic Logic (NAL). Types 1 and 2 align entity pairs, and type 3 aligns relations. NALA iteratively aligns entities and relations by integrating the conclusions of the inference paths. Our method is logically interpretable and extensible by introducing NAL, and thus suitable for various EA settings. Experimental results show that NALA outperforms state-of-the-art methods in terms of Hits@1, achieving 0.98+ on all three datasets of DBP15K with both supervised and unsupervised settings. We offer a pioneering in-depth analysis of the fundamental principles of entity alignment, approaching the subject from a unified and logical perspective. Our code is available at https://github.com/13998151318/NALA.
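
For readers unfamiliar with the Hits@1 metric reported in several of the entity alignment abstracts on this page, the following is a minimal sketch of how Hits@k is commonly computed: the fraction of source entities whose true counterpart appears among the top k ranked candidates. The function name, the data layout, and the toy entity identifiers are assumptions for illustration and are not taken from any of the listed papers.

```python
def hits_at_k(rankings, gold, k=1):
    """Fraction of source entities whose gold counterpart is in the top-k candidates.

    rankings: dict mapping each source entity to its candidate list, best first.
    gold: dict mapping each source entity to its true counterpart.
    """
    hits = sum(1 for src, cands in rankings.items() if gold[src] in cands[:k])
    return hits / len(rankings)


# Toy ranked candidate lists for four hypothetical source entities.
rankings = {
    "a": ["x", "y"],
    "b": ["y", "z"],
    "c": ["x", "z"],
    "d": ["w", "x"],
}
# Hypothetical ground-truth alignment.
gold = {"a": "x", "b": "z", "c": "z", "d": "w"}

hits1 = hits_at_k(rankings, gold, k=1)
hits2 = hits_at_k(rankings, gold, k=2)
print(hits1, hits2)
```

In this toy example, two of the four entities have their true counterpart ranked first, and all four have it within the top two.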