Feiran Huang

2025

Modeling text-attributed graphs is a well-known problem due to the difficulty of capturing both the text attribute and the graph structure effectively. Existing models often focus on either the text attribute or the graph structure, potentially neglecting the other aspect. This is primarily because both text learning and graph learning models require significant computational resources, making it impractical to directly connect these models in a series. However, there are situations where text-learning models correctly classify text-attributed nodes, while graph-learning models may classify them incorrectly, and vice versa. To fully leverage the potential of text-attributed graphs, we propose a Coupled Text-attributed Graph Learning (CTGL) framework that combines the strengths of both text-learning and graph-learning models in parallel and avoids the computational cost of serially connecting the two aspect models. Specifically, CTGL introduces coupled text-graph augmentation to enable coupled contrastive learning and facilitate the exchange of valuable information between text learning and graph learning. Experimental results on diverse datasets demonstrate the superior performance of our model compared to state-of-the-art text-learning and graph-learning baselines.

2024

Current fact verification methods generally follow the two-stage training paradigm: evidence retrieval and claim verification. While existing works focus on developing sophisticated claim verification modules, the fundamental importance of evidence retrieval is largely ignored. Existing approaches usually adopt the heuristic semantic similarity-based retrieval strategy, resulting in the task-irrelevant evidence and undesirable performance. In this paper, we concentrate on evidence retrieval and propose a Retrieval-Augmented Verification framework RAV, consisting of two major modules: the hybrid evidence retrieval and the joint fact verification. Hybrid evidence retrieval module incorporates an efficient retriever for preliminary pruning of candidate evidence, succeeded by a ranker that generates more precise sorting results. Under this end-to-end training paradigm, gradients from the claim verification can be back-propagated to enhance evidence selection. Experimental results on FEVER dataset demonstrate the superiority of RAV.

Generating accurate SQL queries for user questions (text-to-SQL) has been a long-standing challenge since it requires a deep understanding of both the user’s question and the corresponding database schema in order to retrieve the desired content accurately. Existing methods rely on the comprehensive capability of large language models (LLMs) to generate the SQL. However, some necessary knowledge is not explicitly included in the database schema and user question or has been learned by LLMs. Thus, the generated SQL of the knowledge-insufficient questions may be inaccurate, negatively influencing the text-to-SQL models’ performance and robustness. To address this challenge, we propose the Knowledge-to-SQL framework, which employs tailored Data Expert LLM (DELLM) to provide helpful knowledge for all text-to-SQL models. Specifically, we introduce the detailed implementation of DELLM regarding table reading and the basic fine-tuning process. We further propose a Preference Learning via Database Feedback (PLDBF) strategy, refining the DELLM to generate more helpful knowledge for LLMs. Extensive experiments verify that DELLM can enhance the state-of-the-art approaches for text-to-SQL tasks. The corresponding code of DELLM is released for further research.

Co-authors

Venues

findings2
coling1

Fix data