Wenqi Fan

2025

Recommender systems have become increasingly vital in our daily lives, helping to alleviate the problem of information overload across various user-oriented online services. The emergence of Large Language Models (LLMs) has yielded remarkable achievements, demonstrating their potential for the development of next-generation recommender systems. Despite these advancements, LLM-based recommender systems face inherent limitations stemming from their LLM backbones, particularly issues of hallucinations and the lack of up-to-date and domain-specific knowledge.Recently, Retrieval-Augmented Generation (RAG) has garnered significant attention for addressing these limitations by leveraging external knowledge sources to enhance the understanding and generation of LLMs. However, vanilla RAG methods often introduce noise and neglect structural relationships in knowledge, limiting their effectiveness in LLM-based recommendations. To address these limitations, we propose to retrieve high-quality and up-to-date structure information from the knowledge graph (KG) to augment recommendations. Specifically, our approach develops a retrieval-augmented framework, termed K-RagRec, that facilitates the recommendation generation process by incorporating structure information from the external KG. Extensive experiments have been conducted to demonstrate the effectiveness of our proposed method.

pdf bib abs

In AI-facilitated teaching, leveraging various query styles to interpret abstract text descriptions is crucial for ensuring high-quality teaching. However, current retrieval models primarily focus on natural text-image retrieval, making them insufficiently tailored to educational scenarios due to the ambiguities in the retrieval process. In this paper, we propose a diverse expression retrieval task tailored to educational scenarios, supporting retrieval based on multiple query styles and expressions. We introduce the STEM Education Retrieval Dataset (SER), which contains over 24,000 query pairs of different styles, and the Uni-Retrieval, an efficient and style-diversified retrieval vision-language model based on prompt tuning. Uni-Retrieval extracts query style features as prototypes and builds a continuously updated Prompt Bank containing prompt tokens for diverse queries. This bank can updated during test time to represent domain-specific knowledge for different subject retrieval scenarios. Our framework demonstrates scalability and robustness by dynamically retrieving prompt tokens based on prototype similarity, effectively facilitating learning for unknown queries. Experimental results indicate that Uni-Retrieval outperforms existing retrieval models in most retrieval tasks.

pdf bib abs

Rethinking Large Language Model Architectures for Sequential Recommendations
Hanbing Wang | Xiaorui Liu | Wenqi Fan | Xiangyu Zhao | Venkataramana Kini | Devendra Pratap Yadav | Fei Wang | Zhen Wen | Hui Liu
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

In recent times, there has been a shift towards adapting sequential recommendation to LLM paradigm to harness the capabilities of LLMs. These methods typically formulate recommendation data into natural language and train the model to forecast the subsequent item in an auto-regressive manner. Despite their notable success, the significant computational burden during inference poses a major challenge to their practical implementation. In this study, we aim to streamline current LLM-based recommendation models and introduce a straightforward yet highly effective model Lite-LLM4Rec. The primary objective of Lite-LLM4Rec is to ensure efficient inference for the sequential recommendation task. Lite-LLM4Rec circumvents the step-by-step beam search decoding by employing a direct item projection head to produce ranking scores in one step. This design arises from our empirical finding that beam search decoding is ultimately unnecessary for sequential recommendations. Additionally, Lite-LLM4Rec introduces a hierarchical LLM structure crafted to efficiently handle the extensive contextual information of items and redundant computation issue, thus diminishing computational overhead while enjoying the power of LLMs. Experiments on four publicly available datasets validate the efficacy of Lite-LLM4Rec in enhancing both performance and inference efficiency (notably 46.8% performance improvement and 99.48% efficiency improvement on ML-1m) compared to existing LLM-based methods. Our implementations are available at: https://github.com/HanbingWang2001/Lite-LLM4Rec-PyTorch.

pdf bib abs

GLProtein: Global-and-Local Structure Aware Protein Representation Learning
Yunqing Liu | Wenqi Fan | Xiaoyong Wei | Li Qing
Findings of the Association for Computational Linguistics: EMNLP 2025

Proteins are central to biological systems, participating as building blocks across all forms of life. Despite advancements in understanding protein functions through protein sequence analysis, there remains potential for further exploration in integrating protein structural information. We argue that the structural information of proteins is not only limited to their 3D information but also encompasses information from amino acid molecules (local information) to protein-protein structure similarity (global information). To address this, we propose GLProtein, the first framework in protein pre-training that incorporates both global structural similarity and local amino acid details to enhance prediction accuracy and functional insights. GLProtein innovatively combines protein-masked modelling with triplet structure similarity scoring, protein 3D distance encoding and substructure-based amino acid molecule encoding. Experimental results demonstrate that GLProtein outperforms previous methods in several bioinformatics tasks, including predicting protein-protein interactions, contact prediction, and so on.

pdf bib abs

The increasing demand for domain-specific evaluation of large language models (LLMs) has led to the development of numerous benchmarks. These efforts often adhere to the principle of data scaling, relying on large corpora or extensive question-answer (QA) sets to ensure broad coverage. However, the impact of corpus and QA set design on the precision and recall of domain-specific LLM performance remains poorly understood. In this paper, we argue that data scaling is not always the optimal principle for domain-specific benchmark construction. Instead, we introduce Comp-Comp, an iterative benchmarking framework grounded in the principle of comprehensiveness and compactness. Comprehensiveness ensures semantic recall by covering the full breadth of the domain, while compactness improves precision by reducing redundancy and noise. To demonstrate the effectiveness of our approach, we present a case study conducted at a well-renowned university, resulting in the creation of PolyBench, a large-scale, high-quality academic benchmark. Although this study focuses on academia, the Comp-Comp framework is domain-agnostic and readily adaptable to a wide range of specialized fields. The source code and datasets can be accessed at https://github.com/Anya-RB-Chen/COMP-COMP.

pdf bib abs

C2KD: Cross-layer and Cross-head Knowledge Distillation for Small Language Model-based Recommendation
Xiao Chen | Changyi Ma | Wenqi Fan | Zhaoxiang Zhang | Li Qing
Findings of the Association for Computational Linguistics: ACL 2025

Sequential recommenders predict users’ next interactions based on historical behavior and are essential in modern recommendation systems. While Large Language Models (LLMs) show promise, their size and high inference costs limit deployment on resource-constrained devices. Small Language Models (SLMs) provide a more efficient alternative for edge devices, but bridging the recommendation performance gap between LLMs and SLMs remains challenging. Typical approaches like supervised fine-tuning or vanilla knowledge distillation (KD) often lead to suboptimal performance or even negative transfer. Our motivational experiments reveal key issues with vanilla KD methods: feature imitation suffers from redundancy and uneven recommendation ability across layers, while prediction mimicking faces conflicts caused by differing weight distributions of prediction heads. To address these challenges, we propose a simple yet effective framework, C2KD, to transfer task-relevant knowledge from two complementary dimensions. Specifically, our method incorporates: (1) cross-layer feature imitation, which uses a dynamic router to select the most relevant teacher layers and assimilate task-relevant knowledge from the teacher’s late layers, allowing the student to concentrate on the teacher’s specialized knowledge; and (2) cross-head logit distillation, which maps the intermediate features of the student to the teacher’s output head, thereby minimizing prediction discrepancies between the teacher and the student. Extensive experiments across diverse model families demonstrate that our approach enables 1B-parameter SLMs to achieve competitive performance compared to LLMs (e.g., Llama3-8B), offering a practical solution for real-world on-device sequential recommendations.

2024

pdf bib abs

The large-scale conversational recommendation dataset is pivotal for the development of conversational recommender systems (CRS). Most existing CRS datasets suffers from the problems of data inextensibility and semantic inconsistency. To tackle these limitations and establish a benchmark in the conversational recommendation scenario, in this paper, we introduce the LLM-REDIAL dataset to facilitate the research in CRS. LLM-REDIAL is constructed by leveraging large language models (LLMs) to generate the high-quality dialogues. To provide the LLMs with detailed guidance, we integrate historical user behavior data with dialogue templates that are carefully designed through the combination of multiple pre-defined goals. LLM-REDIAL has two main advantages. First, it is the largest multi-domain CRS dataset which consists of 47.6k multi-turn dialogues with 482.6k utterances across 4 domains. Second, dialogue semantics and the users’ historical interaction information is highly consistent. Human evaluation are conducted to verify the quality of LLM-REDIAL. In addition, we evaluate the usability of advanced LLM-based models on LLM-REDIAL.

pdf bib abs

Although large language models (LLMs) have achieved significant success, their vulnerability to adversarial perturbations, including recent jailbreak attacks, has raised considerable concerns. However, the increasing size of these models and their limited access make improving their robustness a challenging task. Among various defense strategies, randomized smoothing has shown great potential for LLMs, as it does not require full access to the model’s parameters or fine-tuning via adversarial training. However, randomized smoothing involves adding noise to the input before model prediction, and the final model’s robustness largely depends on the model’s performance on these noise-corrupted data. Its effectiveness is often limited by the model’s sub-optimal performance on noisy data. To address this issue, we propose to leverage the multitasking nature of LLMs to first denoise the noisy inputs and then to make predictions based on these denoised versions. We call this procedure self-denoised smoothing. Unlike previous denoised smoothing techniques in computer vision, which require training a separate model to enhance the robustness of LLMs, our method offers significantly better efficiency and flexibility. Our experimental results indicate that our method surpasses existing methods in both empirical and certified robustness in defending against adversarial attacks for both downstream tasks and human alignments (i.e., jailbreak attacks). Our code is publicly available at https://github.com/UCSB-NLP-Chang/SelfDenoise.

2020

pdf bib abs

Recently there are increasing concerns about the fairness of Artificial Intelligence (AI) in real-world applications such as computer vision and recommendations. For example, recognition algorithms in computer vision are unfair to black people such as poorly detecting their faces and inappropriately identifying them as “gorillas”. As one crucial application of AI, dialogue systems have been extensively applied in our society. They are usually built with real human conversational data; thus they could inherit some fairness issues which are held in the real world. However, the fairness of dialogue systems has not been well investigated. In this paper, we perform a pioneering study about the fairness issues in dialogue systems. In particular, we construct a benchmark dataset and propose quantitative measures to understand fairness in dialogue models. Our studies demonstrate that popular dialogue models show significant prejudice towards different genders and races. Besides, to mitigate the bias in dialogue systems, we propose two simple but effective debiasing methods. Experiments show that our methods can reduce the bias in dialogue systems significantly. The dataset and the implementation are released to foster fairness research in dialogue systems.