Shu Zhao

2026

Knowledge Graph-enhanced Large Language Models (KG-Enhanced LLMs) integrate the linguistic capabilities of LLMs with the structured semantics of Knowledge Graphs (KGs), showing strong potential in knowledge-intensive reasoning tasks. However, existing methods typically adopt query-driven iterative reasoning from a local perspective, which limits their ability to capture semantically distant but crucial information, leading to dual bottlenecks in efficiency and accuracy for complex multi-hop tasks. To address this issue, we propose MIAoG, a multi-view instructed adaptive reasoning of LLM on KG, which is designed to overcome the limitations of local exploration by enabling LLMs to plan, evaluate, and adapt reasoning paths from a global perspective. Instead of query-anchored exploration, MIAoG first prompts the LLM to generate a multi-view instruction set that outlines diverse potential reasoning paths and explicitly specifies global reasoning intentions to guide the model toward coherent and targeted reasoning. During reasoning, MIAoG integrates a real-time introspection mechanism that evaluates the alignment between the current path and the instructions, adaptively pruning inconsistent trajectories to enhance global consistency while maintaining efficiency. Extensive experiments on multiple public datasets show that MIAoG achieves state-of-the-art performance in KG-enhanced LLM reasoning, particularly excelling in complex multi-hop scenarios.

2025

We introduce TableLLM, a robust large language model (LLM) with 8 billion parameters, purpose-built for proficiently handling tabular data manipulation tasks, whether they are embedded within documents or spreadsheets, catering to real-world office scenarios. We propose a distant supervision method for training, which comprises a reasoning process extension strategy, aiding in training LLMs to understand reasoning patterns more effectively as well as a cross-way validation strategy, ensuring the quality of the automatically generated data. To evaluate the performance of TableLLM, we have crafted benchmarks tailored to address both document and spreadsheet formats as well as constructed a well-organized evaluation pipeline capable of handling both scenarios. Thorough evaluations underscore the advantages of TableLLM when compared to various existing general-purpose and tabular data-focused LLMs. We have publicly released the model checkpoint, source code, benchmarks, and a web application for user interaction on this anonymized repository.

pdf bib abs

Prompt tuning for Large Language Models (LLMs) is vulnerable to backdoor attacks. Existing methods find backdoor attacks to be a significant threat in data-rich scenarios. However, in data-limited scenarios, these methods have difficulty capturing precise backdoor patterns, leading to weakened backdoor attack capabilities and significant side effects for the LLMs, which limits their practical relevance. To explore this problem, we propose a backdoor attacks through contrastive-enhanced machine unlearning in data-limited scenarios, called BCU. Specifically, BCU introduces a multi-objective machine unlearning method to capture precise backdoor patterns by forgetting the association between non-trigger data and the backdoor patterns, reducing side effects. Moreover, we design a contrastive learning strategy to enhance the association between triggers and backdoor patterns, improving the capability of backdoor attacks. Experimental results on 6 NLP datasets and 4 LLMs show that BCU exhibits strong backdoor attack capabilities and slight side effects, whether the training data is rich or limited. Our findings highlight practical security risks of backdoor attacks against LLMs, necessitating further research for security purposes. Our code is available at https://github.com/AHU-YangSJ/BCU.

pdf bib abs

Prompt transfer is a transfer learning method based on prompt tuning, which enhances the parameter performance of prompts in target tasks by transferring source prompt embeddings. Among existing methods, weighted aggregation is effective and possesses the advantages of being lightweight and modular. However, these methods may transfer redundant or irrelevant information from the source prompts to the target prompt, leading to negative impacts. To alleviate this problem, we propose Prompt Contrastive Transformation (PCT), which achieves efficient prompt transfer through prompt contrastive transformation and attentional fusion. PCT transforms the source prompt into task-agnostic embedding and task-specific embeddings through singular value decomposition and contrastive learning, reducing information redundancy among source prompts. The attention module in PCT selects more effective task-specific embeddings and fuses them with task-agnostic embedding into the target prompt. Experimental results show that, despite tuning only 0.035% of task-specific parameters, PCT achieves improvements in prompt transfer for single target task adaptation across various NLP tasks.

2023

pdf bib abs

The large scale of pre-trained language models poses a challenge for their deployment on various devices, with a growing emphasis on methods to compress these models, particularly knowledge distillation. However, current knowledge distillation methods rely on the model’s intermediate layer features and the golden labels (also called hard labels), which usually require aligned model architecture and enough labeled data respectively. Moreover, the parameters of vocabulary are usually neglected in existing methods. To address these problems, we propose a general language model distillation (GLMD) method that performs two-stage word prediction distillation and vocabulary compression, which is simple and surprisingly shows extremely strong performance. Specifically, GLMD supports more general application scenarios by eliminating the constraints of dimension and structure between models and the need for labeled datasets through the absence of intermediate layers and golden labels. Meanwhile, based on the long-tailed distribution of word frequencies in the data, GLMD designs a strategy of vocabulary compression through decreasing vocabulary size instead of dimensionality. Experimental results show that our method outperforms 25 state-of-the-art methods on the SuperGLUE benchmark, achieving an average score that surpasses the best method by 3%.

pdf bib abs

Currently, the reduction in the parameter scale of large-scale pre-trained language models (PLMs) through knowledge distillation has greatly facilitated their widespread deployment on various devices. However, the deployment of knowledge distillation systems faces great challenges in real-world industrial-strength applications, which require the use of complex distillation methods on even larger-scale PLMs (over 10B), limited by memory on GPUs and the switching of methods. To overcome these challenges, we propose GKD, a general knowledge distillation framework that supports distillation on larger-scale PLMs using various distillation methods. With GKD, developers can build larger distillation models on memory-limited GPUs and easily switch and combine different distillation methods within a single framework. Experimental results show that GKD can support the distillation of at least 100B-scale PLMs and 25 mainstream methods on 8 NVIDIA A100 (40GB) GPUs.

2021

pdf bib abs

PDALN: Progressive Domain Adaptation over a Pre-trained Model for Low-Resource Cross-Domain Named Entity Recognition
Tao Zhang | Congying Xia | Philip S. Yu | Zhiwei Liu | Shu Zhao
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Cross-domain Named Entity Recognition (NER) transfers the NER knowledge from high-resource domains to the low-resource target domain. Due to limited labeled resources and domain shift, cross-domain NER is a challenging task. To address these challenges, we propose a progressive domain adaptation Knowledge Distillation (KD) approach – PDALN. It achieves superior domain adaptability by employing three components: (1) Adaptive data augmentation techniques, which alleviate cross-domain gap and label sparsity simultaneously; (2) Multi-level Domain invariant features, derived from a multi-grained MMD (Maximum Mean Discrepancy) approach, to enable knowledge transfer across domains; (3) Advanced KD schema, which progressively enables powerful pre-trained language models to perform domain adaptation. Extensive experiments on four benchmarks show that PDALN can effectively adapt high-resource domains to low-resource target domains, even if they are diverse in terms and writing styles. Comparison with other baselines indicates the state-of-the-art performance of PDALN.