Shuyuan Zhao

2026

The combination of Mixture-of-Experts (MoE) and Low-Rank Adaptation (LoRA) has shown significant potential for enhancing the multi-task learning capabilities of Large Language Models. However, existing methods face two primary challenges: (1)Imprecise Routing in the current MoE-LoRA method fails to explicitly match input semantics with expert capabilities, leading to weak expert specialization. (2)Uniform weight fusion strategies struggle to provide adaptive update strengths, overlooking the varying complexity of different tasks. To address these limitations, we propose SAMoRA (Semantic-Aware Mixture of LoRA Experts), a novel parameter-efficient fine-tuning framework tailored for task-adaptive learning. Specifically, A Semantic-Aware Router is proposed to explicitly align textual semantics with the most suitable experts for precise routing. A Task-Adaptive Scaling mechanism is designed to regulate expert contributions based on specific task requirements dynamically. In addition, a novel regularization objective is proposed to jointly promote expert specialization and effective scaling. Extensive experiments on multiple multi-task benchmarks demonstrate that SAMoRA significantly outperforms the state-of-the-art methods and holds excellent task generalization capabilities. Code is available at https://github.com/boyan-code/SAMoRA

pdf bib abs

Temporal Knowledge Graph (TKG) extrapolation aims to predict future events based on historical facts. Recent studies have attempted to enhance TKG extrapolation by integrating TKG’s evolving structural representations and textual event chains into Large Language Models (LLMs). Yet, two main challenges limit these approaches: (1) The loss of essential spatial-temporal information due to shallow alignment between TKG’s graph evolving structural representation and the LLM’s semantic space, and (2) the progressive dilution of the TKG’s evolving structural features during LLM fine-tuning. To address these challenges, we propose the Spatial-Temporal Knowledge Adapter (STK-Adapter), which flexibly integrates the evolving graph encoder and the LLM to facilitate TKG reasoning. In STK-Adapter, a Spatial-Temporal MoE is designed to capture spatial structures and temporal patterns inherent in TKGs. An Event-Aware MoE is employed to model intricate temporal semantics dependencies within event chains. In addition, a Cross-Modality Alignment MoE is proposed to facilitate deep cross-modality alignment by TKG-guided attention experts. Extensive experiments on benchmark datasets demonstrate that STK-Adapter significantly outperforms state-of-the-art methods and exhibits strong generalization capabilities in cross-dataset task. The code is available at https://github.com/Zhaoshuyuan0246/STK-Adapter.

2024

pdf bib abs

KPatch: Knowledge Patch to Pre-trained Language Model for Zero-Shot Stance Detection on Social Media
Shuohao Lin | Wei Chen | Yunpeng Gao | Zhishu Jiang | Mengqi Liao | Zhiyu Zhang | Shuyuan Zhao | Huaiyu Wan
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Zero-shot stance detection on social media (ZSSD-SM) aims to distinguish the attitude in tweets towards an unseen target. Previous work capture latent variables between source and target domains to perform this task, but the lack of context knowledge hinders the detection performance. Recent studies have been devoted to obtaining the accurate representation of tweets by bringing additional facts from Knowledge Graph (KG), showing promising performance. However, these knowledge injection methods still suffer from two challenges: (i) The pipeline of knowledge injection causes error accumulation and (ii) irrelevant knowledge makes them fail to understand the semantics. In this paper, we propose a novel knowledge injection method for ZSSD-SM, which adopts two training stages, namely knowledge compression and task guidance, to flexibly inject knowledge into the pre-trained language model (PLM) and adaptively expand tweets context. Specifically, in the knowledge compression stage, the latent representation of KG is reconstructed by the triplet denoising task and compressed into external matrices; while in the task guidance stage, the frozen matrices are employed to guide the PLM to adaptively extract its own context-related knowledge, and then complete the fine-tuning of the ZSSD-SM task. Extensive experiments on multiple datasets show the effectiveness of our proposed method. The code is available at: https://github.com/ShuohaoLin/KPatch.

Co-authors

Venues

Fix author