Yuxin Xiong
2025
Mitigating Visual Knowledge Forgetting in MLLM Instruction-tuning via Modality-decoupled Gradient Descent
Junda Wu | Yuxin Xiong | Xintong Li | Yu Xia | Ruoyu Wang | Yu Wang | Tong Yu | Sungchul Kim | Ryan A. Rossi | Lina Yao | Jingbo Shang | Julian McAuley
Findings of the Association for Computational Linguistics: EMNLP 2025
Recent multimodal large language models (MLLMs) have demonstrated strong visual understanding and reasoning after large-scale multimodal pre-training. However, instruction-tuning is typically text-driven with limited visual supervision, leading to significant visual forgetting and degradation of pre-trained visual knowledge. Existing fine-tuning and continual learning methods compress visual representations and emphasize task alignment over visual retention, and thus fail to address this challenge. We present a novel perspective that uses effective rank to quantify the loss of visual representation richness, framing visual forgetting as excessive compression under the information bottleneck principle. To address this, we propose modality-decoupled gradient descent (MDGD), which regulates gradient updates to preserve the effective rank of visual features and explicitly disentangles visual learning from task-specific alignment. We further introduce a memory-efficient fine-tuning variant that uses gradient masking for parameter-efficient adaptation. Extensive experiments show that MDGD effectively mitigates visual forgetting across downstream tasks and models, maintaining pre-trained visual knowledge while supporting strong task adaptation.
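For readers unfamiliar with the effective-rank measure referenced above, it is commonly defined as the exponential of the Shannon entropy of a feature matrix's normalized singular-value spectrum. The sketch below illustrates that general definition on a matrix of visual token features; it is only an illustration of the concept under assumed shapes (the `visual_features` tensor is hypothetical), not the paper's implementation.

import torch

def effective_rank(features: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Effective rank of a (num_tokens x dim) feature matrix, computed as
    exp(entropy) of the normalized singular-value distribution."""
    s = torch.linalg.svdvals(features)         # singular values, descending
    p = s / (s.sum() + eps)                    # normalize spectrum into a distribution
    entropy = -(p * torch.log(p + eps)).sum()  # Shannon entropy of the spectrum
    return torch.exp(entropy)                  # ranges from 1 up to min(tokens, dim)

# Hypothetical usage: compare the effective rank of visual token features
# before and after instruction-tuning to observe representation compression.
visual_features = torch.randn(256, 1024)       # e.g. 256 visual tokens, 1024-dim
print(effective_rank(visual_features).item())

A drop in this quantity over the course of text-driven fine-tuning is one concrete way to observe the representation compression the abstract describes.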
Explainable Chain-of-Thought Reasoning: An Empirical Analysis on State-Aware Reasoning Dynamics
Sheldon Yu | Yuxin Xiong | Junda Wu | Xintong Li | Tong Yu | Xiang Chen | Ritwik Sinha | Jingbo Shang | Julian McAuley
Findings of the Association for Computational Linguistics: EMNLP 2025
Recent advances in chain-of-thought (CoT) prompting have demonstrated the ability of large language models (LLMs) to perform multi-step reasoning. While prior work focuses on improving CoT generation quality or attributing token-level importance, we propose a novel framework to structurally analyze the latent dynamics of CoT trajectories for interpretability. Our method segments generated CoT into discrete reasoning steps, abstracts each step into a spectral embedding based on the eigenvalues of token-level Gram matrices, and clusters these embeddings into semantically meaningful latent states. We model the global evolution of reasoning as a first-order Markov chain over latent clusters, yielding interpretable transition structures. Through t-SNE visualizations and Monte Carlo rollouts, we uncover consistent trajectories across tasks and models, supporting the hypothesis that LLM reasoning follows globally coherent yet abstract paths.
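The pipeline sketched in the abstract (segment the CoT, embed each step via the eigenvalue spectrum of its token-level Gram matrix, cluster the embeddings into latent states, and fit a first-order Markov chain over the clusters) can be illustrated roughly as follows. This is a minimal sketch under assumed shapes and hyperparameters (the top-k spectrum size and number of latent states are placeholders), not the authors' exact recipe.

import numpy as np
from sklearn.cluster import KMeans

def spectral_step_embedding(token_embs: np.ndarray, k: int = 16) -> np.ndarray:
    """Embed one reasoning step by the top-k eigenvalues of its token-level Gram matrix."""
    gram = token_embs @ token_embs.T            # (T, T) Gram matrix of the step's tokens
    eigvals = np.linalg.eigvalsh(gram)[::-1]    # eigenvalues, descending
    vec = np.zeros(k)
    vec[: min(k, eigvals.size)] = eigvals[:k]   # zero-pad steps shorter than k tokens
    return vec / (np.linalg.norm(vec) + 1e-12)  # scale-normalize the spectrum

def cluster_steps(step_embs: np.ndarray, n_states: int = 8) -> np.ndarray:
    """Assign each step embedding to a discrete latent reasoning state."""
    return KMeans(n_clusters=n_states, n_init=10).fit_predict(step_embs)

def markov_transitions(state_seqs: list[list[int]], n_states: int) -> np.ndarray:
    """Row-normalized first-order transition matrix over latent states."""
    counts = np.zeros((n_states, n_states))
    for seq in state_seqs:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

Rolling out the resulting transition matrix (e.g., Monte Carlo sampling of state sequences) then gives the kind of global trajectory analysis the abstract refers to.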