Xintong Li
Papers on this page may belong to the following people: Xintong Li (UCSD), Xintong Li (CUHK, OSU, Baidu)
2025
Toward Multi-Session Personalized Conversation: A Large-Scale Dataset and Hierarchical Tree Framework for Implicit Reasoning
Xintong Li | Jalend Bantupalli | Ria Dharmani | Yuwei Zhang | Jingbo Shang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
There has been a surge in the use of large language model (LLM) conversational agents to generate responses based on long-term history from multiple sessions. However, existing long-term open-domain dialogue datasets lack complex, real-world personalization and fail to capture implicit reasoning, where relevant information is embedded in subtle, syntactically or semantically distant connections rather than explicit statements. In such cases, traditional retrieval methods fail to capture relevant context, and long-context modeling also becomes inefficient due to numerous complicated persona-related details. To address this gap, we introduce ImplexConv, a large-scale long-term dataset with 2,500 examples, each containing approximately 100 conversation sessions, designed to study implicit reasoning in personalized dialogues. Additionally, we propose TaciTree, a novel hierarchical tree framework that structures conversation history into multiple levels of summarization. Instead of brute-force searching all data, TaciTree enables an efficient, level-based retrieval process where models refine their search by progressively selecting relevant details. Our experiments demonstrate that TaciTree significantly improves the ability of LLMs to reason over long-term conversations with implicit contextual dependencies.
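As a rough illustration of the level-based retrieval the abstract describes, the sketch below descends a tree of conversation summaries and keeps only the most relevant children at each level instead of scanning every session. The node layout, word-overlap scorer, and beam width are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch of level-based retrieval over a hierarchy of
# conversation summaries, in the spirit of TaciTree. Node layout,
# scorer, and beam width are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Node:
    summary: str                 # summary of everything beneath this node
    children: list["Node"] = field(default_factory=list)  # empty => leaf (raw session)

def score(query: str, summary: str) -> float:
    """Toy relevance score via word overlap; a real system would use
    embeddings or an LLM judgment here."""
    q, s = set(query.lower().split()), set(summary.lower().split())
    return len(q & s) / (len(q) or 1)

def retrieve(root: Node, query: str, beam: int = 2) -> list[str]:
    """Descend level by level, keeping only the `beam` most relevant
    nodes at each step instead of brute-force searching all sessions."""
    frontier = [root]
    while any(n.children for n in frontier):
        candidates = []
        for n in frontier:
            candidates.extend(n.children if n.children else [n])
        candidates.sort(key=lambda c: score(query, c.summary), reverse=True)
        frontier = candidates[:beam]
    return [n.summary for n in frontier]
```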
From Selection to Generation: A Survey of LLM-based Active Learning
Yu Xia | Subhojyoti Mukherjee | Zhouhang Xie | Junda Wu | Xintong Li | Ryan Aponte | Hanjia Lyu | Joe Barrow | Hongjie Chen | Franck Dernoncourt | Branislav Kveton | Tong Yu | Ruiyi Zhang | Jiuxiang Gu | Nesreen K. Ahmed | Yu Wang | Xiang Chen | Hanieh Deilamsalehy | Sungchul Kim | Zhengmian Hu | Yue Zhao | Nedim Lipka | Seunghyun Yoon | Ting-Hao Kenneth Huang | Zichao Wang | Puneet Mathur | Soumyabrata Pal | Koyel Mukherjee | Zhehao Zhang | Namyong Park | Thien Huu Nguyen | Jiebo Luo | Ryan A. Rossi | Julian McAuley
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and providing more cost-effective annotations. Motivated by the increasing importance of high-quality data and efficient model training in the era of LLMs, we present a comprehensive survey on LLM-based Active Learning. We introduce an intuitive taxonomy that categorizes these techniques and discuss the transformative roles LLMs can play in the active learning loop. We further examine the impact of AL on LLM learning paradigms and its applications across various domains. Finally, we identify open challenges and propose future research directions. This survey aims to serve as an up-to-date resource for researchers and practitioners seeking to gain an intuitive understanding of LLM-based AL techniques and deploy them to new applications.
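For intuition, here is a minimal skeleton of one round of the loop the survey covers, with an LLM in all three roles it discusses (selector, annotator, generator). `llm` stands for any string-to-string completion call; nothing here reflects a specific system's API:

```python
# Illustrative skeleton of one LLM-based active learning round.
# Prompts and parsing are deliberately naive (a real pipeline
# parses scores robustly and audits a sample of the LLM's labels).
def active_learning_round(llm, labeled, unlabeled, budget=2):
    # 1. Selection: the LLM scores how informative each unlabeled item is.
    scored = [(float(llm(f"Rate 0-10 how informative for training: {x}")), x)
              for x in unlabeled]
    chosen = [x for _, x in sorted(scored, reverse=True)[:budget]]

    # 2. Annotation: the LLM supplies cheap (but noisy) labels.
    labeled += [(x, llm(f"Label this example: {x}")) for x in chosen]

    # 3. Generation: the LLM synthesizes a new hard instance for the pool.
    unlabeled.append(llm(f"Write one new hard example similar to: {chosen[0]}"))
    return labeled, [x for x in unlabeled if x not in chosen]
```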
GUI Agents: A Survey
Dang Nguyen | Jian Chen | Yu Wang | Gang Wu | Namyong Park | Zhengmian Hu | Hanjia Lyu | Junda Wu | Ryan Aponte | Yu Xia | Xintong Li | Jing Shi | Hongjie Chen | Viet Dac Lai | Zhouhang Xie | Sungchul Kim | Ruiyi Zhang | Tong Yu | Mehrab Tanjim | Nesreen K. Ahmed | Puneet Mathur | Seunghyun Yoon | Lina Yao | Branislav Kveton | Jihyung Kil | Thien Huu Nguyen | Trung Bui | Tianyi Zhou | Ryan A. Rossi | Franck Dernoncourt
Findings of the Association for Computational Linguistics: ACL 2025
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems via GUIs, emulating human actions such as clicking, typing, and navigating visual elements across diverse platforms. Motivated by the growing interest and fundamental importance of GUI agents, we provide a comprehensive survey that categorizes their benchmarks, evaluation metrics, architectures, and training methods. We propose a unified framework that delineates their perception, reasoning, planning, and acting capabilities. Furthermore, we identify important open challenges and discuss key future directions. Finally, this work serves as a basis for practitioners and researchers to gain an intuitive understanding of current progress, techniques, benchmarks, and critical open problems that remain to be addressed.
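The unified framework the survey proposes can be pictured as a simple perceive, reason/plan, act loop. The sketch below is a generic skeleton under that reading, with perception, planning, and execution stubbed out; the names are hypothetical, not any particular agent's interface:

```python
# Bare-bones perceive -> reason/plan -> act loop matching the survey's
# unified framing. perceive/plan/execute are stubs supplied by the
# caller; names are hypothetical, not any particular agent's API.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str       # "click" | "type" | "done"
    target: str = ""
    text: str = ""

def run_agent(task, perceive, plan, execute, max_steps=20):
    """perceive() -> GUI state (screenshot / accessibility tree);
    plan(task, state, history) -> Action, typically an LLM call;
    execute(action) applies it to the real interface."""
    history = []
    for _ in range(max_steps):
        state = perceive()                   # perception
        action = plan(task, state, history)  # reasoning + planning
        if action.kind == "done":
            break
        execute(action)                      # acting
        history.append((state, action))
    return history
```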
Mitigating Visual Knowledge Forgetting in MLLM Instruction-tuning via Modality-decoupled Gradient Descent
Junda Wu | Yuxin Xiong | Xintong Li | Yu Xia | Ruoyu Wang | Yu Wang | Tong Yu | Sungchul Kim | Ryan A. Rossi | Lina Yao | Jingbo Shang | Julian McAuley
Findings of the Association for Computational Linguistics: EMNLP 2025
Recent MLLMs have demonstrated strong visual understanding and reasoning after large-scale multimodal pre-training. However, instruction-tuning is typically text-driven with limited visual supervision, leading to significant visual forgetting and degradation of pre-trained visual knowledge. Existing fine-tuning and continual learning methods compress visual representations and emphasize task alignment over visual retention, failing to address this challenge. We present a novel perspective using effective rank to quantify the loss of visual representation richness, framing visual forgetting as excessive compression under the information bottleneck principle. To address this, we propose modality-decoupled gradient descent (MDGD), which regulates gradient updates to preserve the effective rank of visual features and explicitly disentangles visual learning from task-specific alignment. We further introduce a memory-efficient fine-tuning variant using gradient masking for parameter-efficient adaptation. Extensive experiments show that MDGD effectively mitigates visual forgetting across downstream tasks and models, maintaining pre-trained visual knowledge while supporting strong task adaptation.
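For a concrete picture, the sketch below implements one common definition of effective rank (the exponentiated entropy of the normalized singular-value spectrum) and a toy masked gradient update. It is a reading of the abstract, not the authors' released code; in particular, how the visual mask is chosen is the method's core and is elided here:

```python
# Sketch of the two quantities the abstract names: effective rank of
# visual features, and a gradient update damped on masked directions.
import torch

def effective_rank(features: torch.Tensor) -> torch.Tensor:
    """features: (n_tokens, dim). exp(entropy of normalized singular values)."""
    s = torch.linalg.svdvals(features)
    p = s / s.sum()
    return torch.exp(-(p * torch.log(p + 1e-12)).sum())

def masked_step(param, grad, visual_mask, lr=1e-4, visual_scale=0.1):
    """Damp updates on directions flagged as carrying pre-trained visual
    knowledge. visual_mask is a 0/1 tensor of param's shape; choosing it
    is the method's contribution and is assumed given here."""
    grad = grad * (1 - visual_mask) + visual_scale * grad * visual_mask
    param.data -= lr * grad
    return param
```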
Explainable Chain-of-Thought Reasoning: An Empirical Analysis on State-Aware Reasoning Dynamics
Sheldon Yu | Yuxin Xiong | Junda Wu | Xintong Li | Tong Yu | Xiang Chen | Ritwik Sinha | Jingbo Shang | Julian McAuley
Findings of the Association for Computational Linguistics: EMNLP 2025
Recent advances in chain-of-thought (CoT) prompting have demonstrated the ability of large language models (LLMs) to perform multi-step reasoning. While prior work focuses on improving CoT generation quality or attributing token-level importance, we propose a novel framework to structurally analyze the latent dynamics of CoT trajectories for interpretability. Our method segments generated CoT into discrete reasoning steps, abstracts each step into a spectral embedding based on the eigenvalues of token-level Gram matrices, and clusters these embeddings into semantically meaningful latent states. We model the global evolution of reasoning as a first-order Markov chain over latent clusters, yielding interpretable transition structures. Through t-SNE visualizations and Monte Carlo rollouts, we uncover consistent trajectories across tasks and models, supporting the hypothesis that LLM reasoning follows globally coherent yet abstract paths.
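A minimal version of the analysis pipeline the abstract outlines might look like the following: embed each reasoning step via the top eigenvalues of its token-level Gram matrix, cluster the embeddings, and estimate a first-order Markov transition matrix over clusters. The spectrum size and the k-means clusterer are assumptions for illustration:

```python
# Sketch: spectral step embeddings -> clusters -> Markov transitions.
import numpy as np
from sklearn.cluster import KMeans

def step_spectrum(token_embs: np.ndarray, k: int = 8) -> np.ndarray:
    """token_embs: (n_tokens, dim). Top-k eigenvalues of the Gram matrix."""
    gram = token_embs @ token_embs.T
    eig = np.sort(np.linalg.eigvalsh(gram))[::-1]
    out = np.zeros(k)
    out[:min(k, len(eig))] = eig[:k]
    return out

def transition_matrix(trajectories, n_states=5, k=8):
    """trajectories: list of CoT traces, each a list of (n_tokens, dim) arrays."""
    specs = np.stack([step_spectrum(s, k) for tr in trajectories for s in tr])
    labels = KMeans(n_clusters=n_states, n_init=10).fit_predict(specs)
    T = np.zeros((n_states, n_states))
    i = 0
    for tr in trajectories:
        seq = labels[i:i + len(tr)]; i += len(tr)
        for a, b in zip(seq, seq[1:]):
            T[a, b] += 1
    return T / np.maximum(T.sum(axis=1, keepdims=True), 1)  # row-normalized
```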
CoMMIT: Coordinated Multimodal Instruction Tuning
Xintong Li | Junda Wu | Tong Yu | Rui Wang | Yu Wang | Xiang Chen | Jiuxiang Gu | Lina Yao | Julian McAuley | Jingbo Shang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Instruction tuning in multimodal large language models (MLLMs) generally involves cooperative learning between a backbone LLM and a feature encoder of non-text input modalities. The major challenge is how to efficiently find the synergy between the two modules so that LLMs can adapt their reasoning abilities to downstream tasks while feature encoders can adjust to provide more task-specific information about their modalities. In this paper, we analyze MLLM instruction tuning from both theoretical and empirical perspectives, where we find that unbalanced learning between the feature encoder and the LLM can cause problems of oscillation and biased learning that lead to sub-optimal convergence. Inspired by our findings, we propose a Multimodal Balance Coefficient that enables quantitative measurement of the balance of learning. Based on this, we further design a dynamic learning scheduler that better coordinates the learning between the LLM and feature encoder, alleviating the problems of oscillation and biased learning. In addition, we introduce an auxiliary regularization on the gradient to promote updating with larger step sizes, which potentially allows for a more accurate estimation of the proposed Multimodal Balance Coefficient and further improves training sufficiency. Our proposed approach is agnostic to the architecture of the LLM and feature encoder, so it can be generically integrated with various MLLMs. We conduct experiments on multiple downstream tasks with various MLLMs, demonstrating that the proposed method is more effective than the baselines in MLLM instruction tuning.
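As a hedged sketch of the coordination idea, the snippet below measures how unevenly the encoder and the LLM are learning via their gradient-norm ratio and nudges the encoder's learning rate toward balance. The ratio-based coefficient is an assumed stand-in for the paper's Multimodal Balance Coefficient, whose exact form is not given in the abstract:

```python
# Sketch: rebalance encoder vs. LLM learning after loss.backward().
import torch

def grad_norm(module: torch.nn.Module) -> float:
    return sum(p.grad.norm().item() ** 2
               for p in module.parameters() if p.grad is not None) ** 0.5

def rebalance(encoder_opt, encoder, llm, target=1.0, gain=0.1):
    """Nudge the encoder's learning rate so the encoder/LLM
    gradient-norm ratio drifts toward `target` (multiplicative
    update; `gain` damps oscillation)."""
    ratio = grad_norm(encoder) / max(grad_norm(llm), 1e-12)
    for group in encoder_opt.param_groups:
        group["lr"] *= (target / max(ratio, 1e-12)) ** gain
```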
Co-authors
- Junda Wu 5
- Tong Yu 5
- Julian McAuley 4
- Jingbo Shang 4
- Xiang Chen 3
- Sungchul Kim 3
- Ryan A. Rossi 3
- Yu Xia 3
- Lina Yao 3
- Nesreen K. Ahmed 2
- Ryan Aponte 2
- Hongjie Chen 2
- Franck Dernoncourt 2
- Jiuxiang Gu 2
- Zhengmian Hu 2
- Branislav Kveton 2
- Hanjia Lyu 2
- Puneet Mathur 2
- Thien Huu Nguyen 2
- Namyong Park 2
- Yu Wang 2
- Yu Wang 2
- Zhouhang Xie 2
- Yuxin Xiong 2
- Seunghyun Yoon 2
- Ruiyi Zhang 2
- Jalend Bantupalli 1
- Joe Barrow 1
- Trung Bui 1
- Jian Chen 1
- Hanieh Deilamsalehy 1
- Ria Dharmani 1
- Ting-Hao Huang 1
- Jihyung Kil 1
- Viet Dac Lai 1
- Nedim Lipka 1
- Jiebo Luo 1
- Subhojyoti Mukherjee 1
- Koyel Mukherjee 1
- Dang Nguyen 1
- Soumyabrata Pal 1
- Jing Shi 1
- Ritwik Sinha 1
- Mehrab Tanjim 1
- Zichao Wang 1
- Ruoyu Wang 1
- Rui Wang 1
- Gang Wu 1
- Sheldon Yu 1
- Yuwei Zhang 1
- Zhehao Zhang 1
- Yue Zhao 1
- Tianyi Zhou 1