TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration

Yanshu Li; Jianjiang Yang; Tian Yun; Pinyuan Feng; Jinfa Huang; Ruixiang Tang

doi:10.18653/v1/2025.emnlp-main.39

Abstract

Multimodal in-context learning (ICL) has emerged as a key mechanism for harnessing the capabilities of large vision–language models (LVLMs). However, its effectiveness remains highly sensitive to the quality of input ICL sequences, particularly for tasks involving complex reasoning or open-ended generation. A major obstacle is our limited understanding of how LVLMs actually exploit these sequences during inference. To bridge this gap, we systematically interpret multimodal ICL through the lens of task mapping, which reveals how local and global relationships within and among demonstrations guide model reasoning. Building on this insight, we present TACO, a lightweight transformer-based model equipped with task-aware attention that dynamically configures ICL sequences. By injecting task-mapping signals into the autoregressive decoding process, TACO creates a bidirectional synergy between sequence construction and task reasoning. Experiments on five LVLMs and nine datasets demonstrate that TACO consistently surpasses baselines across diverse ICL tasks. These results position task mapping as a novel and valuable perspective for interpreting and improving multimodal ICL.

Anthology ID: 2025.emnlp-main.39
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 736–763
URL: https://aclanthology.org/2025.emnlp-main.39/
DOI: 10.18653/v1/2025.emnlp-main.39
Cite (ACL): Yanshu Li, Jianjiang Yang, Tian Yun, Pinyuan Feng, Jinfa Huang, and Ruixiang Tang. 2025. TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 736–763, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration (Li et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.39.pdf
Checklist: 2025.emnlp-main.39.checklist.pdf
