@inproceedings{shi-etal-2026-additive,
title = "On the Additive Compositionality of Task Vectors in Vision{--}Language Models",
author = "Shi, Yuting and
Wei, Houjing and
Inoue, Naoya",
editor = "Demberg, Vera and
Inui, Kentaro and
Marquez, Llu{\'i}s",
booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 2: Short Papers)",
month = mar,
year = "2026",
address = "Rabat, Morocco",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2026.eacl-short.38/",
pages = "513--521",
ISBN = "979-8-89176-381-4",
abstract = "In-context learning (ICL) in large language models (LLMs) has been shown to operate through task vectors{---}the representation that summarizes the mapping induced by in-context demonstrations and can be composed by simple arithmetic operations. While this phenomenon is well studied in LLMs, its extension to vision-language models (VLMs) remains underexplored. In this work, we systematically examine the additive compositionality of in-context task vectors in VLMs, extracted from text-side hidden representations. Specifically, we construct compositional visual reasoning tasks with clearly defined subtasks and extract task vectors from few-shot demonstrations. Empirical experiments show that the vector for a complex task can be approximated by adding the vectors of its constituent subtasks. Beyond this, we analyze token-level contextual embeddings and show that additive composition arises because complex-task representations emerge as the superposition of atomic subtask components, preserving semantic structure within the model{'}s activation space."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="shi-etal-2026-additive">
    <titleInfo>
      <title>On the Additive Compositionality of Task Vectors in Vision–Language Models</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Yuting</namePart>
      <namePart type="family">Shi</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Houjing</namePart>
      <namePart type="family">Wei</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Naoya</namePart>
      <namePart type="family">Inoue</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2026-03</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Vera</namePart>
        <namePart type="family">Demberg</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Kentaro</namePart>
        <namePart type="family">Inui</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Lluís</namePart>
        <namePart type="family">Marquez</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Rabat, Morocco</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
      <identifier type="isbn">979-8-89176-381-4</identifier>
    </relatedItem>
    <abstract>In-context learning (ICL) in large language models (LLMs) has been shown to operate through task vectors—the representation that summarizes the mapping induced by in-context demonstrations and can be composed by simple arithmetic operations. While this phenomenon is well studied in LLMs, its extension to vision-language models (VLMs) remains underexplored. In this work, we systematically examine the additive compositionality of in-context task vectors in VLMs, extracted from text-side hidden representations. Specifically, we construct compositional visual reasoning tasks with clearly defined subtasks and extract task vectors from few-shot demonstrations. Empirical experiments show that the vector for a complex task can be approximated by adding the vectors of its constituent subtasks. Beyond this, we analyze token-level contextual embeddings and show that additive composition arises because complex-task representations emerge as the superposition of atomic subtask components, preserving semantic structure within the model’s activation space.</abstract>
    <identifier type="citekey">shi-etal-2026-additive</identifier>
    <location>
      <url>https://aclanthology.org/2026.eacl-short.38/</url>
    </location>
    <part>
      <date>2026-03</date>
      <extent unit="page">
        <start>513</start>
        <end>521</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T On the Additive Compositionality of Task Vectors in Vision–Language Models
%A Shi, Yuting
%A Wei, Houjing
%A Inoue, Naoya
%Y Demberg, Vera
%Y Inui, Kentaro
%Y Marquez, Lluís
%S Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
%D 2026
%8 March
%I Association for Computational Linguistics
%C Rabat, Morocco
%@ 979-8-89176-381-4
%F shi-etal-2026-additive
%X In-context learning (ICL) in large language models (LLMs) has been shown to operate through task vectors—the representation that summarizes the mapping induced by in-context demonstrations and can be composed by simple arithmetic operations. While this phenomenon is well studied in LLMs, its extension to vision-language models (VLMs) remains underexplored. In this work, we systematically examine the additive compositionality of in-context task vectors in VLMs, extracted from text-side hidden representations. Specifically, we construct compositional visual reasoning tasks with clearly defined subtasks and extract task vectors from few-shot demonstrations. Empirical experiments show that the vector for a complex task can be approximated by adding the vectors of its constituent subtasks. Beyond this, we analyze token-level contextual embeddings and show that additive composition arises because complex-task representations emerge as the superposition of atomic subtask components, preserving semantic structure within the model’s activation space.
%U https://aclanthology.org/2026.eacl-short.38/
%P 513-521
Markdown (Informal)
[On the Additive Compositionality of Task Vectors in Vision–Language Models](https://aclanthology.org/2026.eacl-short.38/) (Shi et al., EACL 2026)
ACL
Yuting Shi, Houjing Wei, and Naoya Inoue. 2026. On the Additive Compositionality of Task Vectors in Vision–Language Models. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers), pages 513–521, Rabat, Morocco. Association for Computational Linguistics.
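
A minimal sketch of the relation the abstract describes. This is not the paper's code: it uses synthetic NumPy vectors as stand-ins for the text-side hidden states the authors extract from few-shot demonstrations, and simply checks whether a "complex" task vector aligns with the sum of its "subtask" vectors. The hidden size, noise scale, and subtask names below are illustrative assumptions.

import numpy as np

# Stand-ins for task vectors: in the paper these would be hidden states
# read off a VLM's text side at a fixed layer; here they are random.
rng = np.random.default_rng(0)
dim = 4096  # assumed hidden size of a typical VLM decoder

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two task vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical atomic subtasks, e.g. "name the color" and "name the object".
v_color = rng.standard_normal(dim)
v_object = rng.standard_normal(dim)

# Under the superposition hypothesis, the complex-task vector is roughly
# the sum of its constituents plus a small task-specific residual.
v_complex = v_color + v_object + 0.1 * rng.standard_normal(dim)

score = cosine(v_complex, v_color + v_object)
print(f"cos(v_complex, v_color + v_object) = {score:.3f}")  # close to 1.0

With real activations rather than synthetic vectors, the paper's claim corresponds to this similarity staying high for genuinely compositional tasks, which is what makes it possible to approximate a complex task's vector by simple addition of its subtask vectors.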