What Do Language Models Learn in Context? The Structured Task Hypothesis.

Jiaoda Li, Yifan Hou, Mrinmaya Sachan, Ryan Cotterell


Abstract
Large language models (LLMs) exhibit an intriguing ability to learn a novel task from in-context examples presented in a demonstration, termed in-context learning (ICL). Understandably, a swath of research has been dedicated to uncovering the theories underpinning ICL. One popular hypothesis explains ICL by task selection. LLMs identify the task based on the demonstration and generalize it to the prompt. Another popular hypothesis is that ICL is a form of meta-learning, i.e., the models learn a learning algorithm at pre-training time and apply it to the demonstration. Finally, a third hypothesis argues that LLMs use the demonstration to select a composition of tasks learned during pre-training to perform ICL. In this paper, we empirically explore these three hypotheses that explain LLMs’ ability to learn in context with a suite of experiments derived from common text classification tasks. We invalidate the first two hypotheses with counterexamples and provide evidence in support of the last hypothesis. Our results suggest an LLM could learn a novel task in context via composing tasks learned during pre-training.
Anthology ID:
2024.acl-long.669
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
12365–12379
URL:
https://aclanthology.org/2024.acl-long.669
Cite (ACL):
Jiaoda Li, Yifan Hou, Mrinmaya Sachan, and Ryan Cotterell. 2024. What Do Language Models Learn in Context? The Structured Task Hypothesis. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12365–12379, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
What Do Language Models Learn in Context? The Structured Task Hypothesis. (Li et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-long.669.pdf