Distinguishing fair from unfair compositional generalization tasks

Ahmad Jabbar, Cleo Condoravdi, Christopher Potts


Abstract
Compositional generalization benchmarks seek to assess whether learning agents can successfully combine familiar concepts in novel ways. COGS (Kim and Linzen, 2020) provides a suite of such tasks in the area of interpretive semantics (mapping sentences to logical forms). A noteworthy finding for COGS is that model performance varies widely across tasks. In this paper, we argue that these performance differences reflect deep properties of the tasks themselves. We focus on two COGS tasks: an easy task (models are generally successful) and a hard task (no present-day models get any traction). Using both experiments and conceptual analysis, we argue that the easy task requires only a single distributional generalization that is well-supported by the training data, whereas the hard task involves a learning target that is ambiguous or even contradicted by the training data. We additionally argue that pretraining can disambiguate the hard task without compromising the goal of testing compositional generalization. Overall, our findings offer practical guidance to designers of compositional generalization benchmarks and also yield new insights into the nature of compositionality itself.
Anthology ID:
2025.findings-emnlp.1133
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
20796–20807
URL:
https://aclanthology.org/2025.findings-emnlp.1133/
Cite (ACL):
Ahmad Jabbar, Cleo Condoravdi, and Christopher Potts. 2025. Distinguishing fair from unfair compositional generalization tasks. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 20796–20807, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Distinguishing fair from unfair compositional generalization tasks (Jabbar et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.1133.pdf
Checklist:
https://aclanthology.org/2025.findings-emnlp.1133.checklist.pdf