Are representations built from the ground up? An empirical examination of local composition in language models

Emmy Liu, Graham Neubig


Abstract
Compositionality, the phenomenon where the meaning of a phrase can be derived from its constituent parts, is a hallmark of human language. At the same time, many phrases are non-compositional, carrying a meaning beyond that of each part in isolation. Representing both of these types of phrases is critical for language understanding, but it is an open question whether modern language models (LMs) learn to do so; in this work we examine this question. We first formulate a problem of predicting the LM-internal representations of longer phrases given those of their constituents. We find that the representation of a parent phrase can be predicted with some accuracy given an affine transformation of its children. While we would expect the predictive accuracy to correlate with human judgments of semantic compositionality, we find this is largely not the case, indicating that LMs may not accurately distinguish between compositional and non-compositional phrases. We perform a variety of analyses, shedding light on when different varieties of LMs do and do not generate compositional representations, and discuss implications for future modeling work.
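The prediction problem described in the abstract can be sketched as a regression task: given the representations of a phrase's two children, fit an affine map that predicts the parent phrase's representation. The sketch below uses synthetic vectors as stand-ins for LM hidden states, and a toy dimension and cosine-similarity metric that are illustrative assumptions, not the paper's actual experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16   # embedding dimension (toy size; real LM hidden states are larger)
n = 200  # number of (left child, right child, parent) phrase triples

# Stand-ins for LM-internal representations of child phrases.
left = rng.normal(size=(n, d))
right = rng.normal(size=(n, d))
# Synthetic "parent" representations with a mostly-linear relationship.
parent = left + 0.5 * right + 0.1 * rng.normal(size=(n, d))

# Affine transformation: parent ~ [left; right; 1] @ W
X = np.hstack([left, right, np.ones((n, 1))])
W, *_ = np.linalg.lstsq(X, parent, rcond=None)

# Predictive accuracy measured here as mean cosine similarity between
# predicted and actual parent representations.
pred = X @ W
cos = np.sum(pred * parent, axis=1) / (
    np.linalg.norm(pred, axis=1) * np.linalg.norm(parent, axis=1)
)
print(round(float(cos.mean()), 3))
```

On this synthetic, near-linear data the affine map fits well by construction; the paper's finding is that real LM representations are also predictable to some degree, but that prediction error does not track human compositionality judgments.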
Anthology ID:
2022.emnlp-main.617
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
9053–9073
URL:
https://aclanthology.org/2022.emnlp-main.617
DOI:
10.18653/v1/2022.emnlp-main.617
Cite (ACL):
Emmy Liu and Graham Neubig. 2022. Are representations built from the ground up? An empirical examination of local composition in language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9053–9073, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Are representations built from the ground up? An empirical examination of local composition in language models (Liu & Neubig, EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-main.617.pdf