Testing Large Language Models on Compositionality and Inference with Phrase-Level Adjective-Noun Entailment

Lorenzo Bertolini, Julie Weeds, David Weir


Abstract
Previous work has demonstrated that pre-trained large language models (LLMs) acquire knowledge during pre-training which enables reasoning over relationships between words (e.g., hyponymy) and more complex inferences over larger units of meaning such as sentences. Here, we investigate whether lexical entailment (LE, i.e. hyponymy, or the is-a relation between words) can be generalised in a compositional manner. Accordingly, we introduce PLANE (Phrase-Level Adjective-Noun Entailment), a new benchmark to test models on fine-grained compositional entailment using adjective-noun phrases. Our experiments show that knowledge extracted via in-context and transfer learning is not enough to solve PLANE. However, an LLM trained on PLANE can generalise well to out-of-distribution sets, since the required knowledge can be stored in the representations of subword (SW) tokens.
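To make the task concrete: whether an adjective-noun phrase entails its bare noun depends on the adjective's semantic class (e.g., intersective vs. privative). The sketch below is a hypothetical illustration only; the phrases, labels, and NLI-style template are assumptions for exposition and are not the benchmark's actual data format.

```python
# Hypothetical illustration of phrase-level adjective-noun entailment pairs.
# The template and examples are assumptions, not PLANE's actual format.

examples = [
    # (premise phrase, hypothesis noun, does entailment hold?)
    ("red car", "car", True),            # intersective: a red car is a car
    ("small elephant", "elephant", True),# subsective: a small elephant is an elephant
    ("fake gun", "gun", False),          # privative: a fake gun is not a gun
]

def to_nli_input(premise, hypothesis):
    """Render a pair as a simple NLI-style sentence pair (hypothetical template)."""
    return (f"this is a {premise}", f"this is a {hypothesis}")

for phrase, noun, label in examples:
    prem, hyp = to_nli_input(phrase, noun)
    print(f"{prem!r} -> {hyp!r}: entailment={label}")
```

The point the benchmark probes is that surface form alone does not decide the label: "red car" and "fake gun" have identical syntax, so a model must use knowledge about the adjective itself to classify the pair.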
Anthology ID:
2022.coling-1.359
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
4084–4100
URL:
https://aclanthology.org/2022.coling-1.359
Cite (ACL):
Lorenzo Bertolini, Julie Weeds, and David Weir. 2022. Testing Large Language Models on Compositionality and Inference with Phrase-Level Adjective-Noun Entailment. In Proceedings of the 29th International Conference on Computational Linguistics, pages 4084–4100, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Testing Large Language Models on Compositionality and Inference with Phrase-Level Adjective-Noun Entailment (Bertolini et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.359.pdf
Code:
lorenzoscottb/plane