Do Language Models discriminate between relatives and pseudorelatives?

Adele Henot-Mortier


Abstract
Large Language Models (LLMs) are often evaluated against massive benchmarks based on general-purpose tasks, which, despite being useful for concrete applications, tell us very little about the capacity of LLMs to learn specific and challenging aspects of grammar. Here, we evaluate whether LLMs learn to identify a particular structure attested in Romance (and in French in particular), called the pseudorelative. This structure, which is often surface-similar to a relative clause, is linked to robust syntactic and semantic restrictions. We present a series of experiments testing whether LLMs pretrained on massive yet general corpora manage to learn these various restrictions. Our results suggest that LLMs learn some, but not all, of these properties, and crucially fail at recognizing the most specific of them: cliticization.
Anthology ID:
2023.clasp-1.6
Volume:
Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD)
Month:
September
Year:
2023
Address:
Gothenburg, Sweden
Editors:
Ellen Breitholtz, Shalom Lappin, Sharid Loaiciga, Nikolai Ilinykh, Simon Dobnik
Venue:
CLASP
SIG:
SIGSEM
Publisher:
Association for Computational Linguistics
Pages:
55–61
URL:
https://aclanthology.org/2023.clasp-1.6
Cite (ACL):
Adele Henot-Mortier. 2023. Do Language Models discriminate between relatives and pseudorelatives?. In Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD), pages 55–61, Gothenburg, Sweden. Association for Computational Linguistics.
Cite (Informal):
Do Language Models discriminate between relatives and pseudorelatives? (Henot-Mortier, CLASP 2023)
PDF:
https://aclanthology.org/2023.clasp-1.6.pdf