BLM-s/lE: A structured dataset of English spray-load verb alternations for testing generalization in LLMs

Giuseppe Samo, Vivi Nastase, Chunyang Jiang, Paola Merlo


Abstract
Current NLP models appear to be achieving performance comparable to human capabilities on well-established benchmarks. New benchmarks are now necessary to test how deeply these models understand natural languages. Blackbird’s Language Matrices (BLMs) are a recently developed framework that draws inspiration from tests of human analytic intelligence. The BLM task has revealed that successful performance on previously studied linguistic problems does not yet stem from a deep understanding of the generative factors that define these problems. In this study, we define a new BLM task for predicate-argument structure and develop a structured dataset for its investigation, concentrating on the spray-load verb alternations in English as a case study. The context sentences include one alternant from the spray-load alternation, and the target sentence is the other alternant, to be chosen from a minimally contrastive and adversarial set of answers. We describe the generation process of the dataset and the reasoning behind the generating rules. The dataset aims to facilitate investigations into how verb information is encoded in sentence embeddings and how models generalize to the complex properties of argument structures. Benchmarking experiments conducted on the dataset and a qualitative error analysis of the answer set reveal the inherent challenges associated with the problem, even for current high-performing representations.
Anthology ID:
2023.findings-emnlp.821
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12276–12287
URL:
https://aclanthology.org/2023.findings-emnlp.821
DOI:
10.18653/v1/2023.findings-emnlp.821
Cite (ACL):
Giuseppe Samo, Vivi Nastase, Chunyang Jiang, and Paola Merlo. 2023. BLM-s/lE: A structured dataset of English spray-load verb alternations for testing generalization in LLMs. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 12276–12287, Singapore. Association for Computational Linguistics.
Cite (Informal):
BLM-s/lE: A structured dataset of English spray-load verb alternations for testing generalization in LLMs (Samo et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.821.pdf