Giuseppe Samo


2023

Blackbird Language Matrices Tasks for Generalization
Paola Merlo | Chunyang Jiang | Giuseppe Samo | Vivi Nastase
Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP

To develop a system with near-human language capabilities, we need to understand current systems’ generalisation and compositional abilities. We approach this by generating compositional, structured data, inspired by visual intelligence tests, that depend on the problem-solvers being able to disentangle objects and their absolute and relative properties in a sequence of images. We design an analogous task and develop the corresponding datasets that capture specific linguistic phenomena and their properties. Solving each problem instance depends on detecting the relevant linguistic objects and generative rules of the problem. We propose two datasets modelling two linguistic phenomena – subject-verb agreement in French, and verb alternations in English. The datasets can be used to investigate how LLMs encode linguistic objects, such as phrases, their grammatical and semantic properties, such as number or semantic role, and how such information is combined to correctly solve each problem. Specifically generated error types help investigate the behaviour of the system, which important information it is able to detect, and which structures mislead it.

BLM-s/lE: A structured dataset of English spray-load verb alternations for testing generalization in LLMs
Giuseppe Samo | Vivi Nastase | Chunyang Jiang | Paola Merlo
Findings of the Association for Computational Linguistics: EMNLP 2023

Current NLP models appear to be achieving performance comparable to human capabilities on well-established benchmarks. New benchmarks are now necessary to test deeper layers of understanding of natural languages by these models. Blackbird’s Language Matrices are a recently developed framework that draws inspiration from tests of human analytic intelligence. The BLM task has revealed that successful performances in previously studied linguistic problems do not yet stem from a deep understanding of the generative factors that define these problems. In this study, we define a new BLM task for predicate-argument structure, and develop a structured dataset for its investigation, concentrating on the spray-load verb alternations in English, as a case study. The context sentences include one alternant from the spray-load alternation and the target sentence is the other alternant, to be chosen among a minimally contrastive and adversarial set of answers. We describe the generation process of the dataset and the reasoning behind the generating rules. The dataset aims to facilitate investigations into how verb information is encoded in sentence embeddings and how models generalize to the complex properties of argument structures. Benchmarking experiments conducted on the dataset and qualitative error analysis on the answer set reveal the inherent challenges associated with the problem even for current high-performing representations.

2019

Intervention effects in object relatives in English and Italian: a study in quantitative computational syntax
Giuseppe Samo | Paola Merlo
Proceedings of the First Workshop on Quantitative Syntax (Quasy, SyntaxFest 2019)