Lorenzo Gregori

2025

Evaluating Models, Prompting Strategies, and Task Formats: A Case Study on the MACID Challenge
Matteo Rinaldi | Rossella Varvara | Lorenzo Gregori | Andrea Amelio Ravelli
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

2024

IMPAQTS: a multimodal corpus of parliamentary and other political speeches in Italy (1946-2023), annotated with implicit strategies
Federica Cominetti | Lorenzo Gregori | Edoardo Lombardi Vallauri | Alessandro Panunzi
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024

The paper introduces the IMPAQTS corpus of Italian political discourse, a multimodal corpus of around 2.65 million tokens comprising 1,500 speeches delivered by 150 prominent politicians between 1946 and 2023. Covering the entire history of the Italian Republic, the collection is not evenly distributed over time: the amount of material increases progressively towards the present. The corpus is balanced according to textual and sociolinguistic criteria and includes different types of speeches. The sociolinguistic features of the speakers are carefully considered to ensure that the politicians of the Italian Republic are adequately represented. For each speaker, the corpus contains 4 parliamentary speeches, 2 rallies, 1 party assembly, and 3 statements (in person or broadcast). Parliamentary speeches therefore constitute the largest section of the corpus (40% of the total), enabling direct comparison with other types of political speeches. The paper describes the collection procedure, including details of the transcription protocols, and the processing pipeline. The corpus has been pragmatically annotated with information about implicitly conveyed questionable contents, each paired with an explicit paraphrase, making it the largest Italian collection of ecological examples of linguistic implicit strategies. The adopted ontology of linguistic implicitness and the fine-grained annotation scheme are presented in detail.
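
The per-speaker quota stated above accounts for both the total number of speeches and the parliamentary share. As a quick arithmetic check (the 40% follows from counting speeches; the abstract does not say whether the same share holds when measured in tokens):

```latex
\begin{aligned}
n_{\text{speaker}} &= \underbrace{4}_{\text{parliament}} + \underbrace{2}_{\text{rallies}} + \underbrace{1}_{\text{assembly}} + \underbrace{3}_{\text{statements}} = 10 \\
N &= 150 \times n_{\text{speaker}} = 150 \times 10 = 1{,}500 \\
\text{parliamentary share} &= \frac{4}{n_{\text{speaker}}} = \frac{4}{10} = 40\%
\end{aligned}
```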

MACID - Multimodal ACtion IDentification: A CALAMITA Challenge
Andrea Amelio Ravelli | Rossella Varvara | Lorenzo Gregori
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

This paper presents the Multimodal ACtion IDentification challenge (MACID), part of the first CALAMITA competition. The objective of this task is to evaluate the ability of large language models (LLMs) to differentiate between closely related action concepts based on textual descriptions alone. The challenge is inspired by the “find the intruder” task, where models must identify an outlier among a set of 4 sentences that describe similar yet distinct actions. The dataset highlights action-predicate mismatches, where the same verb may describe different actions or different verbs may refer to the same action. Although currently mono-modal (text-only), the task is designed for future multimodal integration, linking visual and textual representations to enhance action recognition. By probing a model’s capacity to resolve subtle linguistic ambiguities, the challenge underscores the need for deeper cognitive understanding in action-language alignment, ultimately testing the boundaries of LLMs’ ability to interpret action verbs and their associated concepts.
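
The “find the intruder” format can be made concrete with a naive baseline. The sketch below is hypothetical and is not the challenge's official evaluation: it represents each of the 4 sentences as a bag of words, picks as intruder the sentence with the lowest mean cosine similarity to the other three, and scores predictions by accuracy. The names `bow`, `find_intruder`, and `accuracy` and the toy item are illustrative assumptions.

```python
import math
from collections import Counter


def bow(sentence: str) -> Counter:
    # Naive assumption: lowercase and split on whitespace, no lemmatization.
    return Counter(sentence.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_intruder(sentences: list[str]) -> int:
    """Return the index of the sentence least similar, on average, to the others."""
    vecs = [bow(s) for s in sentences]

    def mean_sim(i: int) -> float:
        sims = [cosine(vecs[i], vecs[j]) for j in range(len(vecs)) if j != i]
        return sum(sims) / len(sims)

    return min(range(len(vecs)), key=mean_sim)


def accuracy(predictions: list[int], gold: list[int]) -> float:
    # Fraction of items where the predicted intruder index matches the gold one.
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Hypothetical toy item: the intruder is the only sentence with a different verb.
item = ["John turns the key", "Mary turns the key", "Paul turns the key", "Anna cuts the bread"]
print(find_intruder(item))
```

A lexical-overlap baseline like this is precisely what MACID items are meant to defeat: since the same verb may describe different actions, the intruder can share its verb with the other three sentences, and surface similarity then gives no useful signal.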

2023

Identification of Multiword Expressions: Comparing the Performance of a Conditional Random Fields Model on Corpora of Written and Spoken Italian
Ilaria Manfredi | Lorenzo Gregori
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

2020

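L’impatto emotivo della comunicazione istituzionale durante la pandemia di COVID-19: uno studio di Twitter Sentiment Analysis [The emotional impact of institutional communication during the COVID-19 pandemic: a Twitter sentiment analysis study]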
Gloria Gagliardi | Lorenzo Gregori | Alice Suozzi
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

2018

One event, many representations. Mapping action concepts through visual features.
Alessandro Panunzi | Lorenzo Gregori | Andrea Amelio Ravelli
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Measuring the Italian-English lexical gap for action verbs and its impact on translation
Lorenzo Gregori | Alessandro Panunzi
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications

This paper describes a method to measure the lexical gap of action verbs in Italian and English using the IMAGACT ontology of action. The fine-grained categorization of action concepts in this data source provided a broad overview of the relations between concepts in the two languages. The calculated lexical gap for both English and Italian is about 30% of the action concepts, much higher than previously reported results. Beyond these general figures, a deeper analysis was performed to evaluate the impact that lexical gaps can have on translation. In particular, a distinction was made between cases in which a lexical gap affects translation correctness and cases in which it affects completeness at the semantic level. The results highlight a high percentage of concepts that can be considered hard to translate (about 18% from English to Italian and 20% from Italian to English) and confirm that action verbs are a critical lexical class for translation tasks.

Evaluating a Rule Based Strategy to Map IMAGACT and T-PAS
Andrea Amelio Ravelli | Lorenzo Gregori | Anna Feltracco
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

2012

RIDIRE-CPI: an Open Source Crawling and Processing Infrastructure for Supervised Web-Corpora Building
Alessandro Panunzi | Marco Fabbri | Massimo Moneglia | Lorenzo Gregori | Samuele Paladini
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper introduces RIDIRE-CPI, an open-source tool for building web corpora with a specific design through a targeted crawling strategy. The tool was developed within the RIDIRE Project, which aims to create a 2-billion-word balanced web corpus for Italian. The RIDIRE-CPI architecture integrates existing open-source tools as well as modules developed specifically within the RIDIRE project. It consists of several components: a robust crawler (Heritrix), a user-friendly web interface, several conversion and cleaning tools, an anti-duplicate filter, a language guesser, and a PoS tagger. The RIDIRE-CPI interface is specifically designed to support collaborative work by users with limited skills in web technology and text processing. Moreover, RIDIRE-CPI integrates a validation interface dedicated to evaluating the targeted crawling. Through its content selection, metadata assignment, and validation procedures, RIDIRE-CPI supports the gathering of linguistic data with a supervised strategy that gives a higher level of control over the corpus contents. The modular architecture of the infrastructure and its open-source distribution will ensure that the tool can be reused in other corpus-building initiatives.
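
The abstract does not specify how the anti-duplicate filter works. Purely as an assumption-labeled illustration, the sketch below shows a minimal exact-duplicate filter: it hashes whitespace-normalized, lowercased text and keeps only the first document with each digest. The names `normalize` and `dedup` are hypothetical, and this is not RIDIRE-CPI's implementation.

```python
import hashlib
import re
from typing import Iterable, Iterator


def normalize(text: str) -> str:
    # Collapse runs of whitespace and lowercase, so trivially reformatted copies collide.
    return re.sub(r"\s+", " ", text).strip().lower()


def dedup(documents: Iterable[str]) -> Iterator[str]:
    """Yield each document whose normalized text has not been seen before."""
    seen: set[str] = set()
    for doc in documents:
        digest = hashlib.sha1(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc


docs = ["Ciao  mondo", "ciao mondo", "Altro testo"]
print(list(dedup(docs)))
```

Exact hashing only catches copies that differ in spacing or case; detecting near-duplicate web pages usually calls for shingling or MinHash-style similarity, at a higher computational cost.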
