Marco Rovera

2025

Benchmarking Historical Phase Recognition from Text and Events
Fabio Celli | Marco Rovera
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

pdf bib abs

ModaFact: Multi-paradigm Evaluation for Joint Event Modality and Factuality Detection
Marco Rovera | Serena Cristoforetti | Sara Tonelli
Proceedings of the 31st International Conference on Computational Linguistics

Factuality and modality are two crucial aspects concerning events, since they convey the speaker’s commitment to a situation in discourse as well as how this event is supposed to occur in terms of norms, wishes, necessity, duty and so on. Capturing them both is necessary to truly understand an utterance meaning and the speaker’s perspective with respect to a mentioned event. Yet, NLP studies have mostly dealt with these two aspects separately, mainly devoting past efforts to the development of English datasets. In this work, we propose ModaFact, a novel resource with joint factuality and modality information for event-denoting expressions in Italian. We propose a novel annotation scheme, which however is consistent with existing ones, and compare different classification systems trained on ModaFact, as a preliminary step to the use of factuality and modality information in downstream tasks. The dataset and the best-performing model are publicly released and available under an open license.

pdf bib abs

Multilingual vs Crosslingual Retrieval of Fact-Checked Claims: A Tale of Two Approaches
Alan Ramponi | Marco Rovera | Robert Moro | Sara Tonelli
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Retrieval of previously fact-checked claims is a well-established task, whose automation can assist professional fact-checkers in the initial steps of information verification. Previous works have mostly tackled the task monolingually, i.e., having both the input and the retrieved claims in the same language. However, especially for languages with a limited availability of fact-checks and in case of global narratives, such as pandemics, wars, or international politics, it is crucial to be able to retrieve claims across languages. In this work, we examine strategies to improve the multilingual and crosslingual performance, namely selection of negative examples (in the supervised) and re-ranking (in the unsupervised setting). We evaluate all approaches on a dataset containing posts and claims in 47 languages (283 language combinations). We observe that the best results are obtained by using LLM-based re-ranking, followed by fine-tuning with negative examples sampled using a sentence similarity-based strategy. Most importantly, we show that crosslinguality is a setup with its own unique characteristics compared to the multilingual setup.

2024

pdf bib abs

EventNet-ITA: Italian Frame Parsing for Events
Marco Rovera
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)

This paper introduces EventNet-ITA, a large, multi-domain corpus annotated full-text with event frames for Italian. Moreover, we present and thoroughly evaluate an efficient multi-label sequence labeling approach for Frame Parsing. Covering a wide range of individual, social and historical phenomena, with more than 53,000 annotated sentences and over 200 modeled frames, EventNet-ITA constitutes the first systematic attempt to provide the Italian language with a publicly available resource for Frame Parsing of events, useful for a broad spectrum of research and application tasks. Our approach achieves a promising 0.9 strict F1-score for frame classification and 0.72 for frame element classification, on top of minimizing computational requirements. The annotated corpus and the frame parsing model are released under open license.

pdf bib abs

TimeFrame: Querying and Visualizing Event Semantic Frames in Time
Davide Lamorte | Marco Rovera | Alfio Ferrara | Sara Tonelli
Proceedings of the First Workshop on Reference, Framing, and Perspective @ LREC-COLING 2024

In this work we introduce TimeFrame, an online platform to easily query and visualize events and participants extracted from document collections in Italian following a frame-based approach. The system allows users to select one or more events (frames) or event categories and to display their occurrences on a timeline. Different query types, from coarse to fine-grained, are available through the interface, enabling a time-bound analysis of large historical corpora. We present three use cases based on the full archive of news published in 1948 by the newspaper “Corriere della Sera”. We show that different crucial events can be explored, providing interesting insights into the narratives around such events, the main participants and their points of view.

2023

pdf bib abs

This work introduces a novel, extensive annotated corpus for multi-label legislative text classification in Italian, based on legal acts from the Gazzetta Ufficiale, the official source of legislative information of the Italian state. The annotated dataset, which we released to the community, comprises over 363,000 titles of legislative acts, spanning over 30 years from 1988 until 2022. Moreover, we evaluate four models for text classification on the dataset, demonstrating how using only the acts’ titles can achieve top-level classification performance, with a micro F1-score of 0.87. Also, our analysis shows how Italian domain-adapted legal models do not outperform general-purpose models on the task. Models’ performance can be checked by users via a demonstrator system provided in support of this work.

Marco Rovera

2025

2024

2023

2017

Co-authors

Venues