Alessio Bosca

2025

BitsAndBites at SemEval-2025 Task 9: Improving Food Hazard Detection with Sequential Multitask Learning and Large Language Models
Aurora Gensale | Irene Benedetto | Luca Gioacchini | Luca Cagliero | Alessio Bosca
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Automatic and early detection of foodborne hazards is crucial for preventing outbreaks. Existing AI-based solutions often struggle to handle the complexity and noise of food recall reports and to overcome the dependency between product and hazard labels. To address these challenges, we introduce a methodology for classifying reports on food-related incidents. Our approach combines LLM-based information extraction, which minimizes report variability, with a two-stage classification pipeline. The first model assigns coarse-grained labels, narrowing the space of eligible fine-grained labels for the second model. This sequential process allows us to capture hierarchical label dependencies between products and hazards and their respective categories. Additionally, we design each model with two classification heads that rely on the inherent relations between food products and associated hazards. We validate our approach on two multi-label classification sub-tasks. Experimental results demonstrate the effectiveness of our approach, which achieves improvements of 30% and 40% in classification performance over the baseline.

2023

UINAUIL: A Unified Benchmark for Italian Natural Language Understanding
Valerio Basile | Livio Bioglio | Alessio Bosca | Cristina Bosco | Viviana Patti
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

This paper introduces the Unified Interactive Natural Understanding of the Italian Language (UINAUIL), a benchmark of six tasks for Italian Natural Language Understanding. We describe the tasks and the software library, which collects the data from the European Language Grid, harmonizes the data format, and exposes functionality to facilitate data manipulation and the evaluation of custom models. We also present the results of tests conducted with available Italian and multilingual language models on UINAUIL, providing an updated picture of the current state of the art in Italian NLU.

2020

The “Corpus Anchise 320” and the Analysis of Conversations between Healthcare Workers and People with Dementia
Nicola Benvenuti | Andrea Bolioli | Alessandro Mazzei | Pietro Vigorelli | Alessio Bosca
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

2014

A Lightweight Terminology Verification Service for External Machine Translation Engines
Alessio Bosca | Vassilina Nikoulina | Marc Dymetman
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

Modeling, Managing, Exposing, and Linking Ontologies with a Wiki-based Tool
Mauro Dragoni | Alessio Bosca | Matteo Casu | Andi Rexha
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In the last decade, the need for effective and useful tools for creating and managing linguistic resources has increased significantly. One of the main reasons is the necessity of building linguistic resources (LRs) that, besides effectively expressing the domain that users want to model, may be exploited in several ways. In this paper we present MoKi, a wiki-based collaborative tool for modeling ontologies and, more generally, any kind of linguistic resource. This tool has been customized in the context of an EU-funded project to address three important aspects of LR modeling: (i) exposing the created LRs, (ii) linking the created resources to external ones, and (iii) producing multilingual LRs in a safe manner.

A Gold Standard for CLIR evaluation in the Organic Agriculture Domain
Alessio Bosca | Matteo Casu | Matteo Dragoni | Nikolaos Marianos
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present a gold standard for the evaluation of Cross Language Information Retrieval systems in the domain of Organic Agriculture and AgroEcology. The resource is free to use for research purposes and includes a collection of multilingual documents annotated with respect to a domain ontology, the ontology used for annotating the resources, a set of 48 queries in 12 languages, and a gold standard with the correct resources for the proposed queries. The goal of this work is to provide the research community with a resource for evaluating multilingual retrieval algorithms, with particular focus on domain adaptation strategies for general purpose multilingual information retrieval systems and on the effective exploitation of semantic annotations. Domain adaptation is in fact an important activity for tuning the retrieval system, reducing ambiguities and improving the precision of information retrieval. Domain ontologies are a widespread means of defining the conceptual space of a corpus and mapping resources to specific topics, and in our lab we also propose to investigate and evaluate the impact of this information on enhancing content retrieval. An initial experiment is described, giving a baseline for further research with the proposed gold standard.

2013

Celi: EDITS and Generic Text Pair Classification
Milen Kouylekov | Luca Dini | Alessio Bosca | Marco Trevisan
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

Linguagrid: a network of Linguistic and Semantic Services for the Italian Language.
Alessio Bosca | Luca Dini | Milen Kouylekov | Marco Trevisan
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In order to handle the increasing amount of textual information available on the web today and to exploit the knowledge latent in this mass of unstructured data, a wide variety of linguistic knowledge and resources (Language Identification, Morphological Analysis, Entity Extraction, etc.) is crucial. In the last decade LRaaS (Language Resource as a Service) emerged as a novel paradigm for publishing and sharing these heterogeneous software resources over the Web. In this paper we present an overview of Linguagrid, a recent initiative that implements an open network of linguistic and semantic Web Services for the Italian language, as well as a new approach for enabling customizable corpus-based linguistic services on the Linguagrid LRaaS infrastructure. A corpus ingestion service allows users to upload corpora of documents and to generate classification/clustering models tailored to their needs by means of standard machine learning techniques applied to the textual contents and metadata of the corpora. The models generated in this way can then be accessed through dedicated Web Services and used to process and classify new textual contents.

CELI: An Experiment with Cross Language Textual Entailment
Milen Kouylekov | Luca Dini | Alessio Bosca | Marco Trevisan
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)