Yuri Noviello


2024

pdf bib
Exploring Text-Embedding Retrieval Models for the Italian Language
Yuri Noviello | Fabio Tamburini
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

Text retrieval systems have become essential in the field of natural language processing (NLP), serving as the backbone for applications such as search engines, document indexing, and information retrieval. With the rise of generative AI, particularly Retrieval-Augmented Generation (RAG) systems, the demand for robust text retrieval models has increased. However, existing large language models (LLMs) and datasets are often insufficiently optimized for Italian, limiting their performance in Italian text retrieval tasks. This paper addresses this gap by proposing both a data collection and specialized models tailored for Italian text retrieval. Through extensive experimentation, we analyze the improvements and limitations in retrieval performance, paving the way for more effective Italian NLP applications.

2023

pdf bib
TeamUnibo at SemEval-2023 Task 6: A transformer based approach to Rhetorical Roles prediction and NER in Legal Texts
Yuri Noviello | Enrico Pallotta | Flavio Pinzarrone | Giuseppe Tanzi
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This study aims to tackle some challenges posed by legal texts in the field of NLP. The LegalEval challenge proposes three tasks, based on Indial Legal documents: Rhetorical Roles Prediction, Legal Named Entity Recognition, and Court Judgement Prediction with Explanation. Our work focuses on the first two tasks. For the first task we present a context-aware approach to enhance sentence information. With the help of this approach, the classification model utilizing InLegalBert as a transformer achieved 81.12% Micro-F1. For the second task we present a NER approach to extract and classify entities like names of petitioner, respondent, court or statute of a given document. The model utilizing XLNet as transformer and a dependency parser on top achieved 87.43% Macro-F1.