Fabio Mercorio


2025

RE-FIN: Retrieval-based Enrichment for Financial data
Lorenzo Malandri | Fabio Mercorio | Mario Mezzanzanica | Filippo Pallucchini
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track

Enriching sentences with knowledge from qualitative sources benefits various NLP tasks and enhances the use of labeled data in model training. This is crucial for Financial Sentiment Analysis (FSA), where texts are often brief and contain implied information. We introduce RE-FIN (Retrieval-based Enrichment for FINancial data), an automated system designed to retrieve information from a knowledge base to enrich financial sentences, making them more knowledge-dense and explicit. RE-FIN generates propositions from the knowledge base and employs Retrieval-Augmented Generation (RAG) to augment the original text with relevant information. A large language model (LLM) rewrites the original sentence, incorporating this data. Since the LLM does not create new content, the risk of hallucinations is significantly reduced. The LLM generates multiple new sentences using different relevant information from the knowledge base; we developed an algorithm to select the one that best preserves the meaning of the original sentence while avoiding excessive syntactic similarity. Results show that the enhanced sentences exhibit lower perplexity than the original ones and improve performance on FSA.
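To illustrate the selection step the abstract describes, the sketch below is a minimal, hypothetical implementation rather than the paper's actual algorithm: it assumes the candidate rewrites are already available, uses a bag-of-words cosine as a stand-in for the semantic-preservation score, and uses difflib's sequence ratio as a stand-in for syntactic similarity; the `select_enriched` function, its names, and the 0.9 threshold are illustrative assumptions.

```python
# Hypothetical sketch of the candidate-selection idea: keep the enriched sentence
# that stays semantically close to the original while rejecting near-verbatim rewrites.
from collections import Counter
from difflib import SequenceMatcher
import math


def bow_cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity, used here as a crude proxy for semantic closeness."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in set(ca) & set(cb))
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def select_enriched(original: str, candidates: list[str], max_surface_sim: float = 0.9) -> str:
    """Pick the candidate closest in meaning to the original, excluding near-copies."""
    surface_sim = lambda c: SequenceMatcher(None, original, c).ratio()
    # Drop candidates that are almost identical on the surface; fall back to all if none remain.
    eligible = [c for c in candidates if surface_sim(c) < max_surface_sim] or candidates
    return max(eligible, key=lambda c: bow_cosine(original, c))
```

In practice the semantic score would come from a stronger model than word overlap; the point of the sketch is only the trade-off between preserving meaning and avoiding excessive surface similarity.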

2024

BEEP - BEst DrivEr’s License Performer: A CALAMITA Challenge
Fabio Mercorio | Daniele Potertì | Antonio Serino | Andrea Seveso
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

We present BEEP (BEst DrivEr’s License Performer), a benchmark challenge to evaluate large language models in the context of a simulated Italian driver’s license exam. This challenge tests the models’ ability to understand and apply traffic laws, road safety regulations, and vehicle-related knowledge through a series of true/false questions. The dataset is derived from official ministerial materials used in the Italian licensing process, specifically targeting Category B licenses. We evaluate models such as LLaMA and Mixtral across multiple categories. In addition, we simulate a driving license test to assess the models’ real-world applicability, where passing is determined by the number of errors allowed. While scaling up model size improved performance, even larger models struggled to pass the exam consistently. The challenge demonstrates the capabilities and limitations of LLMs in handling real-world, high-stakes scenarios, providing insights into their practical use and areas for further improvement.
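The sketch below shows one way such a simulated exam could be scored; it is not the released benchmark code. The 30-question exam size and 3-error allowance are assumptions about the Italian Category B theory test format, and the function name, dictionaries, and number of simulated exams are hypothetical.

```python
# Hypothetical scoring sketch: sample fixed-size true/false exams from a question pool
# and count an exam as passed when the model's errors stay within the allowed maximum.
import random


def simulate_pass_rate(answers: dict[str, bool],       # question -> model's true/false answer
                       gold: dict[str, bool],          # question -> correct answer
                       questions_per_exam: int = 30,   # assumed exam size
                       max_errors: int = 3,            # assumed error allowance
                       n_exams: int = 1000,
                       seed: int = 0) -> float:
    rng = random.Random(seed)
    pool = list(gold)
    passed = 0
    for _ in range(n_exams):
        exam = rng.sample(pool, questions_per_exam)
        errors = sum(answers[q] != gold[q] for q in exam)
        passed += errors <= max_errors
    return passed / n_exams
```

Reporting a pass rate over many sampled exams, rather than raw accuracy alone, reflects the high-stakes framing: a model can answer most questions correctly and still fail exams whenever its errors cluster above the allowance.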

2022

Contrastive Explanations of Text Classifiers as a Service
Lorenzo Malandri | Fabio Mercorio | Mario Mezzanzanica | Navid Nobani | Andrea Seveso
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations

The recent growth of black-box machine-learning methods in data analysis has increased the demand for explanation methods and tools to understand their behaviour and assist human-ML model cooperation. In this paper, we demonstrate ContrXT, a novel approach that uses natural language explanations to help users comprehend how a black-box model works. ContrXT provides time-contrastive (t-contrast) explanations by computing the differences in the classification logic of two different trained models and then reasoning on their symbolic representations through Binary Decision Diagrams. ContrXT is publicly available at ContrXT.ai as a Python pip package.
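The sketch below is a conceptual illustration of the t-contrast idea, not the actual ContrXT API: it represents each model's per-class classification logic as a set of rules (frozensets of feature conditions) and diffs them, whereas ContrXT itself encodes the logic as Binary Decision Diagrams before reasoning about the differences. The `t_contrast` function and the example rules are hypothetical.

```python
# Conceptual sketch of a time-contrastive (t-contrast) comparison:
# diff the rule sets that two trained models use to predict the same class.

def t_contrast(rules_t1: set[frozenset[str]],
               rules_t2: set[frozenset[str]]) -> dict[str, set[frozenset[str]]]:
    """Return the classification logic added and removed between the two models."""
    return {
        "added": rules_t2 - rules_t1,    # conditions the newer model uses that the older one did not
        "removed": rules_t1 - rules_t2,  # conditions the older model used that were dropped
    }


# Example: surrogate rules for predicting class "sport" at two points in time.
old_rules = {frozenset({"contains:match", "not contains:election"})}
new_rules = {frozenset({"contains:match", "not contains:election"}),
             frozenset({"contains:tournament"})}
print(t_contrast(old_rules, new_rules))
```

The symmetric difference between the two rule sets is what a natural language explanation would then verbalise, e.g. reporting which conditions the newer model relies on that the older one did not.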