Mario Mezzanzanica
2026
SkiLLens: Recognising and Mapping Novel Skills from Millions of Job Ads Across Europe Using Language Models
Alessia De Santo | Lorenzo Malandri | Fabio Mercorio | Mario Mezzanzanica | Navid Nobani
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
Alessia De Santo | Lorenzo Malandri | Fabio Mercorio | Mario Mezzanzanica | Navid Nobani
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
In a rapidly evolving labor market, detecting and addressing emerging skill needs is essential for shaping responsive education and workforce policies. Online job advertisements (OJAs) provide a real-time view of changing demands, but require first retrieving skill mentions from unstructured text and then solving the entity linking problem of connecting them to standardized skill taxonomies. To harness this potential, we present a multilingual human-in-the-loop (HITL) pipeline that operates in two steps: candidate skills are extracted from national OJA corpora using country-specific word embeddings, capturing terms that reflect each country’s labor market. These candidates are linked to ESCO using an encoder-based system and refined through a decoder large language models (LLMs) for accurate contextual alignment. Our approach is validated through both quantitative and qualitative evaluations, demonstrating that our method enables timely, multilingual monitoring of emerging skills, supporting agile policy-making and targeted training initiatives.
2025
RE-FIN: Retrieval-based Enrichment for Financial data
Lorenzo Malandri | Fabio Mercorio | Mario Mezzanzanica | Filippo Pallucchini
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Lorenzo Malandri | Fabio Mercorio | Mario Mezzanzanica | Filippo Pallucchini
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Enriching sentences with knowledge from qualitative sources benefits various NLP tasks and enhances the use of labeled data in model training. This is crucial for Financial Sentiment Analysis (FSA), where texts are often brief and contain implied information. We introduce RE-FIN (Retrieval-based Enrichment for FINancial data), an automated system designed to retrieve information from a knowledge base to enrich financial sentences, making them more knowledge-dense and explicit. RE-FIN generates propositions from the knowledge base and employs Retrieval-Augmented Generation (RAG) to augment the original text with relevant information. A large language model (LLM) rewrites the original sentence, incorporating this data. Since the LLM does not create new content, the risk of hallucinations is significantly reduced. The LLM generates multiple new sentences using different relevant information from the knowledge base; we developed an algorithm to select one that best preserves the meaning of the original sentence while avoiding excessive syntactic similarity. Results show that enhanced sentences present lower perplexity than the original ones and improve performances on FSA.
ITALIC: An Italian Culture-Aware Natural Language Benchmark
Andrea Seveso | Daniele Potertì | Edoardo Federici | Mario Mezzanzanica | Fabio Mercorio
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Andrea Seveso | Daniele Potertì | Edoardo Federici | Mario Mezzanzanica | Fabio Mercorio
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
We present ITALIC, a large-scale benchmark dataset of 10,000 multiple-choice questions designed to evaluate the natural language understanding of the Italian language and culture. ITALIC spans 12 domains, exploiting public tests to score domain experts in real-world scenarios. We detail our data collection process, stratification techniques, and selection strategies. ITALIC provides a comprehensive assessment suite that captures commonsense reasoning and linguistic proficiency in a morphologically rich language. We establish baseline performances using 17 state-of-the-art LLMs, revealing current limitations in Italian language understanding and highlighting significant linguistic complexity and cultural specificity challenges. ITALIC serves as a benchmark for evaluating existing models and as a roadmap for future research, encouraging the development of more sophisticated and culturally aware natural language systems.
2022
Contrastive Explanations of Text Classifiers as a Service
Lorenzo Malandri | Fabio Mercorio | Mario Mezzanzanica | Navid Nobani | Andrea Seveso
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations
Lorenzo Malandri | Fabio Mercorio | Mario Mezzanzanica | Navid Nobani | Andrea Seveso
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations
The recent growth of black-box machine-learning methods in data analysis has increased the demand for explanation methods and tools to understand their behaviour and assist human-ML model cooperation. In this paper, we demonstrate ContrXT, a novel approach that uses natural language explanations to help users to comprehend how a back-box model works. ContrXT provides time contrastive (t-contrast) explanations by computing the differences in the classification logic of two different trained models and then reasoning on their symbolic representations through Binary Decision Diagrams. ContrXT is publicly available at ContrXT.ai as a python pip package.