Alessandro Mazzei - ACL Anthology

Alessandro Mazzei

Also published as: A Mazzei

2025

When Figures Speak with Irony: Investigating the Role of Rhetorical Figures in Irony Generation with LLMs
Pier Felice Balestrucci | Michael Oliverio | Soda Marem Lo | Luca Anselma | Valerio Basile | Alessandro Mazzei | Viviana Patti
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

Exploiting Task Reversibility of DRS Parsing and Generation: Challenges and Insights from a Multi-lingual Perspective
Muhammad Saad Amin | Luca Anselma | Alessandro Mazzei
Proceedings of the First Workshop on Language Models for Low-Resource Languages

Semantic parsing and text generation exhibit reversible properties when utilizing Discourse Representation Structures (DRS). However, both processes—text-to-DRS parsing and DRS-to-text generation—are susceptible to errors. In this paper, we exploit the reversible nature of DRS to explore both error propagation, which is commonly seen in pipeline methods, and the less frequently studied potential for error correction. We investigate two pipeline approaches: Parse-Generate-Parse (PGP) and Generate-Parse-Generate (GPG), utilizing pre-trained language models where the output of one model becomes the input for the next. Our evaluation uses the Parallel Meaning Bank dataset, focusing on Urdu as a low-resource language, Italian as a mid-resource language, and English serving as a high-resource baseline. Our analysis highlights that while pipelines are theoretically suited for error correction, they more often propagate errors, with Urdu exhibiting the greatest sensitivity, Italian showing a moderate effect, and English demonstrating the highest stability. This variation highlights the unique challenges faced by low-resource languages in semantic processing tasks. Further, our findings suggest that these pipeline methods support the development of more linguistically balanced datasets, enabling a comprehensive assessment across factors like sentence structure, length, type, polarity, and voice. Our cross-linguistic analysis provides valuable insights into the behavior of DRS processing in low-resource contexts, demonstrating both the potential and limitations of reversible pipeline approaches.

A Modular LLM-based Dialog System for Accessible Exploration of Finite State Automata
Stefano Vittorio Porta | Pier Felice Balestrucci | Michael Oliverio | Luca Anselma | Alessandro Mazzei
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

WebNLG-IT: Construction of an aligned RDF-Italian corpus through Machine Translation techniques
Michael Oliverio | Pier Felice Balestrucci | Alessandro Mazzei | Valerio Basile
Findings of the Association for Computational Linguistics: ACL 2025

The main goal of this work is the creation of the Italian version of the WebNLG corpus through the application of Neural Machine Translation (NMT) and post-editing with hand-written rules. To achieve this goal, in a first step, several existing NMT models were analysed and compared in order to identify the system with the highest performance on the original corpus. In a second step, after using the best NMT system, we semi-automatically designed and applied a number of rules to refine and improve the quality of the produced resource, creating a new corpus named WebNLG-IT. We used this resource for fine-tuning several LLMs for RDF-to-text tasks. In this way, comparing the performance of LLM-based generators on both Italian and English, we have (1) evaluated the quality of WebNLG-IT with respect to the original English version, (2) released the first fine-tuned LLM-based system for generating Italian from semantic web triples and (3) introduced an Italian version of a modular generation pipeline for RDF-to-text.

Evaluating Structural and Linguistic Quality in Urdu DRS Parsing and Generation through Bidirectional Evaluation
Muhammad Saad Amin | Luca Anselma | Alessandro Mazzei
Proceedings of the First Workshop on Natural Language Processing for Indo-Aryan and Dravidian Languages

Evaluating Discourse Representation Structure (DRS)-based systems for semantic parsing (Text-to-DRS) and generation (DRS-to-Text) poses unique challenges, particularly in low-resource languages like Urdu. Traditional metrics often fall short, focusing either on structural accuracy or on linguistic quality, but rarely capturing both. To address this limitation, we introduce two complementary evaluation methodologies, Parse-Generate (PARS-GEN) and Generate-Parse (GEN-PARS), designed for a more comprehensive assessment of DRS-based systems. PARS-GEN evaluates the parsing process by converting DRS outputs back into text, revealing linguistic nuances often missed by structure-focused metrics like SMATCH. Conversely, GEN-PARS assesses text generation by converting generated text into DRS, providing a semantic perspective that complements surface-level metrics such as BLEU, METEOR, and BERTScore. Using the Parallel Meaning Bank (PMB) dataset, we demonstrate our methodology on Urdu, uncovering unique insights into Urdu’s structural and linguistic interplay. Findings show that traditional metrics frequently overlook the complexity of linguistic and semantic fidelity, especially in low-resource languages. Our dual approach offers a robust framework for evaluating DRS-based systems, enhancing semantic parsing and text generation quality.

Towards a Perspectivist Understanding of Irony through Rhetorical Figures
Pier Felice Balestrucci | Michael Oliverio | Elisa Chierchiello | Eliana Di Palma | Luca Anselma | Valerio Basile | Cristina Bosco | Alessandro Mazzei | Viviana Patti
Proceedings of the 4th Workshop on Perspectivist Approaches to NLP

Irony is a subjective and pragmatically complex phenomenon, often conveyed through rhetorical figures and interpreted differently across individuals. In this study, we adopt a perspectivist approach, accounting for the socio-demographic background of annotators, to investigate whether specific rhetorical strategies promote a shared perception of irony within demographic groups, and whether Large Language Models (LLMs) reflect specific perspectives. Focusing on the Italian subset of the perspectivist MultiPICo dataset, we manually annotate rhetorical figures in ironic replies using a linguistically grounded taxonomy. The annotation is carried out by expert annotators balanced by generation and gender, enabling us to analyze inter-group agreement and polarization. Our results show that some rhetorical figures lead to higher levels of agreement, suggesting that certain rhetorical strategies are more effective in promoting a shared perception of irony. We fine-tune multilingual LLMs for rhetorical figure classification, and evaluate whether their outputs align with different demographic perspectives. Results reveal that models show varying degrees of alignment with specific groups, reflecting potential perspectivist behavior in model predictions. These findings highlight the role of rhetorical figures in structuring irony perception and underscore the importance of socio-demographics in both annotation and model evaluation.

Can Large Language Models Personalize Dialogues to Generational Styles?
Pier Felice Balestrucci | Ondřej Dušek | Luca Anselma | Alessandro Mazzei
Findings of the Association for Computational Linguistics: EMNLP 2025

We investigate how large language models (LLMs) can produce personalized dialogue responses, specifically focusing on whether they reflect linguistic styles pertaining to different generations: Baby Boomers, Generation X, Generation Y, and Generation Z. We create P-MultiWoZ, a personalized, generation-specific version of MultiWOZ 2.2, by prompting LLMs, and validate its alignment with the original dataset through automatic and human evaluations. To validate the appropriateness of generational linguistic traits, we introduce GeMoSC, a corpus of generation-annotated movie dialogues. A linguistic analysis and a perplexity test suggest that P-MultiWoZ reflects patterns consistent with GeMoSC. Finally, a human evaluation reveals that annotators were mostly able to correctly identify the generation behind P-MultiWoZ dialogues, based only on a single query-reply pair.

2024

Data Augmentation for Low-Resource Italian NLP: Enhancing Semantic Processing with DRS
Muhammad Saad Amin | Luca Anselma | Alessandro Mazzei
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Discourse Representation Structure (DRS), a formal meaning representation, has shown promising results in semantic parsing and natural language generation tasks for high-resource languages like English. This paper investigates enhancing the application of DRS to low-resource Italian Natural Language Processing (NLP), in both semantic parsing (Text-to-DRS) and natural language generation (DRS-to-Text). To address the scarcity of annotated corpora for Italian DRS, we propose a novel data augmentation technique that involves the use of external linguistic resources including: (i) WordNet for common nouns, adjectives, adverbs, and verbs; (ii) LLM-generated named entities for proper nouns; and (iii) rule-based algorithms for tense augmentation. This approach not only increases the quantity of training data but also introduces linguistic diversity, which is crucial for improving model performance and robustness. Using this augmented dataset, we developed neural semantic parser and generator models that demonstrated enhanced generalization ability compared to models trained on non-augmented data. We evaluated the effect of semantic data augmentation using two state-of-the-art transformer-based neural sequence-to-sequence models, i.e., byT5 and IT5. Our implementation shows promising results for Italian semantic processing. Data augmentation significantly increased semantic parsing performance from 76.10 to 90.56 F1-SMATCH (+14.46 points) and generation performance from 37.79 to 57.48 BLEU (+19.69 points), 30.83 to 40.95 METEOR (+10.12 points), 81.66 to 90.97 COMET (+9.31 points), 54.84 to 70.88 chrF (+16.04 points), and 88.86 to 92.97 BERTScore (+4.11 points). These results demonstrate the effectiveness of our novel augmentation approach in enhancing semantic processing capabilities for low-resource languages like Italian.

Educational Dialogue Systems for Visually Impaired Students: Introducing a Task-Oriented User-Agent Corpus
Elisa Di Nuovo | Manuela Sanguinetti | Pier Felice Balestrucci | Luca Anselma | Cristian Bernareggi | Alessandro Mazzei
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper describes a corpus consisting of real-world dialogues in English between users and a task-oriented conversational agent, with interactions revolving around the description of finite state automata. The creation of this corpus is part of a larger research project aimed at developing tools that give users with visual impairments easier access to educational content, especially in STEM fields. The development of this corpus was motivated precisely by the aim of providing a useful resource to support the design of such tools. The core feature of this corpus is that its creation involved both sighted and visually impaired participants, thus allowing for a greater diversity of perspectives and giving the opportunity to identify possible differences in the way the two groups of participants interacted with the agent. The paper introduces this corpus, giving an account of the process that led to its creation, i.e., the methodology followed to obtain the data, the annotation scheme adopted, and the analysis of the results. Finally, the paper reports the results of a classification experiment on the annotated corpus, and an additional experiment to assess the annotation capabilities of three large language models, in view of a further expansion of the corpus.

Exploring Data Augmentation in Neural DRS-to-Text Generation
Muhammad Saad Amin | Luca Anselma | Alessandro Mazzei
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Neural networks are notoriously data-hungry. This is a problem when data are scarce, as in low-resource languages. Data augmentation is a technique commonly used in computer vision to provide neural networks with more data and increase their generalization power. When dealing with data augmentation for natural language, however, simple techniques similar to those used in computer vision, such as rotation and cropping, cannot be employed because they would generate ungrammatical texts. Thus, data augmentation needs a specific design in the case of neural logic-to-text systems, especially for structurally rich input formats such as those used for meaning representation. This is the case for neural natural language generation from Discourse Representation Structures (DRS-to-Text), where the logical nature of DRS requires a specifically designed data augmentation. In this paper, we adopt a novel approach in DRS-to-Text to selectively augment a training set with new data by adding and varying two specific lexical categories, i.e., proper and common nouns. In particular, we propose using WordNet supersenses to produce new training sentences using both in-context and out-of-context nouns. We present a number of experiments evaluating the role played by augmented lexical information. The experimental results demonstrate the effectiveness of our approach for data augmentation in DRS-to-Text generation.

DipInfo-UniTo at the GEM’24 Data-to-Text Task: Augmenting LLMs with the Split-Generate-Aggregate Pipeline
Michael Oliverio | Pier Felice Balestrucci | Alessandro Mazzei | Valerio Basile
Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges

This paper describes the DipInfo-UniTo system participating in the GEM shared task 2024. We participate only in the Data-to-Text (D2T) task. The DipInfo-UniTo system is based on Mistral (Jiang et al., 2023), a recent Large Language Model (LLM). Most LLMs are capable of generating high-quality text for D2T tasks but, crucially, they often fall short in terms of adequacy, and sometimes exhibit “hallucinations”. To mitigate this issue, we have implemented a generation pipeline that combines LLMs with techniques from the traditional Natural Language Generation (NLG) pipeline. In particular, we use a three-step process, SGA, consisting of (1) Splitting the original set of triples, (2) Generating verbalizations from the resulting split data units, and (3) Aggregating the verbalizations produced in the previous step.

I’m sure you’re a real scholar yourself: Exploring Ironic Content Generation by Large Language Models
Pier Felice Balestrucci | Silvia Casola | Soda Marem Lo | Valerio Basile | Alessandro Mazzei
Findings of the Association for Computational Linguistics: EMNLP 2024

Generating ironic content is challenging: it requires a nuanced understanding of context and implicit references and balancing seriousness and playfulness. Moreover, irony is highly subjective and can depend on various factors, such as social, cultural, or generational aspects. This paper explores whether Large Language Models (LLMs) can learn to generate ironic responses to social media posts. To do so, we fine-tune two models to generate ironic and non-ironic content and analyze in depth the linguistic characteristics of their outputs, their connection to the original post, and their similarity to human-written replies. We also conduct a large-scale human evaluation of the outputs. Additionally, we investigate whether LLMs can learn a form of irony tied to a generational perspective, with mixed results.

2023

Building a Spoken Dialogue System for Supporting Blind People in Accessing Mathematical Expressions
Pier Felice Balestrucci | Luca Anselma | Cristian Bernareggi | Alessandro Mazzei
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

Exploring Sentiments in Summarization: SentiTextRank, an Emotional Variant of TextRank
Md. Murad Hossain | Luca Anselma | Alessandro Mazzei
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

Introducing Deep Learning with Data Augmentation and Corpus Construction for LIS
Manuela Marchisio | Alessandro Mazzei | Dario Sammaruga
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

2022

Personalizing Weekly Diet Reports
Elena Monfroglio | Luca Anselma | Alessandro Mazzei
Proceedings of the First Workshop on Natural Language Generation in Healthcare

In this paper we present the main components of a weekly diet report generator (DRG) in natural language. The idea is to produce a text that contains information on the adherence of the dishes eaten during a week to the Mediterranean diet. The system is based on a user model, on a database of the dishes eaten during the week, and on the automatic computation of the Mediterranean Diet Score. All these sources of information are exploited to produce a highly personalized text. The system has two main goals, related to two different kinds of users: on the one hand, when used by dietitians, the main goal is to highlight the most salient medical information about the patient's diet and, on the other hand, when used by final users, the main goal is to educate them toward a Mediterranean style of eating.

2021

Query in linguaggio naturale per il dominio della dieta mediterranea (Natural Language Queries for the Mediterranean Diet Domain)
Luca Anselma | Dario Ferrero | Alessandro Mazzei
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

2020

The “Corpus Anchise 320” and the Analysis of Conversations between Healthcare Workers and People with Dementia
Nicola Benvenuti | Andrea Bolioli | Alessandro Mazzei | Pietro Vigorelli | Alessio Bosca
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

Natural Language Generation in Dialogue Systems for Customer Care
Mirko Di Lascio | Manuela Sanguinetti | Luca Anselma | Dario Mana | Alessandro Mazzei | Viviana Patti | Rossana Simeoni
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

Content Selection for Explanation Requests in Customer-Care Domain
Luca Anselma | Mirko Di Lascio | Dario Mana | Alessandro Mazzei | Manuela Sanguinetti
2nd Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence

This paper describes a content selection module for the generation of explanations in a dialogue system designed for the customer-care domain. First, we describe the construction of a corpus of dialogues containing explanation requests from customers to the virtual agent of a telco; second, we study and formalize the importance of specific information content for the generated message. In particular, we adapt the notions of importance and relevance to the case of schematic knowledge bases.

Annotating Errors and Emotions in Human-Chatbot Interactions in Italian
Manuela Sanguinetti | Alessandro Mazzei | Viviana Patti | Marco Scalerandi | Dario Mana | Rossana Simeoni
Proceedings of the 14th Linguistic Annotation Workshop

This paper describes a novel annotation scheme specifically designed for a customer-service context where written interactions take place between a given user and the chatbot of an Italian telecommunication company. More specifically, the scheme aims to detect and highlight two aspects: the presence of errors in the conversation on both sides (i.e. customer and chatbot) and the “emotional load” of the conversation. This can be inferred from the presence of emotions of some kind (especially negative ones) in the customer messages, and from the possible empathic responses provided by the agent. The dataset annotated according to this scheme is currently used to develop the prototype of a rule-based Natural Language Generation system aimed at improving the chatbot responses and the customer experience overall.

Building a Treebank in Universal Dependencies for Italian Sign Language
Gaia Caligiore | Cristina Bosco | Alessandro Mazzei
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

2019

Evaluating Speech Synthesis on Mathematical Sentences
Alessandro Mazzei | Michele Monticone | Cristian Bernareggi
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

Using NLG for speech synthesis of mathematical sentences
Alessandro Mazzei | Michele Monticone | Cristian Bernareggi
Proceedings of the 12th International Conference on Natural Language Generation

People with sight impairments can access a mathematical expression by using its LaTeX source. However, this mechanism has several drawbacks: (1) it assumes knowledge of LaTeX, (2) it is slow, since LaTeX is verbose, and (3) it is error-prone, since LaTeX is a typographical language. In this paper we study the design of a natural language generation system for producing a mathematical sentence, i.e., a natural language sentence expressing the semantics of a mathematical expression. Moreover, we describe the main results of a first human-based evaluation experiment of the system for the Italian language.

Evaluating the MuMe Dialogue System with the IDIAL protocol
Aureliano Porporato | Alessandro Mazzei | Daniele P. Radicioni | Rosa Meo
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

Towards an Italian Learner Treebank in Universal Dependencies
Elisa Di Nuovo | Cristina Bosco | Alessandro Mazzei | Manuela Sanguinetti
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

The DipInfoUniTo Realizer at SRST’19: Learning to Rank and Deep Morphology Prediction for Multilingual Surface Realization
Alessandro Mazzei | Valerio Basile
Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019)

We describe the system presented at the SR’19 shared task by the DipInfoUniTo team. Our approach is based on supervised machine learning. In particular, we divide the SR task into two independent subtasks, namely word order prediction and morphology inflection prediction. Two neural networks with different architectures run on the same input structure, each producing a partial output which is recombined in the final step in order to produce the predicted surface form. This work is a direct successor of the architecture presented at SR’18.

2018

The DipInfo-UniTo system for SRST 2018
Valerio Basile | Alessandro Mazzei
Proceedings of the First Workshop on Multilingual Surface Realisation

This paper describes the system developed by the DipInfo-UniTo team to participate in the shallow track of the Surface Realization Shared Task 2018. The system employs two separate neural networks with different architectures to predict the word ordering and the morphological inflection independently from each other. The UniTO realizer is language-independent, and its simple architecture placed it in the middle of the final ranking of the shared task.

PoSTWITA-UD: an Italian Twitter Treebank in Universal Dependencies
Manuela Sanguinetti | Cristina Bosco | Alberto Lavelli | Alessandro Mazzei | Oronzo Antonelli | Fabio Tamburini
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Auxiliary Selection in Italian Intransitive Verbs: A Computational Investigation based on Annotated Corpora
Ilaria Ghezzi | Cristina Bosco | Alessandro Mazzei
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

CheckYourMeal!: diet management with NLG
Luca Anselma | Simone Donetti | Alessandro Mazzei | Andrea Pirone
Proceedings of the Workshop on Intelligent Interactive Systems and Language Generation (2IS&NLG)

Neural Surface Realization for Italian
Valerio Basile | Alessandro Mazzei
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

Designing and testing the messages produced by a virtual dietitian
Luca Anselma | Alessandro Mazzei
Proceedings of the 11th International Conference on Natural Language Generation

This paper presents a project on the automatic generation of persuasive messages in the context of diet management. In the first part of the paper we introduce the basic mechanisms related to data interpretation and content selection for a numerical data-to-text generation architecture. In the second part of the paper we discuss a number of factors influencing the design of the messages. In particular, we consider the design of the aggregation procedure. Finally, we present the results of a human-based evaluation concerning this design factor.

Preface
Elena Cabrio | Alessandro Mazzei | Fabio Tamburini
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)
Elena Cabrio | Alessandro Mazzei | Fabio Tamburini
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

2017

Dealing with Italian Adjectives in Noun Phrase: a Study Oriented to Natural Language Generation
Giorgia Conte | Cristina Bosco | Alessandro Mazzei
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

Annotating Italian Social Media Texts in Universal Dependencies
Manuela Sanguinetti | Cristina Bosco | Alessandro Mazzei | Alberto Lavelli | Fabio Tamburini
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

2016

Combinatorics vs Grammar: Archeology of Computational Poetry in Tape Mark I
Alessandro Mazzei | Andrea Valle
Proceedings of the INLG 2016 Workshop on Computational Creativity in Natural Language Generation

SimpleNLG-IT: adapting SimpleNLG to Italian
Alessandro Mazzei | Cristina Battaglino | Cristina Bosco
Proceedings of the 9th International Natural Language Generation conference

2015

Translating Italian to LIS in the Rail Stations
Alessandro Mazzei
Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)

2012

Sign Language Generation with Expert Systems and CCG
Alessandro Mazzei
INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference

2011

An Ontology Based Architecture for Translation
Leonardo Lesmo | Alessandro Mazzei | Daniele P. Radicioni
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

Building a Generator for Italian Sign Language
Alessandro Mazzei
Proceedings of the 13th European Workshop on Natural Language Generation

2010

Comparing the Influence of Different Treebank Annotations on Dependency Parsing
Cristina Bosco | Simonetta Montemagni | Alessandro Mazzei | Vincenzo Lombardo | Felice Dell’Orletta | Alessandro Lenci | Leonardo Lesmo | Giuseppe Attardi | Maria Simi | Alberto Lavelli | Johan Hall | Jens Nilsson | Joakim Nivre
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

As the interest of the NLP community in developing treebanks for languages other than English grows, we observe efforts toward evaluating the impact of different annotation strategies used to represent particular languages or to serve particular tasks. This paper contributes to the debate on the influence of the resources used for training and development on the performance of parsing systems. It presents a comparative analysis of the results achieved by three different dependency parsers developed and tested on two treebanks for the Italian language, namely TUT and ISST-TANL, which differ significantly in both corpus composition and adopted dependency representations.

2008

Evaluation of Natural Language Tools for Italian: EVALITA 2007
Bernardo Magnini | Amedeo Cappelli | Fabio Tamburini | Cristina Bosco | Alessandro Mazzei | Vincenzo Lombardo | Francesca Bertagna | Nicoletta Calzolari | Antonio Toral | Valentina Bartalesi Lenzi | Rachele Sprugnoli | Manuela Speranza
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

EVALITA 2007, the first edition of the initiative devoted to the evaluation of Natural Language Processing tools for Italian, provided a shared framework where participants' systems had the possibility to be evaluated on five different tasks, namely Part of Speech Tagging (organised by the University of Bologna), Parsing (organised by the University of Torino), Word Sense Disambiguation (organised by CNR-ILC, Pisa), Temporal Expression Recognition and Normalization (organised by CELCT, Trento), and Named Entity Recognition (organised by FBK, Trento). We believe that the diffusion of shared tasks and shared evaluation practices is a crucial step towards the development of resources and tools for Natural Language Processing. Experiences of this kind, in fact, are a valuable contribution to the validation of existing models and data, allowing for consistent comparisons among approaches and among representation schemes. The good response obtained by EVALITA, both in the number of participants and in the quality of results, showed that pursuing such goals is feasible not only for English, but also for other languages.

Comparing Italian parsers on a common Treebank: the EVALITA experience
Cristina Bosco | Alessandro Mazzei | Vincenzo Lombardo | Giuseppe Attardi | Anna Corazza | Alberto Lavelli | Leonardo Lesmo | Giorgio Satta | Maria Simi
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The EVALITA 2007 Parsing Task was the first contest among parsing systems for Italian. It is the first attempt to compare the approaches and the results of the existing parsing systems specific to this language using a common treebank annotated in both a dependency and a constituency-based format. The development data set for this parsing competition was taken from the Turin University Treebank, which is annotated in both dependency and constituency format. The evaluation metrics were those standardly applied in CoNLL and PARSEVAL. The parsing results are very promising and higher than the state of the art for dependency parsing of Italian. An analysis of these results is provided, which takes into account other experiences in treebank-driven parsing for Italian and for other Romance languages (in particular, the CoNLL-X and CoNLL 2007 shared tasks for dependency parsing). It focuses on the characteristics of the data sets, i.e., type of annotation and size, and on the parsing paradigms and approaches applied also to languages other than Italian.

2007

Multilingual Ontological Analysis of European Directives
Gianmaria Ajani | Guido Boella | Leonardo Lesmo | Alessandro Mazzei | Piercarlo Rossi
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

A Development Tool For Multilingual Ontology-based Conceptual
G. Ajani | G. Boella | L. Lesmo | M. Martin | A Mazzei | P. Rossi
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper introduces a number of theoretical and practical issues related to Syllabus. Syllabus is a multilingual, ontology-based tool designed to improve the application of European Directives in the various European countries.

2004

Competence and Performance Grammar in Incremental Processing
Vincenzo Lombardo | Alessandro Mazzei | Patrick Sturt
Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together

Building a Large Grammar for Italian
Alessandro Mazzei | Vincenzo Lombardo
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Co-authors

Leonardo Lesmo 5

Vincenzo Lombardo 5

Michael Oliverio 5

Fabio Tamburini 5

Muhammad Saad Amin 4

Cristian Bernareggi 4

Alberto Lavelli 4

Viviana Patti 4

Gianmaria Ajani 2

Giuseppe Attardi 2

Mirko Di Lascio 2

Elisa Di Nuovo 2

Soda Marem Lo 2

Michele Monticone 2

Daniele P. Radicioni 2

Piercarlo Rossi 2

Rossana Simeoni 2

Oronzo Antonelli 1

Valentina Bartalesi Lenzi 1

Cristina Battaglino 1

Nicola Benvenuti 1

Francesca Bertagna 1

Andrea Bolioli 1

Alessio Bosca 1

Gaia Caligiore 1

Nicoletta Calzolari 1

Amedeo Cappelli 1

Silvia Casola 1

Elisa Chierchiello 1

Giorgia Conte 1

Felice Dell’Orletta 1

Eliana Di Palma 1

Simone Donetti 1

Ondřej Dušek 1

Dario Ferrero 1

Ilaria Ghezzi 1

Md. Murad Hossain 1

Alessandro Lenci 1

Bernardo Magnini 1

Manuela Marchisio 1

Elena Monfroglio 1

Simonetta Montemagni 1

Andrea Pirone 1

Aureliano Porporato 1

Stefano Vittorio Porta 1

Dario Sammaruga 1

Giorgio Satta 1

Marco Scalerandi 1

Manuela Speranza 1

Rachele Sprugnoli 1

Patrick Sturt 1

Antonio Toral 1

Pietro Vigorelli 1

Venues

NLPerspectives 1