Martina Miliani


2022

pdf bib
Neural Readability Pairwise Ranking for Sentences in Italian Administrative Language
Martina Miliani | Serena Auriemma | Fernando Alva-Manchego | Alessandro Lenci
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Automatic Readability Assessment aims at assigning a complexity level to a given text, which could help improve the accessibility to information in specific domains, such as the administrative one. In this paper, we investigate the behavior of a Neural Pairwise Ranking Model (NPRM) for sentence-level readability assessment of Italian administrative texts. To deal with data scarcity, we experiment with cross-lingual, cross- and in-domain approaches, and test our models on Admin-It, a new parallel corpus in the Italian administrative language, containing sentences simplified using three different rewriting strategies. We show that NPRMs are effective in zero-shot scenarios (~0.78 ranking accuracy), especially with ranking pairs containing simplifications produced by overall rewriting at the sentence-level, and that the best results are obtained by adding in-domain data (achieving perfect performance for such sentence pairs). Finally, we investigate where NPRMs failed, showing that the characteristics of the training data, rather than its size, have a bigger effect on a model’s performance.

2020

pdf bib
FRAQUE: a FRAme-based QUEstion-answering system for the Public Administration domain
Martina Miliani | Lucia C. Passaro | Alessandro Lenci
Proceedings of the 1st Workshop on Language Technologies for Government and Public Administration (LT4Gov)

In this paper, we propose FRAQUE, a question answering system for factoid questions in the Public administration domain. The system is based on semantic frames, here intended as collections of slots typed with their possible values. FRAQUE queries unstructured textual data and exploits the potential of different approaches: it extracts pattern elements from texts which are linguistically analyzed through statistical methods.FRAQUE allows Italian users to query vast document repositories related to the domain of Public Administration. Given the statistical nature of most of its components such as word embeddings, the system allows for a flexible domain and language adaptation process. FRAQUE’s goal is to associate questions with frames stored into a Knowledge Graph along with relevant document passages, which are returned as the answer.

pdf bib
“Voices of the Great War”: A Richly Annotated Corpus of Italian Texts on the First World War
Federico Boschetti | Irene De Felice | Stefano Dei Rossi | Felice Dell’Orletta | Michele Di Giorgio | Martina Miliani | Lucia C. Passaro | Angelica Puddu | Giulia Venturi | Nicola Labanca | Alessandro Lenci | Simonetta Montemagni
Proceedings of the Twelfth Language Resources and Evaluation Conference

“Voices of the Great War” is the first large corpus of Italian historical texts dating back to the period of First World War. This corpus differs from other existing resources in several respects. First, from the linguistic point of view it gives account of the wide range of varieties in which Italian was articulated in that period, namely from a diastratic (educated vs. uneducated writers), diaphasic (low/informal vs. high/formal registers) and diatopic (regional varieties, dialects) points of view. From the historical perspective, through a collection of texts belonging to different genres it represents different views on the war and the various styles of narrating war events and experiences. The final corpus is balanced along various dimensions, corresponding to the textual genre, the language variety used, the author type and the typology of conveyed contents. The corpus is fully annotated with lemmas, part-of-speech, terminology, and named entities. Significant corpus samples representative of the different “voices” have also been enriched with meta-linguistic and syntactic information. The layer of syntactic annotation forms the first nucleus of an Italian historical treebank complying with the Universal Dependencies standard. The paper illustrates the final resource, the methodology and tools used to build it, and the Web Interface for navigating it.