Simonetta Montemagni

Also published as: S. Montemagni


2020

“Voices of the Great War”: A Richly Annotated Corpus of Italian Texts on the First World War
Federico Boschetti | Irene De Felice | Stefano Dei Rossi | Felice Dell’Orletta | Michele Di Giorgio | Martina Miliani | Lucia C. Passaro | Angelica Puddu | Giulia Venturi | Nicola Labanca | Alessandro Lenci | Simonetta Montemagni
Proceedings of the 12th Language Resources and Evaluation Conference

“Voices of the Great War” is the first large corpus of Italian historical texts dating back to the period of the First World War. This corpus differs from other existing resources in several respects. First, from a linguistic point of view, it accounts for the wide range of varieties of Italian in that period, from diastratic (educated vs. uneducated writers), diaphasic (low/informal vs. high/formal registers) and diatopic (regional varieties, dialects) points of view. Second, from a historical perspective, its collection of texts belonging to different genres represents different views on the war and the various styles of narrating war events and experiences. The final corpus is balanced along various dimensions, corresponding to textual genre, language variety, author type and the typology of conveyed contents. The corpus is fully annotated with lemmas, part-of-speech, terminology, and named entities. Significant corpus samples representative of the different “voices” have also been enriched with meta-linguistic and syntactic information. The syntactic annotation layer forms the first nucleus of an Italian historical treebank complying with the Universal Dependencies standard. The paper describes the final resource, the methodology and tools used to build it, and the web interface for navigating it.

Profiling-UD: a Tool for Linguistic Profiling of Texts
Dominique Brunato | Andrea Cimino | Felice Dell’Orletta | Giulia Venturi | Simonetta Montemagni
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we introduce Profiling-UD, a new text analysis tool inspired by the principles of linguistic profiling that can support language variation research from different perspectives. It allows the extraction of more than 130 features, spanning different levels of linguistic description. Beyond the large number of features that can be monitored, a main novelty of Profiling-UD is that it has been specifically devised to be multilingual, since it is based on the Universal Dependencies framework. In the second part of the paper, we demonstrate the effectiveness of these features in a number of theoretical and applied studies in which they were successfully used for text and author profiling.

2018

Universal Dependencies and Quantitative Typological Trends. A Case Study on Word Order
Chiara Alzetta | Felice Dell’Orletta | Simonetta Montemagni | Giulia Venturi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Assessing the Impact of Incremental Error Detection and Correction. A Case Study on the Italian Universal Dependency Treebank
Chiara Alzetta | Felice Dell’Orletta | Simonetta Montemagni | Maria Simi | Giulia Venturi
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

Detection and correction of errors and inconsistencies in “gold treebanks” are becoming increasingly central topics in corpus annotation. The paper illustrates a new incremental method for enhancing treebanks, with particular emphasis on the extension of error patterns across different textual genres and registers. The impact and role of the corrections have been assessed in a dependency parsing experiment carried out with four different parsers, with promising results. For both evaluation datasets, parser performance increases, both in terms of the standard LAS and UAS measures and of a more focused measure taking into account only relations involved in error patterns, as well as at the level of individual dependencies.

Enhancing Universal Dependency Treebanks: A Case Study
Joakim Nivre | Paola Marongiu | Filip Ginter | Jenna Kanerva | Simonetta Montemagni | Sebastian Schuster | Maria Simi
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

We evaluate two cross-lingual techniques for adding enhanced dependencies to existing treebanks in Universal Dependencies. We apply a rule-based system developed for English and a data-driven system trained on Finnish to Swedish and Italian. We find that both systems are accurate enough to bootstrap enhanced dependencies in existing UD treebanks. In the case of Italian, results are even on par with those of a prototype language-specific system.

2017

Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)
Simonetta Montemagni | Joakim Nivre
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

Dangerous Relations in Dependency Treebanks
Chiara Alzetta | Felice Dell’Orletta | Simonetta Montemagni | Giulia Venturi
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories

2016

CItA: an L1 Italian Learners Corpus to Study the Development of Writing Competence
Alessia Barbagli | Pietro Lucisano | Felice Dell’Orletta | Simonetta Montemagni | Giulia Venturi
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we present the CItA corpus (Corpus Italiano di Apprendenti L1), a collection of essays written by Italian L1 learners during the first and second years of lower secondary school. The corpus was built in the framework of an interdisciplinary study jointly carried out by computational linguists and experimental pedagogists, aimed at tracking the development of written language competence over the years, together with students’ background information.

ALT Explored: Integrating an Online Dialectometric Tool and an Online Dialect Atlas
Martijn Wieling | Eva Sassolini | Sebastiana Cucurullo | Simonetta Montemagni
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we illustrate the integration of an online dialectometric tool, Gabmap, with an online dialect atlas, the Atlante Lessicale Toscano (ALT-Web). By using a newly created URL-based interface to Gabmap, ALT-Web is able to take advantage of the sophisticated dialect visualization and exploration options incorporated in Gabmap. For example, maps showing the distribution of a specific dialectal form (selected via the ALT-Web website) across the Tuscan dialect area are easily obtainable. Furthermore, the complete ALT-Web dataset as well as subsets of the data (selected via the ALT-Web website) can be automatically uploaded and explored in Gabmap. By integrating these two online applications, macro- and micro-analyses of dialectal data (offered by Gabmap and ALT-Web, respectively) are effectively and dynamically combined.

2015

Design and Annotation of the First Italian Corpus for Text Simplification
Dominique Brunato | Felice Dell’Orletta | Giulia Venturi | Simonetta Montemagni
Proceedings of The 9th Linguistic Annotation Workshop

NLP–Based Readability Assessment of Health–Related Texts: a Case Study on Italian Informed Consent Forms
Giulia Venturi | Tommaso Bellandi | Felice Dell’Orletta | Simonetta Montemagni
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis

2014

Assessing the Readability of Sentences: Which Corpora and Features?
Felice Dell’Orletta | Martijn Wieling | Giulia Venturi | Andrea Cimino | Simonetta Montemagni
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications

T2K^2: a System for Automatically Extracting and Organizing Knowledge from Texts
Felice Dell’Orletta | Giulia Venturi | Andrea Cimino | Simonetta Montemagni
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we present T2K^2, a suite of tools for automatically extracting domain-specific knowledge from collections of Italian and English texts. T2K^2 (Text-To-Knowledge v2) relies on a battery of tools for Natural Language Processing (NLP), statistical text analysis and machine learning, which are dynamically integrated to provide an accurate and incremental representation of the content of vast repositories of unstructured documents. Extracted knowledge ranges from domain-specific entities and named entities to the relations connecting them, and can be used for indexing document collections with respect to different information types. T2K^2 also includes “linguistic profiling” functionalities aimed at supporting the user in constructing the acquisition corpus, e.g. by selecting texts belonging to the same genre or characterized by the same degree of specialization, or by monitoring the “added value” of newly inserted documents. T2K^2 is a web application, accessible from any browser through a personal account, and has been tested in a wide range of domains.

Less is More? Towards a Reduced Inventory of Categories for Training a Parser for the Italian Stanford Dependencies
Maria Simi | Cristina Bosco | Simonetta Montemagni
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Stanford Dependencies (SD) are currently a de facto standard for dependency annotation. The goal of this paper is to explore the pros and cons of different strategies for generating SD-annotated Italian texts to enrich the existing Italian Stanford Dependency Treebank (ISDT). This is done by comparing the performance of a statistical parser (DeSR) trained on a simpler resource (the augmented version of the Merged Italian Dependency Treebank, or MIDT+), whose output was automatically converted to SD, with the results of the parser trained directly on ISDT. Experiments carried out to test the reliability and effectiveness of the two strategies show that the performance of a parser trained on the reduced dependency repertoire, whose output can be easily converted to SD, is slightly higher than that of a parser trained directly on ISDT. A non-negligible advantage of the first strategy is that semi-automatic extensions of the training resource are easier to carry out, and more consistent, with a reduced dependency tag set. Preliminary experiments on generating the collapsed and propagated SD representations are also reported.

2013

Linguistic Profiling based on General–purpose Features and Native Language Identification
Andrea Cimino | Felice Dell’Orletta | Giulia Venturi | Simonetta Montemagni
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

Unsupervised Linguistically-Driven Reliable Dependency Parses Detection and Self-Training for Adaptation to the Biomedical Domain
Felice Dell’Orletta | Giulia Venturi | Simonetta Montemagni
Proceedings of the 2013 Workshop on Biomedical Natural Language Processing

Converting Italian Treebanks: Towards an Italian Stanford Dependency Treebank
Cristina Bosco | Simonetta Montemagni | Maria Simi
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

Linguistic Profiling of Texts Across Textual Genres and Readability Levels. An Exploratory Study on Italian Fictional Prose
Felice Dell’Orletta | Simonetta Montemagni | Giulia Venturi
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2012

Genre-oriented Readability Assessment: a Case Study
Felice Dell’Orletta | Giulia Venturi | Simonetta Montemagni
Proceedings of the Workshop on Speech and Language Processing Tools in Education

Enriching the ISST-TANL Corpus with Semantic Frames
Alessandro Lenci | Simonetta Montemagni | Giulia Venturi | Maria Grazia Cutrullà
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The paper describes the design and the results of a manual annotation methodology devoted to enriching the ISST-TANL Corpus, derived from the Italian Syntactic-Semantic Treebank (ISST), with Semantic Frames information. The main issues encountered in applying the English FrameNet annotation criteria to an Italian corpus are discussed, together with the choice of anchoring the semantic annotation layer to the underlying dependency syntactic structure. The results of a case study aimed at extending and specialising this methodology for the annotation of a corpus of legislative texts are also discussed.

2011

ULISSE: an Unsupervised Algorithm for Detecting Reliable Dependency Parses
Felice Dell’Orletta | Giulia Venturi | Simonetta Montemagni
Proceedings of the Fifteenth Conference on Computational Natural Language Learning

READIT: Assessing Readability of Italian Texts with a View to Text Simplification
Felice Dell’Orletta | Simonetta Montemagni | Giulia Venturi
Proceedings of the Second Workshop on Speech and Language Processing for Assistive Technologies

2010

Contrastive Filtering of Domain-Specific Multi-Word Terms from Different Types of Corpora
Francesca Bonin | Felice Dell’Orletta | Giulia Venturi | Simonetta Montemagni
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications

Comparing the Influence of Different Treebank Annotations on Dependency Parsing
Cristina Bosco | Simonetta Montemagni | Alessandro Mazzei | Vincenzo Lombardo | Felice Dell’Orletta | Alessandro Lenci | Leonardo Lesmo | Giuseppe Attardi | Maria Simi | Alberto Lavelli | Johan Hall | Jens Nilsson | Joakim Nivre
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

As the NLP community's interest in developing treebanks for languages other than English grows, we observe efforts towards evaluating the impact of different annotation strategies used to represent particular languages or with reference to particular tasks. This paper contributes to the debate on the influence of the resources used for training and development on the performance of parsing systems. It presents a comparative analysis of the results achieved by three different dependency parsers developed and tested with respect to two treebanks for Italian, namely TUT and ISST-TANL, which differ significantly in both corpus composition and adopted dependency representations.

A Resource and Tool for Super-sense Tagging of Italian Texts
Giuseppe Attardi | Stefano Dei Rossi | Giulia Di Pietro | Alessandro Lenci | Simonetta Montemagni | Maria Simi
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

A SuperSense Tagger is a tool for the automatic analysis of texts that associates each noun, verb, adjective and adverb with a semantic category within a general taxonomy. The developed tagger, based on a statistical model (Maximum Entropy), required the creation of an Italian annotated corpus, to be used as a training set, and the improvement of various existing tools. The obtained results significantly improved the current state of the art for this particular task.

A Contrastive Approach to Multi-word Extraction from Domain-specific Corpora
Francesca Bonin | Felice Dell’Orletta | Simonetta Montemagni | Giulia Venturi
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present a novel approach to multi-word terminology extraction combining a well-known automatic term recognition approach, the C-NC value method, with a contrastive ranking technique, aimed at refining the obtained results either by filtering noise due to common words or by discerning between semantically different types of terms within heterogeneous terminologies. Unlike other contrastive methods proposed in the literature, which focus on single terms to overcome the sparsity problem of multi-word terms, the proposed contrastive function is able to handle variation in low-frequency events by operating directly on pre-selected multi-word terms. This methodology has been tested in two case studies carried out in the History of Art and Legal domains. Evaluation of the achieved results showed that the proposed two-stage approach significantly improves multi-word term extraction results. In particular, for the legal domain it provides an answer to a well-known problem in the semi-automatic construction of legal ontologies, namely that of singling out law terms from terms of the specific domain being regulated.

2008

Unsupervised Acquisition of Verb Subcategorization Frames from Shallow-Parsed Corpora
Alessandro Lenci | Barbara McGillivray | Simonetta Montemagni | Vito Pirrelli
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we report experiments on the unsupervised automatic acquisition of Italian and English verb subcategorization frames (SCFs) from general and domain corpora. The proposed technique operates on syntactically shallow-parsed corpora on the basis of a limited number of search heuristics that do not rely on any prior lexico-syntactic knowledge about SCFs. Although preliminary, the reported results are in line with state-of-the-art lexical acquisition systems. We also explored whether verbs with similar SCF distributions share similar semantic properties as well, by clustering verbs that share frames with the same distribution using the Minimum Description Length (MDL) principle. First experiments in this direction were carried out on Italian verbs, with encouraging results.

Ontology Learning and Semantic Annotation: a Necessary Symbiosis
Emiliano Giovannetti | Simone Marchi | Simonetta Montemagni | Roberto Bartolini
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Semantic annotation of text requires the dynamic merging of linguistically structured information and a “world model”, usually represented as a domain-specific ontology. On the other hand, engineering a domain ontology through a semi-automatic ontology learning system requires the availability of a considerable amount of semantically annotated documents. Facing this bootstrapping paradox requires an incremental process of annotation-acquisition-annotation, whereby domain-specific knowledge is acquired from linguistically annotated texts and then projected back onto texts, so that extra-linguistic information can be annotated and further knowledge layers extracted. The presented methodology is a first step towards a full “virtuous” circle in which the semantic annotation platform and the evolving ontology interact in symbiosis. As a case study, we have chosen the semantic annotation of product catalogues. We propose a hybrid approach, combining pattern matching techniques that exploit the regular structure of product descriptions in catalogues with Natural Language Processing techniques used to analyze natural language descriptions. The semantic annotation involves access to the ontology, semi-automatically bootstrapped with an ontology learning tool from annotated collections of catalogues.

Building a Bio-Event Annotated Corpus for the Acquisition of Semantic Frames from Biomedical Corpora
Paul Thompson | Philip Cotter | John McNaught | Sophia Ananiadou | Simonetta Montemagni | Andrea Trabucco | Giulia Venturi
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper reports on the design and construction of a bio-event annotated corpus which was developed with a specific view to the acquisition of semantic frames from biomedical corpora. We describe the adopted annotation scheme and the annotation process, which is supported by a dedicated annotation tool. The annotated corpus contains 677 abstracts of biomedical research articles.

2006

Probing the Space of Grammatical Variation: Induction of Cross-Lingual Grammatical Constraints from Treebanks
Felice Dell’Orletta | Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli
Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006

Searching treebanks for functional constraints: cross-lingual experiments in grammatical relation assignment
Felice Dell’Orletta | Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The paper reports on a detailed quantitative analysis of distributional language data for both Italian and Czech, highlighting the relative contribution of a number of distributed grammatical factors to the sentence-based identification of subjects and direct objects. The work is based on a Maximum Entropy model of stochastic resolution of conflicting grammatical constraints, and is demonstrably capable of putting explanatory theoretical accounts to the challenging test of extensive, usage-based empirical verification.

Dialectal resources on-line: the ALT-Web experience
Nella Cucurullo | Simonetta Montemagni | Matilde Paoli | Eugenio Picchi | Eva Sassolini
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The paper presents an on-line dialectal resource, ALT-Web, which gives access to the linguistic data of the Atlante Lessicale Toscano, a specially designed linguistic atlas in which lexical data have both a diatopic and a diastratic characterisation. The paper focuses on the dialectal data representation model, the modalities of access to the ALT dialectal corpus, and ontology-based search.

2005

Climbing the Path to Grammar: A Maximum Entropy Model of Subject/Object Learning
Felice Dell’Orletta | Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli
Proceedings of the Workshop on Psychocomputational Models of Human Language Acquisition

2004

Semantic Mark-up of Italian Legal Texts Through NLP-based Techniques
Roberto Bartolini | Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli | Claudia Soria
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Hybrid Constraints for Robust Parsing: First Experiments and Evaluation
Roberto Bartolini | Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

NLP-enhanced Content Filtering Within the POESIA Project
Mark Hepple | Neil Ireson | Paolo Allegrini | Simone Marchi | Simonetta Montemagni | Jose Maria Gomez Hidalgo
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

The Lexicon-Grammar Balance in Robust Parsing of Italian
Roberto Bartolini | Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Grammar and Lexicon in the Robust Parsing of Italian towards a Non-Naïve Interplay
Roberto Bartolini | Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli
COLING-02: Grammar Engineering and Evaluation

2000

Where Opposites Meet. A Syntactic Meta-scheme for Corpus Annotation and Parsing Evaluation
Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli | Claudia Soria
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Controlled Bootstrapping of Lexico-semantic Classes as a Bridge between Paradigmatic and Syntagmatic Knowledge: Methodology and Evaluation
Paolo Allegrini | Simonetta Montemagni | Vito Pirrelli
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Learning Word Clusters from Data Types
Paolo Allegrini | Simonetta Montemagni | Vito Pirrelli
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

The Italian Syntactic-Semantic Treebank: Architecture, Annotation, Tools and Evaluation
S. Montemagni | F. Barsotti | M. Battista | N. Calzolari | O. Corazzari | A. Zampolli | F. Fanciulli | M. Massetani | R. Raffaelli | R. Basili | M. T. Pazienza | D. Saracino | F. Zanzotto | N. Mana | F. Pianesi | R. Delmonte
Proceedings of the COLING-2000 Workshop on Linguistically Interpreted Corpora

1999

FAME: a Functional Annotation Meta-scheme for multi-modal and multi-lingual Parsing Evaluation
Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli | Claudia Soria
Computer Mediated Language Assessment and Evaluation in Natural Language Processing

1998

Augmenting WordNet-like lexical resources with distributional evidence. An application-oriented perspective
Simonetta Montemagni | Vito Pirrelli
Usage of WordNet in Natural Language Processing Systems

1997

Inferring Semantic Similarity from Distributional Evidence: an Analogy-based Approach to Word Sense Disambiguation
Stefano Federici | Simonetta Montemagni | Vito Pirrelli
Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications

1996

Resolving syntactic ambiguities with lexico-semantic patterns: an analogy-based approach
Simonetta Montemagni | Stefano Federici | Vito Pirrelli
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

1992

Structural Patterns vs. String Patterns for Extracting Semantic Information from Dictionaries
Simonetta Montemagni | Lucy Vanderwende
COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics