Sara Tonelli


2021

pdf bib
Fine-Grained Fairness Analysis of Abusive Language Detection Systems with CheckList
Marta Marchiori Manerba | Sara Tonelli
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)

Current abusive language detection systems have demonstrated unintended bias towards sensitive features such as nationality or gender. This is a crucial issue, which may harm minorities and underrepresented groups if such systems were integrated in real-world applications. In this paper, we create ad hoc tests through the CheckList tool (Ribeiro et al., 2020) to detect biases within abusive language classifiers for English. We compare the behaviour of two BERT-based models, one trained on a generic hate speech dataset and the other on a dataset for misogyny detection. Our evaluation shows that, although BERT-based classifiers achieve high accuracy levels on a variety of natural language processing tasks, they perform very poorly as regards fairness and bias, in particular on samples involving implicit stereotypes, expressions of hate towards minorities and protected attributes such as race or sexual orientation. We release both the notebooks implemented to extend the Fairness tests and the synthetic datasets usable to evaluate systems bias independently of CheckList.

pdf bib
Challenges in Designing Games with a Purpose for Abusive Language Annotation
Federico Bonetti | Sara Tonelli
Proceedings of the First Workshop on Bridging Human–Computer Interaction and Natural Language Processing

In this paper we discuss several challenges related to the development of a 3D game, whose goal is to raise awareness on cyberbullying while collecting linguistic annotation on offensive language. The game is meant to be used by teenagers, thus raising a number of issues that need to be tackled during development. For example, the game aesthetics should be appealing for players belonging to this age group, but at the same time all possible solutions should be implemented to meet privacy requirements. Also, the task of linguistic annotation should be possibly hidden, adopting so-called orthogonal game mechanics, without affecting the quality of collected data. While some of these challenges are being tackled in the game development, some others are discussed in this paper but still lack an ultimate solution.

2020

pdf bib
Adding Gesture, Posture and Facial Displays to the PoliModal Corpus of Political Interviews
Daniela Trotta | Alessio Palmero Aprosio | Sara Tonelli | Annibale Elia
Proceedings of the 12th Language Resources and Evaluation Conference

This paper introduces a multimodal corpus in the political domain, which on top of transcribed face-to-face interviews presents the annotation of facial displays, hand gestures and body posture. While the fully annotated corpus consists of 3 interviews for a total of 90 minutes, it is extracted from a larger available corpus of 56 face-to-face interviews (14 hours) that has been manually annotated with information about metadata (i.e. tools used for the transcription, link to the interview etc.), pauses (used to mark a pause either between or within utterances), vocal expressions (marking non-lexical expressions such as burp and semi-lexical expressions such as primary interjections), deletions (false starts, repetitions and truncated words) and overlaps. In this work, we describe the additional level of annotation relating to nonverbal elements used by three Italian politicians belonging to three different political parties and who at the time of the talk-show were all candidates for the presidency of the Council of Minister. We also present the results of some analyses aimed at identifying existing relations between the proxemics phenomena and the linguistic structures in which they occur in order to capture recurring patterns and differences in the communication strategy.

pdf bib
Hybrid Emoji-Based Masked Language Models for Zero-Shot Abusive Language Detection
Michele Corazza | Stefano Menini | Elena Cabrio | Sara Tonelli | Serena Villata
Findings of the Association for Computational Linguistics: EMNLP 2020

Recent studies have demonstrated the effectiveness of cross-lingual language model pre-training on different NLP tasks, such as natural language inference and machine translation. In our work, we test this approach on social media data, which are particularly challenging to process within this framework, since the limited length of the textual messages and the irregularity of the language make it harder to learn meaningful encodings. More specifically, we propose a hybrid emoji-based Masked Language Model (MLM) to leverage the common information conveyed by emojis across different languages and improve the learned cross-lingual representation of short text messages, with the goal to perform zero- shot abusive language detection. We compare the results obtained with the original MLM to the ones obtained by our method, showing improved performance on German, Italian and Spanish.

pdf bib
A 3D Role-Playing Game for Abusive Language Annotation
Federico Bonetti | Sara Tonelli
Workshop on Games and Natural Language Processing

Gamification has been applied to many linguistic annotation tasks, as an alternative to crowdsourcing platforms to collect annotated data in an inexpensive way. However, we think that still much has to be explored. Games with a Purpose (GWAPs) tend to lack important elements that we commonly see in commercial games, such as 2D and 3D worlds or a story. Making GWAPs more similar to full-fledged video games in order to involve users more easily and increase dissemination is a demanding yet interesting ground to explore. In this paper we present a 3D role-playing game for abusive language annotation that is currently under development.

pdf bib
FBK-DH at SemEval-2020 Task 12: Using Multi-channel BERT for Multilingual Offensive Language Detection
Camilla Casula | Alessio Palmero Aprosio | Stefano Menini | Sara Tonelli
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper we present our submission to sub-task A at SemEval 2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval2). For Danish, Turkish, Arabic and Greek, we develop an architecture based on transfer learning and relying on a two-channel BERT model, in which the English BERT and the multilingual one are combined after creating a machine-translated parallel corpus for each language in the task. For English, instead, we adopt a more standard, single-channel approach. We find that, in a multilingual scenario, with some languages having small training data, using parallel BERT models with machine translated data can give systems more stability, especially when dealing with noisy data. The fact that machine translation on social media data may not be perfect does not hurt the overall classification performance.

pdf bib
Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language
Johanna Monti | Valerio Basile | Maria Pia Di Buono | Raffaele Manna | Antonio Pascucci | Sara Tonelli
Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language

2019

pdf bib
Novel Event Detection and Classification for Historical Texts
Rachele Sprugnoli | Sara Tonelli
Computational Linguistics, Volume 45, Issue 2 - June 2019

Event processing is an active area of research in the Natural Language Processing community, but resources and automatic systems developed so far have mainly addressed contemporary texts. However, the recognition and elaboration of events is a crucial step when dealing with historical texts Particularly in the current era of massive digitization of historical sources: Research in this domain can lead to the development of methodologies and tools that can assist historians in enhancing their work, while having an impact also on the field of Natural Language Processing. Our work aims at shedding light on the complex concept of events when dealing with historical texts. More specifically, we introduce new annotation guidelines for event mentions and types, categorized into 22 classes. Then, we annotate a historical corpus accordingly, and compare two approaches for automatic event detection and classification following this novel scheme. We believe that this work can foster research in a field of inquiry as yet underestimated in the area of Temporal Information Processing. To this end, we release new annotation guidelines, a corpus, and new models for automatic annotation.

pdf bib
Neural Text Simplification in Low-Resource Conditions Using Weak Supervision
Alessio Palmero Aprosio | Sara Tonelli | Marco Turchi | Matteo Negri | Mattia A. Di Gangi
Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation

Neural text simplification has gained increasing attention in the NLP community thanks to recent advancements in deep sequence-to-sequence learning. Most recent efforts with such a data-demanding paradigm have dealt with the English language, for which sizeable training datasets are currently available to deploy competitive models. Similar improvements on less resource-rich languages are conditioned either to intensive manual work to create training data, or to the design of effective automatic generation techniques to bypass the data acquisition bottleneck. Inspired by the machine translation field, in which synthetic parallel pairs generated from monolingual data yield significant improvements to neural models, in this paper we exploit large amounts of heterogeneous data to automatically select simple sentences, which are then used to create synthetic simplification pairs. We also evaluate other solutions, such as oversampling and the use of external word embeddings to be fed to the neural simplification system. Our approach is evaluated on Italian and Spanish, for which few thousand gold sentence pairs are available. The results show that these techniques yield performance improvements over a baseline sequence-to-sequence configuration.

pdf bib
A System to Monitor Cyberbullying based on Message Classification and Social Network Analysis
Stefano Menini | Giovanni Moretti | Michele Corazza | Elena Cabrio | Sara Tonelli | Serena Villata
Proceedings of the Third Workshop on Abusive Language Online

Social media platforms like Twitter and Instagram face a surge in cyberbullying phenomena against young users and need to develop scalable computational methods to limit the negative consequences of this kind of abuse. Despite the number of approaches recently proposed in the Natural Language Processing (NLP) research area for detecting different forms of abusive language, the issue of identifying cyberbullying phenomena at scale is still an unsolved problem. This is because of the need to couple abusive language detection on textual message with network analysis, so that repeated attacks against the same person can be identified. In this paper, we present a system to monitor cyberbullying phenomena by combining message classification and social network analysis. We evaluate the classification module on a data set built on Instagram messages, and we describe the cyberbullying monitoring user interface.

2018

pdf bib
Creating a WhatsApp Dataset to Study Pre-teen Cyberbullying
Rachele Sprugnoli | Stefano Menini | Sara Tonelli | Filippo Oncini | Enrico Piras
Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)

Although WhatsApp is used by teenagers as one major channel of cyberbullying, such interactions remain invisible due to the app privacy policies that do not allow ex-post data collection. Indeed, most of the information on these phenomena rely on surveys regarding self-reported data. In order to overcome this limitation, we describe in this paper the activities that led to the creation of a WhatsApp dataset to study cyberbullying among Italian students aged 12-13. We present not only the collected chats with annotations about user role and type of offense, but also the living lab created in a collaboration between researchers and schools to monitor and analyse cyberbullying. Finally, we discuss some open issues, dealing with ethical, operational and epistemic aspects.

2017

pdf bib
Building timelines of soccer matches from Twitter
Amosse Edouard | Elena Cabrio | Sara Tonelli | Nhan Le-Thanh
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

This demo paper presents a system that builds a timeline with salient actions of a soccer game, based on the tweets posted by users. It combines information provided by external knowledge bases to enrich the content of tweets and applies graph theory to model relations between actions (e.g. goals, penalties) and participants of a game (e.g. players, teams). In the demo, a web application displays in nearly real-time the actions detected from tweets posted by users for a given match of Euro 2016. Our tools are freely available at https://bitbucket.org/eamosse/event_tracking.

pdf bib
You’ll Never Tweet Alone: Building Sports Match Timelines from Microblog Posts
Amosse Edouard | Elena Cabrio | Sara Tonelli | Nhan Le-Thanh
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

In this paper, we propose an approach to build a timeline with actions in a sports game based on tweets. We combine information provided by external knowledge bases to enrich the content of the tweets, and apply graph theory to model relations between actions and participants in a game. We demonstrate the validity of our approach using tweets collected during the EURO 2016 Championship and evaluate the output against live summaries produced by sports channels.

pdf bib
Graph-based Event Extraction from Twitter
Amosse Edouard | Elena Cabrio | Sara Tonelli | Nhan Le-Thanh
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Detecting which tweets describe a specific event and clustering them is one of the main challenging tasks related to Social Media currently addressed in the NLP community. Existing approaches have mainly focused on detecting spikes in clusters around specific keywords or Named Entities (NE). However, one of the main drawbacks of such approaches is the difficulty in understanding when the same keywords describe different events. In this paper, we propose a novel approach that exploits NE mentions in tweets and their entity context to create a temporal event graph. Then, using simple graph theory techniques and a PageRank-like algorithm, we process the event graphs to detect clusters of tweets describing the same events. Experiments on two gold standard datasets show that our approach achieves state-of-the-art results both in terms of evaluation performances and the quality of the detected events.

pdf bib
Topic-Based Agreement and Disagreement in US Electoral Manifestos
Stefano Menini | Federico Nanni | Simone Paolo Ponzetto | Sara Tonelli
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present a topic-based analysis of agreement and disagreement in political manifestos, which relies on a new method for topic detection based on key concept clustering. Our approach outperforms both standard techniques like LDA and a state-of-the-art graph-based method, and provides promising initial results for this new task in computational social science.

pdf bib
MUSST: A Multilingual Syntactic Simplification Tool
Carolina Scarton | Alessio Palmero Aprosio | Sara Tonelli | Tamara Martín Wanton | Lucia Specia
Proceedings of the IJCNLP 2017, System Demonstrations

We describe MUSST, a multilingual syntactic simplification tool. The tool supports sentence simplifications for English, Italian and Spanish, and can be easily extended to other languages. Our implementation includes a set of general-purpose simplification rules, as well as a sentence selection module (to select sentences to be simplified) and a confidence model (to select only promising simplifications). The tool was implemented in the context of the European project SIMPATICO on text simplification for Public Administration (PA) texts. Our evaluation on sentences in the PA domain shows that we obtain correct simplifications for 76% of the simplified cases in English, 71% of the cases in Spanish. For Italian, the results are lower (38%) but the tool is still under development.

pdf bib
The Content Types Dataset: a New Resource to Explore Semantic and Functional Characteristics of Texts
Rachele Sprugnoli | Tommaso Caselli | Sara Tonelli | Giovanni Moretti
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

This paper presents a new resource, called Content Types Dataset, to promote the analysis of texts as a composition of units with specific semantic and functional roles. By developing this dataset, we also introduce a new NLP task for the automatic classification of Content Types. The annotation scheme and the dataset are described together with two sets of classification experiments.

pdf bib
RAMBLE ON: Tracing Movements of Popular Historical Figures
Stefano Menini | Rachele Sprugnoli | Giovanni Moretti | Enrico Bignotti | Sara Tonelli | Bruno Lepri
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

We present RAMBLE ON, an application integrating a pipeline for frame-based information extraction and an interface to track and display movement trajectories. The code of the extraction pipeline and a navigator are freely available; moreover we display in a demonstrator the outcome of a case study carried out on trajectories of notable persons of the XX Century.

2016

pdf bib
NLP and Public Engagement: The Case of the Italian School Reform
Tommaso Caselli | Giovanni Moretti | Rachele Sprugnoli | Sara Tonelli | Damien Lanfrey | Donatella Solda Kutzmann
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we present PIERINO (PIattaforma per l’Estrazione e il Recupero di INformazione Online), a system that was implemented in collaboration with the Italian Ministry of Education, University and Research to analyse the citizens’ comments given in #labuonascuola survey. The platform includes various levels of automatic analysis such as key-concept extraction and word co-occurrences. Each analysis is displayed through an intuitive view using different types of visualizations, for example radar charts and sunburst. PIERINO was effectively used to support shaping the last Italian school reform, proving the potential of NLP in the context of policy making.

pdf bib
PreMOn: a Lemon Extension for Exposing Predicate Models as Linked Data
Francesco Corcoglioniti | Marco Rospocher | Alessio Palmero Aprosio | Sara Tonelli
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We introduce PreMOn (predicate model for ontologies), a linguistic resource for exposing predicate models (PropBank, NomBank, VerbNet, and FrameNet) and mappings between them (e.g, SemLink) as Linked Open Data. It consists of two components: (i) the PreMOn Ontology, an extension of the lemon model by the W3C Ontology-Lexica Community Group, that enables to homogeneously represent data from the various predicate models; and, (ii) the PreMOn Dataset, a collection of RDF datasets integrating various versions of the aforementioned predicate models and mapping resources. PreMOn is freely available and accessible online in different ways, including through a dedicated SPARQL endpoint.

pdf bib
CATENA: CAusal and TEmporal relation extraction from NAtural language texts
Paramita Mirza | Sara Tonelli
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We present CATENA, a sieve-based system to perform temporal and causal relation extraction and classification from English texts, exploiting the interaction between the temporal and the causal model. We evaluate the performance of each sieve, showing that the rule-based, the machine-learned and the reasoning components all contribute to achieving state-of-the-art performance on TempEval-3 and TimeBank-Dense data. Although causal relations are much sparser than temporal ones, the architecture and the selected features are mostly suitable to serve both tasks. The effects of the interaction between the temporal and the causal components, although limited, yield promising results and confirm the tight connection between the temporal and the causal dimension of texts.

pdf bib
Agreement and Disagreement: Comparison of Points of View in the Political Domain
Stefano Menini | Sara Tonelli
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

The automated comparison of points of view between two politicians is a very challenging task, due not only to the lack of annotated resources, but also to the different dimensions participating to the definition of agreement and disagreement. In order to shed light on this complex task, we first carry out a pilot study to manually annotate the components involved in detecting agreement and disagreement. Then, based on these findings, we implement different features to capture them automatically via supervised classification. We do not focus on debates in dialogical form, but we rather consider sets of documents, in which politicians may express their position with respect to different topics in an implicit or explicit way, like during an electoral campaign. We create and make available three different datasets.

pdf bib
On the contribution of word embeddings to temporal relation classification
Paramita Mirza | Sara Tonelli
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Temporal relation classification is a challenging task, especially when there are no explicit markers to characterise the relation between temporal entities. This occurs frequently in inter-sentential relations, whose entities are not connected via direct syntactic relations making classification even more difficult. In these cases, resorting to features that focus on the semantic content of the event words may be very beneficial for inferring implicit relations. Specifically, while morpho-syntactic and context features are considered sufficient for classifying event-timex pairs, we believe that exploiting distributional semantic information about event words can benefit supervised classification of other types of pairs. In this work, we assess the impact of using word embeddings as features for event words in classifying temporal relations of event-event pairs and event-DCT (document creation time) pairs.

2015

pdf bib
Recognizing Biographical Sections in Wikipedia
Alessio Palmero Aprosio | Sara Tonelli
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

pdf bib
Classifying Temporal Relations with Simple Features
Paramita Mirza | Sara Tonelli
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Annotating Causality in the TempEval-3 Corpus
Paramita Mirza | Rachele Sprugnoli | Sara Tonelli | Manuela Speranza
Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL)

pdf bib
An Analysis of Causality between Events and its Relation to Temporal Information
Paramita Mirza | Sara Tonelli
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
CROMER: a Tool for Cross-Document Event and Entity Coreference
Christian Girardi | Manuela Speranza | Rachele Sprugnoli | Sara Tonelli
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we present CROMER (CROss-document Main Events and entities Recognition), a novel tool to manually annotate event and entity coreference across clusters of documents. The tool has been developed so as to handle large collections of documents, perform collaborative annotation (several annotators can work on the same clusters), and enable the linking of the annotated data to external knowledge sources. Given the availability of semantic information encoded in Semantic Web resources, this tool is designed to support annotators in linking entities and events to DBPedia and Wikipedia, so as to facilitate the automatic retrieval of additional semantic information. In this way, event modelling and chaining is made easy, while guaranteeing the highest interconnection with external resources. For example, the tool can be easily linked to event models such as the Simple Event Model [Van Hage et al , 2011] and the Grounded Annotation Framework [Fokkens et al. 2013].

2013

pdf bib
GAF: A Grounded Annotation Framework for Events
Antske Fokkens | Marieke van Erp | Piek Vossen | Sara Tonelli | Willem Robert van Hage | Luciano Serafini | Rachele Sprugnoli | Jesper Hoeksema
Workshop on Events: Definition, Detection, Coreference, and Representation

pdf bib
Outsourcing FrameNet to the Crowd
Marco Fossati | Claudio Giuliano | Sara Tonelli
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Mining Fine-grained Opinion Expressions with Shallow Parsing
Sucheta Ghosh | Sara Tonelli | Richard Johansson
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
FBK: Sentiment Analysis in Twitter with Tweetsted
Md. Faisal Mahbub Chowdhury | Marco Guerini | Sara Tonelli | Alberto Lavelli
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

pdf bib
Key-concept extraction from French articles with KX
Sara Tonelli | Elena Cabrio | Emanuele Pianta
JEP-TALN-RECITAL 2012, Workshop DEFT 2012: DÉfi Fouille de Textes (DEFT 2012 Workshop: Text Mining Challenge)

pdf bib
Making Readability Indices Readable
Sara Tonelli | Ke Tran Manh | Emanuele Pianta
Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations

pdf bib
Improving the Recall of a Discourse Parser by Constraint-based Postprocessing
Sucheta Ghosh | Richard Johansson | Giuseppe Riccardi | Sara Tonelli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We describe two constraint-based methods that can be used to improve the recall of a shallow discourse parser based on conditional random field chunking. These method uses a set of natural structural constraints as well as others that follow from the annotation guidelines of the Penn Discourse Treebank. We evaluated the resulting systems on the standard test set of the PDTB and achieved a rebalancing of precision and recall with improved F-measures across the board. This was especially notable when we used evaluation metrics taking partial matches into account; for these measures, we achieved F-measure improvements of several points.

pdf bib
Hunting for Entailing Pairs in the Penn Discourse Treebank
Sara Tonelli | Elena Cabrio
Proceedings of COLING 2012

2011

pdf bib
Desperately Seeking Implicit Arguments in Text
Sara Tonelli | Rodolfo Delmonte
Proceedings of the ACL 2011 Workshop on Relational Models of Semantics

pdf bib
Shallow Discourse Parsing with Conditional Random Fields
Sucheta Ghosh | Richard Johansson | Giuseppe Riccardi | Sara Tonelli
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf bib
KX: A Flexible System for Keyphrase eXtraction
Emanuele Pianta | Sara Tonelli
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
VENSES++: Adapting a deep semantic processing system to the identification of null instantiations
Sara Tonelli | Rodolfo Delmonte
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
Annotation of Discourse Relations for Conversational Spoken Dialogs
Sara Tonelli | Giuseppe Riccardi | Rashmi Prasad | Aravind Joshi
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we make a qualitative and quantitative analysis of discourse relations within the LUNA conversational spoken dialog corpus. In particular, we first describe the Penn Discourse Treebank (PDTB) and then we detail the adaptation of its annotation scheme to the LUNA corpus of Italian task-oriented dialogs in the domain of software/hardware assistance. We discuss similarities and differences between our approach and the PDTB paradigm and point out the peculiarities of spontaneous dialogs w.r.t. written text, which motivated some changes in the annotation strategy. In particular, we introduced the annotation of relations between non-contiguous arguments and we modified the sense hierarchy in order to take into account the important role of pragmatics in dialogs. In the final part of the paper, we present a comparison between the sense and connective frequency in a representative subset of the LUNA corpus and in the PDTB. Such analysis confirmed the differences between the two corpora and corroborates our choice to introduce dialog-specific adaptations.

pdf bib
VenPro: A Morphological Analyzer for Venetan
Sara Tonelli | Emanuele Pianta | Rodolfo Delmonte | Michele Brunelli
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This document reports the process of extending MorphoPro for Venetan, a lesser-used language spoken in the Nort-Eastern part of Italy. MorphoPro is the morphological component of TextPro, a suite of tools oriented towards a number of NLP tasks. In order to extend this component to Venetan, we developed a declarative representation of the morphological knowledge necessary to analyze and synthesize Venetan words. This task was challenging for several reasons, which are common to a number of lesser-used languages: although Venetan is widely used as an oral language in everyday life, its written usage is very limited; efforts for defining a standard orthography and grammar are very recent and not well established; despite recent attempts to propose a unified orthography, no Venetan standard is widely used. Besides, there are different geographical varieties and it is strongly influenced by Italian.

2009

pdf bib
Wikipedia as Frame Information Repository
Sara Tonelli | Claudio Giuliano
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Annotating Spoken Dialogs: From Speech Segments to Dialog Acts and Frame Semantics
Marco Dinarelli | Silvia Quarteroni | Sara Tonelli | Alessandro Moschitti | Giuseppe Riccardi
Proceedings of SRSL 2009, the 2nd Workshop on Semantic Representation of Spoken Language

pdf bib
New Features for FrameNet - WordNet Mapping
Sara Tonelli | Daniele Pighin
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

pdf bib
A novel approach to mapping FrameNet lexical units to WordNet synsets (short paper)
Sara Tonelli | Emanuele Pianta
Proceedings of the Eight International Conference on Computational Semantics

pdf bib
Three Issues in Cross-Language Frame Information Transfer
Sara Tonelli | Emanuele Pianta
Proceedings of the International Conference RANLP-2009

2008

pdf bib
Enriching the Venice Italian Treebank with Dependency and Grammatical Relations
Sara Tonelli | Rodolfo Delmonte | Antonella Bristot
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we propose a rule-based approach to extract dependency and grammatical functions from the Venice Italian Treebank, a Treebank of written text with PoS and constituent labels consisting of 10,200 utterances and about 274,000 tokens. As manual corpus annotation is expensive and time-consuming, we decided to exploit this existing constituency-based Treebank to derive dependency structures with lower effort. After describing the procedure to extract heads and dependents, based on a head percolation table for Italian, we introduce the rules adopted to add grammatical relation labels. To this purpose, we manually relabeled all non-canonical arguments, which are very frequent in Italian, then we automatically labeled the remaining complements or arguments following some syntactic restrictions based on the position of the constituents w.r.t to parent and sibling nodes. The final section of the paper describes evaluation results. Evaluation was carried out in two steps, one for dependency relations and one for grammatical roles. Results are in line with similar conversion algorithms carried out for other languages, with 0.97 precision on dependency arcs and F-measure for the main grammatical functions scoring 0.96 or above, except for obliques with 0.75.

pdf bib
Frame Information Transfer from English to Italian
Sara Tonelli | Emanuele Pianta
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We describe an automatic projection algorithm for transferring frame-semantic information from English to Italian texts as a first sep towards the creation of Italian FrameNet. Given an English text with frame information and its Italian translation, we project the annotation in four steps: first the Italian text is parsed, then English-Italian alignment is automatically carried out at word level, then we extract the semantic head for every annotated constituent on the English corpus side and finally we project annotation from English to Italian using aligned semantic heads as bridge. With our work, we point out typical features of the Italian language as regards frame-semantic annotation, in particular we describe peculiarities of Italian that at the moment make the projection task more difficult than in the above-mentioned examples. Besides, we created a gold standard with 987 manually annotated sentences to evaluate the algorithm.

2007

pdf bib
Entailment and Anaphora Resolution in RTE3
Rodolfo Delmonte | Antonella Bristot | Marco Aldo Piccolino Boniforti | Sara Tonelli
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

pdf bib
IRST-BP: Preposition Disambiguation based on Chain Clarifying Relationships Contexts
Octavian Popescu | Sara Tonelli | Emanuele Pianta
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

pdf bib
Another Evaluation of Anaphora Resolution Algorithms and a Comparison with GETARUNS’ Knowledge Rich Approach
Rodolfo Delmonte | Antonella Bristot | Marco Aldo Piccolino Boniforti | Sara Tonelli
Proceedings of the Workshop on ROMAND 2006:Robust Methods in Analysis of Natural language Data