Antonella Bristot

2016

ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions
Olga Uryupina | Ron Artstein | Antonella Bristot | Federica Cavicchio | Kepa Rodriguez | Massimo Poesio
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents a second release of the ARRAU dataset: a multi-domain corpus with thorough, linguistically motivated annotation of anaphora and related phenomena. Building upon the first release almost a decade ago, a considerable effort has been invested in improving the data both quantitatively and qualitatively. Thus, we have doubled the corpus size, expanded the selection of covered phenomena to include referentiality and genericity, and designed and implemented a methodology for enforcing the consistency of the manual annotation. We believe that the new release of ARRAU provides valuable material for ongoing research on complex cases of coreference as well as for a variety of related tasks. The corpus is publicly available through LDC.

2010

Deep Linguistic Processing with GETARUNS for Spoken Dialogue Understanding
Rodolfo Delmonte | Antonella Bristot | Vincenzo Pallotta
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we present work carried out to scale up GETARUNS, a system for text understanding, and to port it to dialogue understanding. The current goal is to automatically extract argumentative information in order to build argumentative structure. The long-term goal is to use argumentative structure to produce automatic summaries of spoken dialogues. Very much like other deep linguistic processing systems, our system is a generic text/dialogue understanding system that can be used in connection with an ontology (WordNet) and other similar repositories of commonsense knowledge. We present the adjustments we made in order to cope with transcribed spoken dialogues like those produced in the ICSI Berkeley project. In a final section we present a preliminary evaluation of the system on two tasks: automatic argumentative labeling and another frequently addressed task, referential vs. non-referential pronoun detection. The results are much better than those reported in similar experiments with machine learning approaches.

2009

Scaling up a NLU system from text to dialogue understanding
Rodolfo Delmonte | Antonella Bristot | Gloria Voltolina | Vincenzo Pallotta
Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009)

2008

Enriching the Venice Italian Treebank with Dependency and Grammatical Relations
Sara Tonelli | Rodolfo Delmonte | Antonella Bristot
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we propose a rule-based approach to extract dependency relations and grammatical functions from the Venice Italian Treebank, a treebank of written text with PoS and constituent labels consisting of 10,200 utterances and about 274,000 tokens. As manual corpus annotation is expensive and time-consuming, we decided to exploit this existing constituency-based treebank to derive dependency structures with less effort. After describing the procedure to extract heads and dependents, based on a head percolation table for Italian, we introduce the rules adopted to add grammatical relation labels. To this end, we manually relabeled all non-canonical arguments, which are very frequent in Italian; we then automatically labeled the remaining complements or arguments following syntactic restrictions based on the position of the constituents with respect to their parent and sibling nodes. The final section of the paper describes the evaluation results. Evaluation was carried out in two steps, one for dependency relations and one for grammatical roles. Results are in line with those of similar conversion algorithms for other languages, with 0.97 precision on dependency arcs and an F-measure of 0.96 or above for the main grammatical functions, except for obliques at 0.75.
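The head-percolation step described above can be sketched in Python. This is a minimal illustration under stated assumptions: the node labels (S, NP, VP, V, N, DET) and the entries of the hypothetical HEAD_TABLE are toy values, not the Venice Italian Treebank tag set or the authors' actual head table, and the sketch only produces unlabeled head-dependent arcs.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    """A constituency-tree node; leaves (preterminals) carry a word and index."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None
    index: Optional[int] = None

# HYPOTHETICAL head percolation table: label -> (search direction, priority list).
# Illustrative only; this is not the table used for the Venice Italian Treebank.
HEAD_TABLE = {
    "S": ("left", ["VP", "S"]),
    "VP": ("left", ["V", "VP"]),
    "NP": ("left", ["N", "NP"]),
    "PP": ("left", ["P"]),
}

def head_child(node: Node) -> Node:
    """Pick the head daughter of a phrase by scanning HEAD_TABLE priorities."""
    direction, priorities = HEAD_TABLE.get(node.label, ("left", []))
    kids = node.children if direction == "left" else node.children[::-1]
    for label in priorities:
        for kid in kids:
            if kid.label == label:
                return kid
    return kids[0]  # fallback: first daughter in the search direction

def lexical_head(node: Node) -> Node:
    """Follow head daughters down to the head leaf (percolation)."""
    while node.children:
        node = head_child(node)
    return node

def dependencies(node: Node, arcs=None) -> List[Tuple[int, int]]:
    """Collect unlabeled (head_index, dependent_index) arcs, top-down."""
    if arcs is None:
        arcs = []
    if not node.children:
        return arcs
    head = head_child(node)
    head_idx = lexical_head(head).index
    for kid in node.children:
        if kid is not head:
            # each non-head daughter's lexical head depends on the phrase head
            arcs.append((head_idx, lexical_head(kid).index))
        dependencies(kid, arcs)
    return arcs

def leaf(pos: str, word: str, i: int) -> Node:
    return Node(pos, word=word, index=i)

# "Gianni mangia la mela" (Gianni eats the apple), with toy labels.
tree = Node("S", [
    Node("NP", [leaf("N", "Gianni", 0)]),
    Node("VP", [leaf("V", "mangia", 1),
                Node("NP", [leaf("DET", "la", 2), leaf("N", "mela", 3)])]),
])

print(lexical_head(tree).word)  # mangia
print(dependencies(tree))       # [(1, 0), (1, 3), (3, 2)]
```

Grammatical relation labels, which the abstract says were added in a separate pass (partly by hand for non-canonical arguments), are outside the scope of this sketch.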