Anna Corazza


A treebank-based study on the influence of Italian word order on parsing performance
Anita Alicante | Cristina Bosco | Anna Corazza | Alberto Lavelli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The aim of this paper is to contribute to the debate on the issues raised by Morphologically Rich Languages, and more precisely to investigate, in a cross-paradigm perspective, the influence of constituent order on the data-driven parsing of one such language, i.e. Italian. It therefore presents new evidence from experiments on Italian, a language characterized by rich verbal inflection, which leads to a widespread diffusion of the pro-drop phenomenon and to a relatively free word order. The experiments are performed using state-of-the-art data-driven parsers (i.e. MaltParser and the Berkeley parser) and are based on an Italian treebank available in formats that vary along two dimensions, i.e. the paradigm of representation (dependency vs. constituency) and the level of detail of linguistic information.


Barrier Features for Classification of Semantic Relations
Anita Alicante | Anna Corazza
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011


Comparing Italian parsers on a common Treebank: the EVALITA experience
Cristina Bosco | Alessandro Mazzei | Vincenzo Lombardo | Giuseppe Attardi | Anna Corazza | Alberto Lavelli | Leonardo Lesmo | Giorgio Satta | Maria Simi
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The EVALITA 2007 Parsing Task was the first contest among parsing systems for Italian. It is the first attempt to compare the approaches and results of the existing parsing systems specific to this language using a common treebank annotated in both a dependency and a constituency-based format. The development data set for this parsing competition was taken from the Turin University Treebank, which is annotated in both dependency and constituency format. The evaluation metrics were those standardly applied in CoNLL and PARSEVAL. The parsing results are very promising and higher than the state of the art for dependency parsing of Italian. An analysis of these results is provided, which takes into account other experiences in treebank-driven parsing for Italian and for other Romance languages (in particular, the CoNLL-X and CoNLL 2007 shared tasks for dependency parsing). It focuses on the characteristics of the data sets, i.e. type of annotation and size, and on parsing paradigms and approaches also applied to languages other than Italian.


Cross-Entropy and Estimation of Probabilistic Context-Free Grammars
Anna Corazza | Giorgio Satta
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference


Parsing Strategies for the Integration of Two Stochastic Context-Free Grammars
Anna Corazza
Proceedings of the Eighth International Conference on Parsing Technologies

The integration of two stochastic context-free grammars can be useful in two-pass approaches used, for example, in speech recognition and understanding. Based on an algorithm proposed by [Nederhof and Satta, 2002] for the non-probabilistic case, left-to-right strategies for the search for the best solution based on CKY and Earley parsers are discussed. The restriction that one of the two grammars must be non-recursive does not represent a problem in the considered applications.


Stochastic Context-Free Grammars for Island-Driven Probabilistic Parsing
Anna Corazza | Renato De Mori | Roberto Gretter | Giorgio Satta
Proceedings of the Second International Workshop on Parsing Technologies

In automatic speech recognition, the use of language models improves performance. Stochastic language models fit the uncertainty created by acoustic pattern matching rather well. These models are used to score theories corresponding to partial interpretations of sentences. Algorithms have been developed to compute probabilities for theories that grow in a strictly left-to-right fashion. In this paper we consider new relations to compute probabilities of partial interpretations of sentences. We introduce theories containing a gap corresponding to an uninterpreted signal segment. Algorithms can be easily obtained from these relations. The computational complexity of these algorithms is also derived.