Dino Ienco
2014
Semantic-Based Multilingual Document Clustering via Tensor Modeling
Salvatore Romeo | Andrea Tagarelli | Dino Ienco
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
2008
Automatic extraction of subcategorization frames for Italian
Dino Ienco | Serena Villata | Cristina Bosco
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Subcategorization is a kind of knowledge that is crucial in several NLP tasks, such as Information Extraction or parsing, but collecting very large resources that include subcategorization representations is difficult and time-consuming. Previous experience shows that automatic extraction can be a practical and reliable way to acquire this kind of knowledge. The aim of this paper is to investigate the relationship between subcategorization frame extraction and the nature of the data from which the frames are extracted, e.g. how much the task is influenced by the richness or poverty of the annotation. To this end, we present experiments that apply statistical subcategorization extraction methods known in the literature to an Italian treebank that exploits a rich set of dependency relations, which can be annotated at different degrees of specificity. Benefiting from the availability of relation sets that implement different granularities in the representation of relations, we evaluate our results with reference to previous work from a cross-linguistic perspective.