2018
The ICoN Corpus of Academic Written Italian (L1 and L2)
Mirko Tavosanis | Federica Cominetti
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2012
Creation of a bottom-up corpus-based ontology for Italian Linguistics
Elisa Bianchi | Mirko Tavosanis | Emiliano Giovannetti
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper describes the steps in the construction of a shallow lexical ontology of Italian Linguistics, intended for use by a meta-search engine for query refinement. The ontology was built with Protégé 4.0.2 and is in OWL format; its construction followed the steps described in the well-known Ontology Learning From Text (OLFT) layer cake. The starting point was automatic term extraction from a corpus of web documents concerning the domain of interest (304,000 words). As regards corpus construction, we describe the main criteria for selecting web documents and the critical points of that selection, which concern the definition of the user profile and of degrees of specialisation. We then describe the process of term validation and the construction of a glossary of terms of Italian Linguistics. Afterwards, we outline the identification of synonymic chains and the main criteria of ontology design: the top classes of the ontology are Concept (containing the taxonomy of concepts) and Terms (containing the terms of the glossary as instances), while concepts are linked through part-whole and involved-role relations, both borrowed from WordNet. Finally, we show some examples of the application of the ontology to query refinement.
2008
Towards a Reference Corpus of Web Genres for the Evaluation of Genre Identification Systems
Georg Rehm | Marina Santini | Alexander Mehler | Pavel Braslavski | Rüdiger Gleim | Andrea Stubbe | Svetlana Symonenko | Mirko Tavosanis | Vedrana Vidulin
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
We present initial results from an international and multi-disciplinary research collaboration that aims at the construction of a reference corpus of web genres. The primary application scenario for which we plan to build this resource is the automatic identification of web genres. Web genres are rather difficult to capture and to describe in their entirety, but we plan for the finished reference corpus to contain multi-level tags of the respective genre or genres a web document or a website instantiates. As the construction of such a corpus is by no means a trivial task, we discuss several alternatives that are, for the time being, mostly based on existing collections. Furthermore, we discuss a shared set of genre categories and a multi-purpose tool as two additional prerequisites for a reference corpus of web genres.
2006
Linguistic features of Italian blogs: literary language
Mirko Tavosanis
Proceedings of the Workshop on NEW TEXT Wikis and blogs and other dynamic text sources