Creation of a bottom-up corpus-based ontology for Italian Linguistics

Elisa Bianchi, Mirko Tavosanis, Emiliano Giovannetti


Abstract
This paper describes the construction of a shallow lexical ontology of Italian Linguistics, intended for use by a meta-search engine for query refinement. The ontology was built with Protégé 4.0.2 and is in OWL format; its construction followed the steps of the well-known Ontology Learning From Text (OLFT) layer cake. The starting point was automatic term extraction from a corpus of web documents on the domain of interest (304,000 words). As regards corpus construction, we describe the main criteria for selecting web documents and the critical points of that selection, which concern the definition of the user profile and of degrees of specialisation. We then describe the process of term validation and the construction of a glossary of terms of Italian Linguistics. Afterwards, we outline the identification of synonymic chains and the main criteria of ontology design: the top classes of the ontology are Concept (containing the taxonomy of concepts) and Terms (containing the terms of the glossary as instances), while concepts are linked through the part-whole and involved-role relations, both borrowed from WordNet. Finally, we show some examples of the application of the ontology to query refinement.
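The design summarised in the abstract can be pictured with a small sketch. It is only an illustration under stated assumptions, not the authors' OWL model: the `Concept` dataclass, its field names, the `refine` function, and the sample Italian terms are all hypothetical and are not taken from the paper's glossary. The involved-role relation is left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One node of the concept taxonomy; all field names are assumptions."""
    name: str
    parent: str | None = None                         # taxonomy (is-a) link
    terms: list[str] = field(default_factory=list)    # synonymic chain of glossary terms
    part_of: list[str] = field(default_factory=list)  # part-whole relation targets


# Hypothetical toy data, NOT taken from the paper's glossary.
ONTOLOGY = {
    c.name: c
    for c in [
        Concept("unita_linguistica", terms=["unità linguistica"]),
        Concept("parola", parent="unita_linguistica", terms=["parola", "vocabolo"]),
        Concept("sillaba", parent="unita_linguistica", terms=["sillaba"], part_of=["parola"]),
        Concept("prestito", parent="parola", terms=["prestito", "forestierismo"]),
    ]
}


def refine(query: str) -> dict[str, list[str]]:
    """Suggest refinements for a query term found in some synonymic chain."""
    q = query.strip().lower()
    concept = next((c for c in ONTOLOGY.values() if q in c.terms), None)
    if concept is None:
        return {}  # term not in the glossary: nothing to suggest
    return {
        # other terms in the same synonymic chain
        "synonyms": [t for t in concept.terms if t != q],
        # preferred term of the parent concept, if any
        "broader": [ONTOLOGY[concept.parent].terms[0]] if concept.parent else [],
        # preferred terms of direct child concepts
        "narrower": [c.terms[0] for c in ONTOLOGY.values() if c.parent == concept.name],
        # concepts declared as parts of the matched concept
        "parts": [c.terms[0] for c in ONTOLOGY.values() if concept.name in c.part_of],
    }
```

Calling `refine("parola")` on this toy data returns the other term in the chain as a synonym, the parent concept as a broader suggestion, the child concept as a narrower one, and the concept declared as its part. A meta-search engine could offer such lists to the user as query refinements.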
Anthology ID:
L12-1426
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2641–2647
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/732_Paper.pdf
Cite (ACL):
Elisa Bianchi, Mirko Tavosanis, and Emiliano Giovannetti. 2012. Creation of a bottom-up corpus-based ontology for Italian Linguistics. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2641–2647, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Creation of a bottom-up corpus-based ontology for Italian Linguistics (Bianchi et al., LREC 2012)
PDF:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/732_Paper.pdf