Efficient Reuse of Structured and Unstructured Resources for Ontology Population

Chetana Gavankar, Ashish Kulkarni, Ganesh Ramakrishnan


Abstract
We study the problem of ontology population for a domain ontology and present solutions based on semi-automatic techniques. A domain ontology for an organization often consists of classes whose instances are either specific to, or independent of, the organization. For example, in an academic domain ontology, classes like Professor and Department could be organization (university) specific, while Conference and Programming languages are organization independent. This distinction allows us to leverage data sources both within the organization and on the Internet to extract entities and populate an ontology. We propose techniques that build on those for open domain information extraction (IE). We show through comprehensive evaluation how these semi-automatic techniques, together with user input, achieve high precision. We experimented with the academic domain and built an ontology comprising over 220 classes. Intranet documents from five universities formed our organization specific corpora, and we used open domain knowledge bases like Wikipedia, Linked Open Data, and web pages from the Internet as the organization independent data sources. The populated ontology that we built for one of the universities comprised over 75,000 instances. We adhere to the semantic web standards and tools and make the resources available in the OWL format. These could be useful for applications such as information extraction, text annotation, and information retrieval.
Anthology ID:
L14-1235
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3654–3660
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/251_Paper.pdf
Cite (ACL):
Chetana Gavankar, Ashish Kulkarni, and Ganesh Ramakrishnan. 2014. Efficient Reuse of Structured and Unstructured Resources for Ontology Population. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3654–3660, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Efficient Reuse of Structured and Unstructured Resources for Ontology Population (Gavankar et al., LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/251_Paper.pdf