Extraction, Merging, and Monitoring of Company Data from Heterogeneous Sources

Christian Federmann, Thierry Declerck


Abstract
We describe the implementation of an enterprise monitoring system that builds on an ontology-based information extraction (OBIE) component applied to heterogeneous data sources. The OBIE component consists of several IE modules - each extracting on a regular temporal basis a specific fraction of company data from a given data source - and a merging tool, which is used to aggregate all the extracted information about a company. The full set of information about companies, which is to be extracted and merged by the OBIE component, is given in the schema of a domain ontology, which is guiding the information extraction process. The monitoring system, in case it detects changes in the extracted and merged information on a company with respect to the actual state of the knowledge base of the underlying ontology, ensures the update of the population of the ontology. As we are using an ontology extended with temporal information, the system is able to assign time intervals to any of the object instances. Additionally, detected changes can be communicated to end-users, who can validate and possibly correct the resulting updates in the knowledge base.
Anthology ID:
L10-1582
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/842_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Christian Federmann and Thierry Declerck. 2010. Extraction, Merging, and Monitoring of Company Data from Heterogeneous Sources. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Extraction, Merging, and Monitoring of Company Data from Heterogeneous Sources (Federmann & Declerck, LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/842_Paper.pdf