Semi-automatic Parsing for Web Knowledge Extraction through Semantic Annotation

Maria Pia di Buono


Abstract
Parsing Web information, namely parsing content to find relevant documents on the basis of a user’s query, is a crucial step in guaranteeing fast and accurate Information Retrieval (IR). An automated approach to this task is generally considered faster and cheaper than manual systems. Nevertheless, its results do not seem to reach a high level of accuracy; indeed, as Hjorland (2007) also states, using stochastic algorithms entails:
• Low precision due to the indexing of common Atomic Linguistic Units (ALUs) or sentences.
• Low recall caused by the presence of synonyms.
• Generic results arising from the use of too broad or too narrow terms.
IR systems are usually based on an inverted text index, namely an index data structure that stores a mapping from content to its locations in a database file, in a document, or in a set of documents. In this paper we propose a system on which we will build a search engine able to process online documents, starting from a natural language query, and to return information to users. The proposed approach, based on the Lexicon-Grammar (LG) framework and its language formalization methodologies, aims at integrating a semantic annotation process for both query analysis and document retrieval.
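To make the index structure mentioned in the abstract concrete, the following is a minimal Python sketch of an inverted index that maps each token to the documents and token positions where it occurs. It is an illustration only, not the system proposed in the paper: the function names `build_inverted_index` and `lookup`, the whitespace tokenizer, and the two toy documents are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to {doc_id: [token positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Naive whitespace tokenization; a real system would use a proper tokenizer.
        for position, token in enumerate(text.lower().split()):
            index[token].setdefault(doc_id, []).append(position)
    return index


def lookup(index, term):
    """Return the sorted ids of documents that contain the exact term."""
    return sorted(index.get(term.lower(), {}))


# Hypothetical toy collection, not data from the paper.
docs = {
    "d1": "the museum displays an ancient vase",
    "d2": "an ancient amphora was found near the harbour",
}
index = build_inverted_index(docs)

print(lookup(index, "ancient"))  # ['d1', 'd2']
print(lookup(index, "vase"))     # ['d1']
print(lookup(index, "amphora"))  # ['d2']
```

Because the lookup matches surface forms only, a query for "vase" does not retrieve the document that mentions "amphora", even if a user would consider both relevant. This is a small instance of the low recall caused by synonyms that the abstract lists, and it is the kind of gap a semantic annotation layer is meant to address.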
Anthology ID:
L16-1113
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
714–717
URL:
https://aclanthology.org/L16-1113
Cite (ACL):
Maria Pia di Buono. 2016. Semi-automatic Parsing for Web Knowledge Extraction through Semantic Annotation. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 714–717, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Semi-automatic Parsing for Web Knowledge Extraction through Semantic Annotation (di Buono, LREC 2016)
PDF:
https://aclanthology.org/L16-1113.pdf