Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web

Giuseppe Rizzo, Marieke van Erp, Raphaël Troncy


Abstract
Named entity recognition and disambiguation are of primary importance for extracting information and for populating knowledge bases. Detecting and classifying named entities has traditionally been taken on by the natural language processing community, whilst linking of entities to external resources, such as those in DBpedia, has been tackled by the Semantic Web community. As these tasks are treated in different communities, there is as yet no overview of the performance of these tasks combined. We present an approach that combines the state of the art from named entity recognition in the natural language processing domain and named entity linking from the Semantic Web community. We report on experiments and results to gain more insight into the strengths and limitations of current approaches on these tasks. Our approach relies on the numerous web extractors supported by the NERD framework, which we combine with a machine learning algorithm to optimize recognition and linking of named entities. We test our approach on four standard data sets that are composed of two diverse text types, namely newswire and microposts.
Anthology ID:
L14-1185
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4593–4600
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/176_Paper.pdf
Cite (ACL):
Giuseppe Rizzo, Marieke van Erp, and Raphaël Troncy. 2014. Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 4593–4600, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web (Rizzo et al., LREC 2014)