A Domain and Language Independent Named Entity Classification Approach Based on Profiles and Local Information

Isabel Moreno, María Teresa Romá-Ferri, Paloma Moreda Pozo


Abstract
This paper presents a Named Entity Classification system based on machine learning. Our methodology uses local entity information and profiles as its feature set, and all features are generated in an unsupervised manner. The system is tested on two data sets: (i) the Spanish DrugSemantics corpus (overall F1 = 74.92), where results are in line with the state of the art without employing external domain-specific resources; and (ii) the English CoNLL2003 dataset (overall F1 = 81.40), where results are lower than previous work but are reached without external knowledge or complex linguistic analysis. Finally, using the same configuration for the two corpora, the difference in overall F1 is only 6.48 points (DrugSemantics = 74.92 versus CoNLL2003 = 81.40). This result supports our hypothesis that the approach is language and domain independent and does not require any external knowledge or complex linguistic analysis.
Anthology ID: R17-1067
Volume: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Month: September
Year: 2017
Address: Varna, Bulgaria
Editors: Ruslan Mitkov, Galia Angelova
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 510–518
URL: https://doi.org/10.26615/978-954-452-049-6_067
DOI: 10.26615/978-954-452-049-6_067
Cite (ACL): Isabel Moreno, María Teresa Romá-Ferri, and Paloma Moreda Pozo. 2017. A Domain and Language Independent Named Entity Classification Approach Based on Profiles and Local Information. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 510–518, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal): A Domain and Language Independent Named Entity Classification Approach Based on Profiles and Local Information (Moreno et al., RANLP 2017)
PDF: https://doi.org/10.26615/978-954-452-049-6_067