Low-Complexity Heuristics for Deriving Fine-Grained Classes of Named Entities from Web Textual Data

Marius Paşca


Abstract
We introduce a low-complexity method for acquiring fine-grained classes of named entities from the Web. The method exploits the large amounts of textual data available on the Web, while avoiding the use of any expensive text processing techniques or tools. The quality of the extracted classes is encouraging with respect to both the precision of the sets of named entities acquired within various classes, and the labels assigned to the sets of named entities.
Anthology ID:
L08-1056
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/886_paper.pdf
Cite (ACL):
Marius Paşca. 2008. Low-Complexity Heuristics for Deriving Fine-Grained Classes of Named Entities from Web Textual Data. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Low-Complexity Heuristics for Deriving Fine-Grained Classes of Named Entities from Web Textual Data (Paşca, LREC 2008)