Pim Arntzen
2006
Identifying Named Entities in Text Databases from the Natural History Domain
Caroline Sporleder | Marieke van Erp | Tijn Porcelijn | Antal van den Bosch | Pim Arntzen
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
In this paper, we investigate whether it is possible to bootstrap a named entity tagger for textual databases by exploiting the database structure to automatically generate domain- and database-specific gazetteer lists. We compare three tagging strategies: (i) using the extracted gazetteers in a look-up tagger, (ii) using the gazetteers to automatically extract training data to train a database-specific tagger, and (iii) using a generic named entity tagger. Our results suggest that automatically built gazetteers in combination with a look-up tagger lead to relatively good performance and that generic taggers do not perform particularly well on this type of data.
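
To make strategy (i) concrete, the following is a minimal sketch of how gazetteers might be harvested from database columns and applied with a look-up tagger. It is an illustration under stated assumptions, not the authors' implementation: the column names (`genus`, `country`), the entity labels, the toy records, the BIO tag scheme, the greedy longest-match rule, and the function names `build_gazetteers` and `lookup_tag` are all hypothetical.

```python
from collections import defaultdict


def build_gazetteers(records, column_to_label):
    """Collect the values of selected columns into one gazetteer per label.

    Assumption: each chosen database column holds entities of a single type,
    so its values can be used directly as gazetteer entries.
    """
    gazetteers = defaultdict(set)
    for record in records:
        for column, label in column_to_label.items():
            value = (record.get(column) or "").strip()
            if value:
                # Store entries as lower-cased token tuples for matching.
                gazetteers[label].add(tuple(value.lower().split()))
    return gazetteers


def lookup_tag(tokens, gazetteers, max_len=5):
    """Tag tokens with BIO labels using greedy longest match."""
    tags = ["O"] * len(tokens)
    lowered = [token.lower() for token in tokens]
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(lowered[i:i + n])
            for label, entries in gazetteers.items():
                if candidate in entries:
                    match = (n, label)
                    break
            if match:
                break
        if match:
            n, label = match
            tags[i] = "B-" + label
            for j in range(i + 1, i + n):
                tags[j] = "I-" + label
            i += n
        else:
            i += 1
    return tags


# Toy records with hypothetical column names (not from the paper's data).
records = [
    {"genus": "Hyla", "country": "Suriname"},
    {"genus": "Bufo", "country": "French Guiana"},
]
gazetteers = build_gazetteers(records, {"genus": "GENUS", "country": "LOCATION"})
tags = lookup_tag("Bufo found in French Guiana".split(), gazetteers)
```

A tagger of this kind only recognises strings that already occur in the chosen columns. Strategy (ii) could instead use output like `tags` as automatically labelled training data for a learned tagger.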