Learning Categories and their Instances by Contextual Features

Antje Schlaf, Robert Remus


Abstract
We present a 3-step framework that learns categories and their instances from natural language text based on given training examples. Step 1 extracts contexts of training examples from text as rules describing the category, considering part of speech, capitalization and category membership as features. Step 2 selects high quality rules using two consecutive filters. The first filter is based on the number of rule occurrences; the second filter takes two non-independent characteristics into account: a rule's precision and the number of instances it acquires. Our framework adapts the filters' threshold values to the respective category and the textual genre by automatically evaluating rule sets resulting from different filter settings and selecting the best performing rule set accordingly. Step 3 then identifies new instances of a category by applying the filtered rules within a previously proposed algorithm. We inspect the rule filters' impact on rule set quality and evaluate our framework by learning first names, last names, professions and cities from a hitherto unexplored textual genre -- search engine result snippets -- and achieve high precision on average.
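The rule selection in Step 2 can be illustrated with a minimal sketch. It is not the authors' implementation: the `Rule` fields, the conjunctive form of the second filter, the grid search over thresholds, and the `score` callback are all assumptions made for illustration, since the abstract does not specify how precision and the number of acquired instances are combined or how rule sets are evaluated.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record; field names are illustrative assumptions."""
    pattern: str      # context pattern extracted around training examples
    occurrences: int  # how often the rule matched in the text
    correct: int      # acquired instances known to belong to the category
    acquired: int     # all instances the rule acquired

    @property
    def precision(self) -> float:
        return self.correct / self.acquired if self.acquired else 0.0


def filter_rules(rules: Iterable[Rule], min_occurrences: int,
                 min_precision: float, min_acquired: int) -> List[Rule]:
    # Filter 1: keep rules that occur often enough.
    frequent = [r for r in rules if r.occurrences >= min_occurrences]
    # Filter 2 (assumed conjunctive): precision and number of acquired instances.
    return [r for r in frequent
            if r.precision >= min_precision and r.acquired >= min_acquired]


def select_best_rule_set(rules: List[Rule],
                         score: Callable[[List[Rule]], float],
                         occ_grid: Iterable[int],
                         prec_grid: Iterable[float],
                         acq_grid: Iterable[int]) -> List[Rule]:
    # Try every threshold combination and keep the best-scoring rule set.
    # Ties are resolved in favour of the first combination encountered.
    best_rules: List[Rule] = []
    best_score = float("-inf")
    for occ, prec, acq in product(occ_grid, prec_grid, acq_grid):
        candidate = filter_rules(rules, occ, prec, acq)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_rules, best_score = candidate, candidate_score
    return best_rules


def pooled_precision(rule_set: List[Rule]) -> float:
    # One possible (assumed) scoring function: precision pooled over all rules.
    acquired = sum(r.acquired for r in rule_set)
    return sum(r.correct for r in rule_set) / acquired if acquired else 0.0
```

Under these assumptions, adapting the thresholds to a category and genre amounts to calling `select_best_rule_set` once per category with rule statistics gathered from that genre. A scoring function that also rewards the number of acquired instances would trade precision against coverage differently.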
Anthology ID:
L12-1045
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1235–1239
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/181_Paper.pdf
Cite (ACL):
Antje Schlaf and Robert Remus. 2012. Learning Categories and their Instances by Contextual Features. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1235–1239, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Learning Categories and their Instances by Contextual Features (Schlaf & Remus, LREC 2012)
PDF:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/181_Paper.pdf