Andrea Glaser


2016

Named Entity Disambiguation for little known referents: a topic-based approach
Andrea Glaser | Jonas Kuhn
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We propose an approach to Named Entity Disambiguation that avoids a problem of standard work on the task (likewise affecting fully supervised, weakly supervised, or distantly supervised machine learning techniques): the treatment of name mentions referring to people with no (or very little) coverage in the textual training data is systematically incorrect. We propose to indirectly take into account the property information for the “non-prominent” name bearers, such as nationality and profession (e.g., for a Canadian law professor named Michael Jackson, with no Wikipedia article, it is very hard to obtain reliable textual training data). The target property information for the entities is directly available from name authority files, or can be inferred, e.g., from listings of sportspeople. Our proposed approach employs topic modeling to exploit textual training data based on entities sharing the relevant properties. In experiments with a pilot implementation of the general approach, we show that the approach does indeed work well for name/referent pairs with limited textual coverage in the training data.

2014

Exploring the utility of coreference chains for improved identification of personal names
Andrea Glaser | Jonas Kuhn
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Identifying the real-world entity that a proper name refers to is an important task in many NLP applications. Context plays an important role in disambiguating entities with the same names. In this paper, we discuss a dataset and experimental set-up that allow us to systematically explore the effects of different sizes and types of context in this disambiguation task. We create context by first identifying coreferent expressions in the document and then combining the sentences in which these expressions occur into one informative context. We apply different filters to obtain different levels of coreference-based context. Since hand-labeling a reasonably sized dataset is expensive, we investigate the usefulness of an automatically created pseudo-ambiguity dataset. The results on this pseudo-ambiguity dataset show that using coreference-based context performs better than using a fixed window of context around the entity. The insights gained from the pseudo-data experiments can be used to predict how the method works with real data. In our experiments on real data we obtain comparable results.

2012

Automatic generation of short informative sentiment summaries
Andrea Glaser | Hinrich Schütze
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics