Andrea Glaser


2016

We propose an approach to Named Entity Disambiguation that avoids a problem of standard work on the task (likewise affecting fully supervised, weakly supervised, or distantly supervised machine learning techniques): the treatment of name mentions referring to people with no (or very little) coverage in the textual training data is systematically incorrect. We propose to indirectly take into account the property information for the “non-prominent” name bearers, such as nationality and profession (e.g., for a Canadian law professor named Michael Jackson, with no Wikipedia article, it is very hard to obtain reliable textual training data). The target property information for the entities is directly available from name authority files, or inferrable, e.g., from listings of sportspeople etc. Our proposed approach employs topic modeling to exploit textual training data based on entities sharing the relevant properties. In experiments with a pilot implementation of the general approach, we show that the approach does indeed work well for name/referent pairs with limited textual coverage in the training data.

2014

Identifying the real world entity that a proper name refers to is an important task in many NLP applications. Context plays an important role in disambiguating entities with the same names. In this paper, we discuss a dataset and experimental set-up that allows us to systematically explore the effects of different sizes and types of context in this disambiguation task. We create context by first identifying coreferent expressions in the document and then combining sentences these expressions occur in to one informative context. We apply different filters to obtain different levels of coreference-based context. Since hand-labeling a dataset of a decent size is expensive, we investigate the usefulness of an automatically created pseudo-ambiguity dataset. The results on this pseudo-ambiguity dataset show that using coreference-based context performs better than using a fixed window of context around the entity. The insights taken from the pseudo data experiments can be used to predict how the method works with real data. In our experiments on real data we obtain comparable results.

2012