2010
Evaluation Metrics for the Lexical Substitution Task
Sanaz Jabbari | Mark Hepple | Louise Guthrie
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Evaluating Lexical Substitution: Analysis and New Measures
Sanaz Jabbari | Mark Hepple | Louise Guthrie
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Lexical substitution is the task of finding a replacement for a target word in a sentence so as to preserve, as closely as possible, the meaning of the original sentence. It has been proposed that lexical substitution be used as a basis for assessing the performance of word sense disambiguation systems, an idea realised in the English Lexical Substitution Task of SemEval-2007. In this paper, we examine the evaluation metrics used for the English Lexical Substitution Task and identify some problems that arise for them. We go on to propose alternative measures that avoid these problems and which, in turn, can be seen as redefining the key tasks that lexical substitution systems should be expected to perform. We hope that these new metrics will better serve to guide the development of lexical substitution systems in future work. One of the new metrics addresses how effectively systems rank substitution candidates, a key ability for lexical substitution systems, and we report results comparing the system assessments produced by this measure with those of the corresponding measure from SemEval-2007.
2008
Using a Probabilistic Model of Context to Detect Word Obfuscation
Sanaz Jabbari | Ben Allison | Louise Guthrie
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper proposes a distributional model of word use and word meaning which is derived purely from a body of text, and then applies this model to determine whether certain words are used in or out of context. We suggest that the contexts of words can be viewed as multinomially distributed random variables. We illustrate how, using this basic idea, the problem of detecting whether or not a word is used in context can be formulated as a likelihood ratio test. We also define a measure of semantic relatedness between a word and its context using the same model. We assume that words that typically appear together are related, and thus have similar probability distributions, and that words used in an unusual way will have probability distributions dissimilar from those of their surrounding context. The relatedness of a word to its context is based on the Kullback-Leibler divergence between the probability distributions assigned to the constituent words of the given sentence. We apply our methods to a defense-oriented application in which certain words in an intercepted communication have been substituted with other words.
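The KL-divergence-based relatedness idea from the abstract can be illustrated with a minimal sketch. The toy corpus, the add-alpha smoothing, and the helper names below are assumptions for illustration only, not the paper's actual model, data, or estimation procedure:

```python
import math
from collections import Counter

def context_distribution(word, sentences, vocab, alpha=1.0):
    # Count words co-occurring with `word` in the same sentence, then apply
    # add-alpha smoothing so every vocabulary item has nonzero probability
    # (smoothing choice is an illustrative assumption).
    counts = Counter()
    for sent in sentences:
        if word in sent:
            counts.update(w for w in sent if w != word)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    # KL(p || q); both distributions share the same smoothed support.
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

# Hypothetical toy corpus: "bank" and "loan" share financial contexts,
# "ball" does not.
corpus = [
    ["the", "bank", "approved", "the", "loan"],
    ["she", "deposited", "money", "at", "the", "bank"],
    ["he", "kicked", "the", "ball", "into", "the", "goal"],
]
vocab = sorted({w for sent in corpus for w in sent})

p_bank = context_distribution("bank", corpus, vocab)
p_loan = context_distribution("loan", corpus, vocab)
p_ball = context_distribution("ball", corpus, vocab)

# Words from similar contexts should diverge less than unrelated ones.
print(kl_divergence(p_bank, p_loan) < kl_divergence(p_bank, p_ball))  # → True
```

On this toy data, the distribution for "bank" sits closer (in KL terms) to "loan" than to "ball", which is the intuition behind flagging a substituted word whose distribution is dissimilar from its surrounding context.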
2006
Towards the Orwellian Nightmare: Separation of Business and Personal Emails
Sanaz Jabbari | Ben Allison | David Guthrie | Louise Guthrie
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions