Dan Hunter
Also published as: D. Hunter
2010
Evaluation of Document Citations in Phase 2 GALE Distillation
Olga Babko-Malaya
|
Dan Hunter
|
Connie Fournelle
|
Jim White
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
The focus of information retrieval evaluations, such as NIST's TREC evaluations (e.g., Voorhees 2003), is on evaluating the information content of system responses. Retrieval tasks, however, usually involve two different dimensions: reporting relevant information and providing sources of that information, including corroborating evidence and alternative documents. Under the DARPA Global Autonomous Language Exploitation (GALE) program, Distillation provides succinct, direct responses to formatted queries using the outputs of automated transcription and translation technologies. These responses are evaluated along two dimensions: information content, which measures the amount of relevant and non-redundant information, and document support, which measures the number of alternative sources provided in support of the reported information. The final metric in the overall GALE distillation evaluation combines the results of scoring both query responses and document citations. In this paper, we describe our evaluation framework, with emphasis on the scoring of document citations, and analyze how systems perform at providing sources of information.
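As a rough illustration of the two scoring dimensions described in this abstract, the sketch below combines a content score (relevant, non-redundant nuggets) with a document-support score (number of alternative citations per reported nugget). The function names, data layout, weighting, and capping rule are assumptions made for illustration only; they are not the official GALE Phase 2 scoring formulas.

```python
# Illustrative sketch only: the weighting and counting rules below are
# assumptions, not the official GALE Phase 2 distillation metrics.

def content_score(nuggets):
    """Fraction of reported nuggets judged relevant and non-redundant."""
    if not nuggets:
        return 0.0
    good = sum(1 for n in nuggets if n["relevant"] and not n["redundant"])
    return good / len(nuggets)

def document_support_score(nuggets):
    """Average number of alternative supporting citations per reported nugget."""
    if not nuggets:
        return 0.0
    return sum(len(n["citations"]) for n in nuggets) / len(nuggets)

def combined_score(nuggets, content_weight=0.8):
    """Hypothetical combination of the content and citation dimensions."""
    # Cap the support contribution so it rewards, but does not dominate,
    # the information-content dimension.
    support_term = min(document_support_score(nuggets) / 3.0, 1.0)
    return content_weight * content_score(nuggets) + (1 - content_weight) * support_term

response = [
    {"relevant": True, "redundant": False, "citations": ["doc12", "doc47"]},
    {"relevant": True, "redundant": True, "citations": ["doc12"]},
]
print(combined_score(response))
```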
2008
Statistical Evaluation of Information Distillation Systems
J.V. White
|
D. Hunter
|
J.D. Goldstein
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
We describe a methodology for evaluating the statistical performance of information distillation systems and apply it to a simple illustrative example. (An information distiller provides written English responses to English queries based on automated searches/transcriptions/translations of English and foreign-language sources. The sources include written documents and sound tracks.) The evaluation methodology extracts information nuggets from the distiller response texts and gathers them into fuzzy equivalence classes called nugs. The methodology supports the usual performance metrics, such as recall and precision, as well as a new information-theoretic metric called proficiency, which measures how much information a distiller provides relative to all of the information provided by a collection of distillers working on a common query and corpora. Unlike previous evaluation techniques, the methodology evaluates the relevance, granularity, and redundancy of information nuggets explicitly.
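To make the metrics named in this abstract concrete, here is a minimal sketch of recall, precision, and a proficiency-style ratio computed over nugs (the fuzzy equivalence classes of nuggets). The data layout and the proficiency approximation (a plain count ratio rather than an information-theoretic measure) are assumptions for illustration, not the paper's exact definitions.

```python
# Illustrative sketch: data layout and formulas are assumptions,
# not the exact definitions from the LREC'08 paper.

def precision(reported_nugs, relevant_nugs):
    """Fraction of reported nugs that are relevant to the query."""
    if not reported_nugs:
        return 0.0
    return len(reported_nugs & relevant_nugs) / len(reported_nugs)

def recall(reported_nugs, relevant_nugs):
    """Fraction of relevant nugs that the distiller reported."""
    if not relevant_nugs:
        return 0.0
    return len(reported_nugs & relevant_nugs) / len(relevant_nugs)

def proficiency(reported_nugs, pooled_nugs):
    """Share of the information found by all distillers combined that this
    distiller reported (simplified to a count ratio; the paper uses an
    information-theoretic formulation)."""
    if not pooled_nugs:
        return 0.0
    return len(reported_nugs & pooled_nugs) / len(pooled_nugs)

# Example: nugs identified by id strings.
system_a = {"nug1", "nug2", "nug5"}
relevant = {"nug1", "nug2", "nug3", "nug4"}
pooled   = {"nug1", "nug2", "nug3", "nug4", "nug5", "nug6"}

print(precision(system_a, relevant))   # 2/3
print(recall(system_a, relevant))      # 2/4
print(proficiency(system_a, pooled))   # 3/6
```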