2006
A Model for Context-Based Evaluation of Language Processing Systems and its Application to Machine Translation Evaluation
Andrei Popescu-Belis | Paula Estrella | Margaret King | Nancy Underwood
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
In this paper, we propose a formal framework that takes into account the influence of the intended context of use of an NLP system on the procedure and the metrics used to evaluate the system. In particular, we introduce the notion of a context-dependent quality model and explain how it can be adapted to a given context of use. More specifically, we define vector-space representations of contexts of use and of quality models, which are connected by a generic contextual quality model (GCQM). For each domain, experts in evaluation are needed to build a GCQM based on analytic knowledge and on previous evaluations, using the mechanism proposed here. The main source of inspiration for this work is the FEMTI framework for the evaluation of machine translation, which partly implements the present model and which is described briefly along with insights from other domains.
Evaluating Symbiotic Systems: the challenge
Margaret King | Nancy Underwood
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper looks at a class of systems that pose severe problems of evaluation design for current conventional approaches to evaluation. After describing the two conventional evaluation paradigms (the functionality paradigm, typified by evaluation campaigns, and the ISO-inspired user-centred paradigm, typified by the work of the EAGLES and ISLE projects), it goes on to outline the problems posed by the evaluation of systems designed to work in critical interaction with a human expert user and to work over vast amounts of data. These systems pose problems for both paradigms, although for different reasons. The primary aim of this paper is to provoke discussion and the search for solutions; we have no proven solutions at present. However, we describe a programme of exploratory research on which we have already embarked, involving ground-clearing work that we expect to result in a deep understanding of the systems and users, a prerequisite for developing a general framework for evaluation in this field.
2003
FEMTI: creating and using a framework for MT evaluation
Margaret King | Andrei Popescu-Belis | Eduard Hovy
Proceedings of Machine Translation Summit IX: Papers
This paper presents FEMTI, a web-based Framework for the Evaluation of Machine Translation in ISLE. FEMTI offers structured descriptions of potential user needs, linked to an overview of the technical characteristics of MT systems. The description of possible systems is mainly articulated around the software product quality characteristics set out in ISO/IEC standard 9126. Following the philosophy set out there and in the related 14598 series of standards, each quality characteristic bottoms out in metrics that may be applied to a particular instance of a system in order to judge how satisfactory the system is with respect to that characteristic. An evaluator can use the description of user needs to help identify the specific needs of his evaluation and the relations between them. He can then follow the pointers to the system description to determine which metrics should be applied and how. In the current state of the framework, the emphasis is on being exhaustive, including as much as possible of the information available in the literature on machine translation evaluation. Future work will aim at being more analytic: looking at characteristics and metrics to see how they relate to one another, validating metrics, and investigating the correlation between particular metrics and human judgement.
2002
Computer-Aided Specification of Quality Models for Machine Translation Evaluation
Eduard Hovy | Margaret King | Andrei Popescu-Belis
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
2000
Methods and Metrics for the Evaluation of Dictation Systems: a Case Study
Maria Canelli | Daniele Grasso | Margaret King
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)
1999
MT evaluation
Margaret King | Eduard Hovy | Benjamin K. Tsou | John White | Yusoff Zaharin
Proceedings of Machine Translation Summit VII
This panel deals with the general topic of evaluation of machine translation systems. The first contribution sets out some recent work on creating standards for the design of evaluations. The second, by Eduard Hovy, takes up the particular issue of how metrics can be differentiated and systematized. Benjamin K. Tsou suggests that whilst men may evaluate machines, machines may also evaluate men. John S. White focuses on the question of the role of the user in evaluation design, and Yusoff Zaharin points out that circumstances and settings may have a major influence on evaluation design.
TransRouter : a decision support tool for translation managers
Margaret King
Proceedings of Machine Translation Summit VII
Translation managers often have to decide on the most appropriate way to deal with a translation project. Possible options may include human translation, translation using a specific terminology resource, translation in interaction with a translation memory system, and machine translation. The decision making involved is complex, and it is not always easy to decide by inspection whether a specific text lends itself to certain kinds of treatment. TransRouter supports the decision making by offering a suite of computer-based tools that can be used to analyse the text to be translated. Some tools, such as the word counter, the repetition detector, the sentence length estimator and the sentence simplicity checker, look at characteristics of the text itself. A version comparison tool compares the new text to previously translated texts. Other tools, such as the unknown terms detector and the translation memory coverage estimator, estimate overlap between the text and a set of known resources. The information gained, combined with further information provided by the user, is input to a decision kernel which calculates possible routes towards achieving the translation, together with their cost and their consequences for translation quality. The user may influence the kernel by, for example, specifying particular resources or refining routes under investigation. The final decision on how to treat the project rests with the translation manager.
1994
Evaluating translation
Margaret King
Machine Translation and Translation Theory
1993
Panel on Evaluation: MT Summit IV Introduction
Margaret King
Proceedings of Machine Translation Summit IV
Evaluation of machine translation software and methods
Margaret King
Proceedings of Translating and the Computer 15
1991
Evaluation of MT Systems
Margaret King | Yorick Wilks | Sture Allen | Ulrich Heid | Doris Albisser
Proceedings of Machine Translation Summit III: Panels
1990
Using Test Suites in Evaluation of Machine Translation Systems
Margaret King | Kirsten Falkedal
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics
1989
New directions in MT systems: a change in paradigm
Margaret King
Proceedings of Machine Translation Summit II
1986
Machine Translation already does Work
Margaret King
24th Annual Meeting of the Association for Computational Linguistics
1984
When Is the Next ALPAC Report Due?
Margaret King
10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics
1981
Eurotra: an attempt to achieve multilingual MT
Margaret King
Translating and the Computer: Practical experience of machine translation