Christy Doran


2024

It’s Not under the Lamppost: Expanding the Reach of Conversational AI
Christy Doran | Deborah A. Dahl
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Generic commercial language-based assistants have become ubiquitously available, originally in the form of smart speakers and mobile apps, and more recently in the form of systems based on generative AI. At first glance, their capabilities seem remarkable. Speech recognition works well, NLU mostly works, and access to back-end information sources is usually quite good. However, there is still a lot of work to be done. In the area of NLU in particular, focused probes into the capabilities of language-based assistants easily reveal significant areas of brittleness that demonstrate large gaps in their coverage. For example, the straightforward disjunctive query “is this monday or tuesday” elicited the nonsensical response “it’s 2:50 p.m. many consider it to be the afternoon.” These gaps are difficult to identify if the development process relies on training the system with an ongoing supply of natural user data, because this natural data can become distorted by a self-reinforcing feedback loop where the system ‘trains’ the user to produce data that works. This paper describes a process for collecting specific kinds of data to uncover these gaps and an annotation scheme for system responses, and includes examples of simple utterances that nonetheless fail to be correctly processed. The systems tested include both conventional assistants, such as Amazon Alexa and Google Assistant, and GenAI systems, including ChatGPT and Bard/Gemini. We claim that these failures are due to a lack of attention to the full spectrum of input possibilities, and argue that systems would benefit from the inclusion of focused manual assessment to directly target likely gaps.

2019

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Jill Burstein | Christy Doran | Thamar Solorio
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

2010

Evaluation of Machine Translation Errors in English and Iraqi Arabic
Sherri Condon | Dan Parvaz | John Aberdeen | Christy Doran | Andrew Freeman | Marwan Awad
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Errors in machine translations of English-Iraqi Arabic dialogues were analyzed at two different points in the systems’ development using HTER methods to identify errors and human annotations to refine TER annotations. The analyses were performed on approximately 100 translations into each language from 4 translation systems collected at two annual evaluations. Although the frequencies of errors in the more mature systems were lower, the proportions of error types exhibited little change. Results include high frequencies of pronoun errors in translations to English, high frequencies of subject person inflection in translations to Iraqi Arabic, similar frequencies of word order errors in both translation directions, and very low frequencies of polarity errors. The problems with many errors can be generalized as the need to insert lexemes not present in the source or vice versa, which includes errors in multi-word expressions. Discourse context will be required to resolve some problems with deictic elements like pronouns.

2009

Normalization for Automated Metrics: English and Arabic Speech Translation
Sherri Condon | Gregory A. Sanders | Dan Parvaz | Alan Rubenstein | Christy Doran | John Aberdeen | Beatrice Oshika
Proceedings of Machine Translation Summit XII: Papers

2008

Applying Automated Metrics to Speech Translation Dialogs
Sherri Condon | Jon Phillips | Christy Doran | John Aberdeen | Dan Parvaz | Beatrice Oshika | Greg Sanders | Craig Schlenoff
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Over the past five years, the Defense Advanced Research Projects Agency (DARPA) has funded development of speech translation systems for tactical applications. A key component of the research program has been extensive system evaluation, with the dual objectives of assessing overall progress and comparing systems. This paper describes the methods used to obtain BLEU, TER, and METEOR scores for two-way English-Iraqi Arabic systems. We compare the scores with measures based on human judgments and demonstrate the effects of normalization operations on BLEU scores. Issues that are highlighted include the quality of test data and differential results of applying automated metrics to Arabic vs. English.
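The effect of normalization on n-gram matching can be illustrated with a toy sketch. This is not the evaluation pipeline used in the paper: the normalization steps shown (lowercasing and punctuation stripping) and the example sentences are illustrative assumptions, and only the clipped unigram-precision component of BLEU is computed.

```python
import re
from collections import Counter

def normalize(text):
    # Illustrative normalization: lowercase and strip punctuation.
    # (The actual operations applied in the evaluations may differ.)
    text = text.lower()
    text = re.sub(r"[^\w\s']", "", text)
    return text.split()

def unigram_precision(hyp_tokens, ref_tokens):
    # Clipped unigram precision: the 1-gram component of BLEU.
    hyp_counts = Counter(hyp_tokens)
    ref_counts = Counter(ref_tokens)
    matches = sum(min(c, ref_counts[w]) for w, c in hyp_counts.items())
    return matches / max(len(hyp_tokens), 1)

# Hypothetical system output vs. reference, differing only in
# casing and punctuation tokenization.
hyp = "He said , Go to the market ."
ref = "he said: go to the market."

raw = unigram_precision(hyp.split(), ref.split())
norm = unigram_precision(normalize(hyp), normalize(ref))
print(f"raw precision:        {raw:.2f}")   # 0.25
print(f"normalized precision: {norm:.2f}")  # 1.00
```

Because punctuation and case mismatches count as n-gram misses, normalization can substantially raise automated scores without any change in translation quality, which is why scores before and after normalization need to be compared on the same footing.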

2003

Dialogue complexity with portability? Research directions for the Information State approach
Carl Burke | Christy Doran | Abigail Gertner | Andy Gregorowicz | Lisa Harper | Joel Korb | Dan Loehr
Proceedings of the HLT-NAACL 2003 Workshop on Research Directions in Dialogue Processing

2000

Reinterpretation of an Existing NLG System in a Generic Generation Architecture
Lynne Cahill | Christy Doran | Roger Evans | Chris Mellish | Daniel Paiva | Mike Reape | Donia Scott | Neil Tipper
INLG’2000: Proceedings of the First International Conference on Natural Language Generation

Enabling Resource Sharing in Language Generation: an Abstract Reference Architecture
Lynne Cahill | Christy Doran | Roger Evans | Rodger Kibble | Chris Mellish | D. Paiva | Mike Reape | Donia Scott | Neil Tipper
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1994

XTAG System - A Wide Coverage Grammar for English
Christy Doran | Dania Egedi | Beth Ann Hockey | B. Srinivas | Martin Zaidel
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics