Bonnie Dorr

Also published as: Bonnie J. Dorr

2025

pdf bib abs
Making Task-Oriented Dialogue Datasets More Natural by Synthetically Generating Indirect User Requests
Amogh Mannekote | Jinseok Nam | Ziming Li | Kristy Elizabeth Boyer | Bonnie J. Dorr
Proceedings of the 31st International Conference on Computational Linguistics

Indirect User Requests (IURs), such as “It’s cold in here” instead of “Could you please increase the temperature?” are common in human-human task-oriented dialogue and require world knowledge and pragmatic reasoning from the listener. While large language models (LLMs) can handle these requests effectively, smaller models deployed on virtual assistants often struggle due to resource constraints. Moreover, existing task-oriented dialogue benchmarks lack sufficient examples of complex discourse phenomena such as indirectness. To address this, we propose a set of linguistic criteria along with an LLM-based pipeline for generating realistic IURs to test natural language understanding (NLU) and dialogue state tracking (DST) models before deployment in a new domain. We also release IndirectRequests, a dataset of IURs based on the Schema-Guided Dialogue (SGD) corpus, as a comparative testbed for evaluating the performance of smaller models in handling indirect requests.

pdf bib abs
From Disagreement to Understanding: The Case for Ambiguity Detection in NLI
Chathuri Jayaweera | Bonnie J. Dorr
Proceedings of the The 4th Workshop on Perspectivist Approaches to NLP

This position paper argues that annotation disagreement in Natural Language Inference (NLI) is not mere noise but often reflects meaningful variation, especially when triggered by ambiguity in the premise or hypothesis. While underspecified guidelines and annotator behavior contribute to variation, content-based ambiguity provides a process-independent signal of divergent human perspectives. We call for a shift toward ambiguity-aware NLI that first identifies ambiguous input pairs, classifies their types, and only then proceeds to inference. To support this shift, we present a framework that incorporates ambiguity detection and classification prior to inference. We also introduce a unified taxonomy that synthesizes existing taxonomies, illustrates key subtypes with examples, and motivates targeted detection methods that better align models with human interpretation. Although current resources lack datasets explicitly annotated for ambiguity and subtypes, this gap presents an opportunity: by developing new annotated resources and exploring unsupervised approaches to ambiguity detection, we enable more robust, explainable, and human-aligned NLI systems.

2024

pdf bib abs
A Comparison of Fine-Tuning and In-Context Learning for Clause-Level Morphosyntactic Alternation
Jim Su | Justin Ho | George Broadwell | Sarah Moeller | Bonnie Dorr
Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024)

This paper presents our submission to the AmericasNLP 2024 Shared Task on the Creation of Educational Materials for Indigenous Languages. We frame this task as one of morphological inflection generation, treating each sentence as a single word. We investigate and compare two distinct approaches: fine-tuning neural encoder-decoder models such as NLLB- 200, and in-context learning with proprietary large language models (LLMs). Our findings demonstrate that for this task, no one approach is perfect. Anthropic’s Claude 3 Opus, when supplied with grammatical description entries, achieves the highest performance on Bribri among the evaluated models. This outcome corroborates and extends previous research exploring the efficacy of in-context learning in low- resource settings. For Maya, fine-tuning NLLB- 200-3.3B using StemCorrupt augmented data yielded the best performance.

pdf bib abs
The Effect of Data Partitioning Strategy on Model Generalizability: A Case Study of Morphological Segmentation
Zoey Liu | Bonnie Dorr
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Recent work to enhance data partitioning strategies for more realistic model evaluation face challenges in providing a clear optimal choice. This study addresses these challenges, focusing on morphological segmentation and synthesizing limitations related to language diversity, adoption of multiple datasets and splits, and detailed model comparisons. Our study leverages data from 19 languages, including ten indigenous or endangered languages across 10 language families with diverse morphological systems (polysynthetic, fusional, and agglutinative) and different degrees of data availability. We conduct large-scale experimentation with varying sized combinations of training and evaluation sets as well as new test data. Our results show that, when faced with new test data: (1) models trained from random splits are able to achieve higher numerical scores; (2) model rankings derived from random splits tend to generalize more consistently.

pdf bib abs
Balancing Transparency and Accuracy: A Comparative Analysis of Rule-Based and Deep Learning Models in Political Bias Classification
Manuel Nunez Martinez | Sonja Schmer-Galunder | Zoey Liu | Sangpil Youm | Chathuri Jayaweera | Bonnie J. Dorr
Proceedings of the Second Workshop on Social Influence in Conversations (SICon 2024)

The unchecked spread of digital information, combined with increasing political polarization and the tendency of individuals to isolate themselves from opposing political viewpoints opposing views, has driven researchers to develop systems for automatically detecting political bias in media. This trend has been further fueled by discussions on social media. We explore methods for categorizing bias in US news articles, comparing rule-based and deep learning approaches. The study highlights the sensitivity of modern self-learning systems to unconstrained data ingestion, while reconsidering the strengths of traditional rule-based systems. Applying both models to left-leaning (CNN) and right-leaning (FOX) News articles, we assess their effectiveness on data beyond the original training and test sets. This analysis highlights each model’s accuracy, offers a framework for exploring deep-learning explainability, and sheds light on political bias in US news media. We contrast the opaque architecture of a deep learning model with the transparency of a linguistically informed rule-based model, showing that the rule-based model performs consistently across different data conditions and offers greater transparency, whereas the deep learning model is dependent on the training set and struggles with unseen data.

Successful social influence, whether at individual or community levels, requires expertise and care in several dimensions of communication: understanding of emotions, beliefs, and values; transparency; and context-aware behavior shaping. Based on our experience in identifying mediation needs in social media and engaging with moderators and users, we developed a set of principles that we believe social influence systems should adhere to to ensure ethical operation, effectiveness, widespread adoption, and trust by users on both sides of the engagement of influence. We demonstrate these principles in D-ESC: Dialogue Assistant for Engaging in Social-Cybermediation, in the context of AI-assisted social media mediation, a newer paradigm of automatic moderation that responds to unique and changing communities while engendering and maintaining trust in users, moderators, and platform-holders. Through this case study, we identify opportunities for our principles to guide future systems towards greater opportunities for positive social change.

2023

pdf bib abs
Summary Cycles: Exploring the Impact of Prompt Engineering on Large Language Models’ Interaction with Interaction Log Information
Jeremy Block | Yu-Peng Chen | Abhilash Budharapu | Lisa Anthony | Bonnie Dorr
Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems

With the aim of improving work efficiency, we examine how Large Language Models (LLMs) can better support the handoff of information by summarizing user interactions in collaborative intelligence analysis communication. We experiment with interaction logs, or a record of user interactions with a system. Inspired by chain-of-thought prompting, we describe a technique to avoid API token limits with recursive summarization requests. We then apply ChatGPT over multiple iterations to extract named entities, topics, and summaries, combined with interaction sequence sentences, to generate summaries of critical events and results of analysis sessions. We quantitatively evaluate the generated summaries against human-generated ones using common accuracy metrics (e.g., ROUGE-L, BLEU, BLEURT, and TER). We also report qualitative trends and the factuality of the output. We find that manipulating the audience feature or providing single-shot examples minimally influences the model’s accuracy. While our methodology successfully summarizes interaction logs, the lack of significant results raises questions about prompt engineering and summarization effectiveness generally. We call on explainable artificial intelligence research to better understand how terms and their placement may change LLM outputs, striving for more consistent prompt engineering guidelines.

pdf bib abs
Detoxifying Online Discourse: A Guided Response Generation Approach for Reducing Toxicity in User-Generated Text
Ritwik Bose | Ian Perera | Bonnie Dorr
Proceedings of the First Workshop on Social Influence in Conversations (SICon 2023)

The expression of opinions, stances, and moral foundations on social media often coincide with toxic, divisive, or inflammatory language that can make constructive discourse across communities difficult. Natural language generation methods could provide a means to reframe or reword such expressions in a way that fosters more civil discourse, yet current Large Language Model (LLM) methods tend towards language that is too generic or formal to seem authentic for social media discussions. We present preliminary work on training LLMs to maintain authenticity while presenting a community’s ideas and values in a constructive, non-toxic manner.

2022

pdf bib abs
From Stance to Concern: Adaptation of Propositional Analysis to New Tasks and Domains
Brodie Mather | Bonnie Dorr | Adam Dalton | William de Beaumont | Owen Rambow | Sonja Schmer-Galunder
Findings of the Association for Computational Linguistics: ACL 2022

We present a generalized paradigm for adaptation of propositional analysis (predicate-argument pairs) to new tasks and domains. We leverage an analogy between stances (belief-driven sentiment) and concerns (topical issues with moral dimensions/endorsements) to produce an explanatory representation. A key contribution is the combination of semi-automatic resource building for extraction of domain-dependent concern types (with 2-4 hours of human labor per domain) and an entirely automatic procedure for extraction of domain-independent moral dimensions and endorsement values. Prudent (automatic) selection of terms from propositional structures for lexical expansion (via semantic similarity) produces new moral dimension lexicons at three levels of granularity beyond a strong baseline lexicon. We develop a ground truth (GT) based on expert annotators and compare our concern detection output to GT, to yield 231% improvement in recall over baseline, with only a 10% loss in precision. F1 yields 66% improvement over baseline and 97.8% of human performance. Our lexically based approach yields large savings over approaches that employ costly human labor and model building. We provide to the community a newly expanded moral dimension/value lexicon, annotation guidelines, and GT.

We present the BeSt corpus, which records cognitive state: who believes what (i.e., factuality), and who has what sentiment towards what. This corpus is inspired by similar source-and-target corpora, specifically MPQA and FactBank. The corpus comprises two genres, newswire and discussion forums, in three languages, Chinese (Mandarin), English, and Spanish. The corpus is distributed through the LDC.

2020

Achieving true human-like ability to conduct a conversation remains an elusive goal for open-ended dialogue systems. We posit this is because extant approaches towards natural language generation (NLG) are typically construed as end-to-end architectures that do not adequately model human generation processes. To investigate, we decouple generation into two separate phases: planning and realization. In the planning phase, we train two planners to generate plans for response utterances. The realization phase uses response plans to produce an appropriate response. Through rigorous evaluations, both automated and human, we demonstrate that decoupling the process into planning and realization performs better than an end-to-end approach.

We describe a system that supports natural language processing (NLP) components for active defenses against social engineering attacks. We deploy a pipeline of human language technology, including Ask and Framing Detection, Named Entity Recognition, Dialogue Engineering, and Stylometry. The system processes modern message formats through a plug-in architecture to accommodate innovative approaches for message analysis, knowledge representation and dialogue generation. The novelty of the system is that it uses NLP for cyber defense and engages the attacker using bots to elicit evidence to attribute to the attacker and to waste the attacker’s time and resources.

pdf bib abs
Adaptation of a Lexical Organization for Social Engineering Detection and Response Generation
Archna Bhatia | Adam Dalton | Brodie Mather | Sashank Santhanam | Samira Shaikh | Alan Zemel | Tomek Strzalkowski | Bonnie J. Dorr
Proceedings for the First International Workshop on Social Threats in Online Conversations: Understanding and Management

We present a paradigm for extensible lexicon development based on Lexical Conceptual Structure to support social engineering detection and response generation. We leverage the central notions of ask (elicitation of behaviors such as providing access to money) and framing (risk/reward implied by the ask). We demonstrate improvements in ask/framing detection through refinements to our lexical organization and show that response generation qualitatively improves as ask/framing detection performance improves. The paradigm presents a systematic and efficient approach to resource adaptation for improved task-specific performance.

2019

pdf bib abs
Bilingual Low-Resource Neural Machine Translation with Round-Tripping: The Case of Persian-Spanish
Benyamin Ahmadnia | Bonnie Dorr
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

The quality of Neural Machine Translation (NMT), as a data-driven approach, massively depends on quantity, quality, and relevance of the training dataset. Such approaches have achieved promising results for bilingually high-resource scenarios but are inadequate for low-resource conditions. This paper describes a round-trip training approach to bilingual low-resource NMT that takes advantage of monolingual datasets to address training data scarcity, thus augmenting translation quality. We conduct detailed experiments on Persian-Spanish as a bilingually low-resource scenario. Experimental results demonstrate that this competitive approach outperforms the baselines.

pdf bib abs
Enhancing Phrase-Based Statistical Machine Translation by Learning Phrase Representations Using Long Short-Term Memory Network
Benyamin Ahmadnia | Bonnie Dorr
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Phrases play a key role in Machine Translation (MT). In this paper, we apply a Long Short-Term Memory (LSTM) model over conventional Phrase-Based Statistical MT (PBSMT). The core idea is to use an LSTM encoder-decoder to score the phrase table generated by the PBSMT decoder. Given a source sequence, the encoder and decoder are jointly trained in order to maximize the conditional probability of a target sequence. Analytically, the performance of a PBSMT system is enhanced by using the conditional probabilities of phrase pairs computed by an LSTM encoder-decoder as an additional feature in the existing log-linear model. We compare the performance of the phrase tables in the PBSMT to the performance of the proposed LSTM and observe its positive impact on translation quality. We construct a PBSMT model using the Moses decoder and enrich the Language Model (LM) utilizing an external dataset. We then rank the phrase tables using an LSTM-based encoder-decoder. This method produces a gain of up to 3.14 BLEU score on the test set.

2018

pdf bib abs
Lexical Conceptual Structure of Literal and Metaphorical Spatial Language: A Case Study of “Push”
Bonnie Dorr | Mari Olsen
Proceedings of the First International Workshop on Spatial Language Understanding

Prior methodologies for understanding spatial language have treated literal expressions such as “Mary pushed the car over the edge” differently from metaphorical extensions such as “Mary’s job pushed her over the edge”. We demonstrate a methodology for standardizing literal and metaphorical meanings, by building on work in Lexical Conceptual Structure (LCS), a general-purpose representational component used in machine translation. We argue that spatial predicates naturally extend into other fields (e.g., circumstantial or temporal), and that LCS provides both a framework for distinguishing spatial from non-spatial, and a system for finding metaphorical meaning extensions. We start with MetaNet (MN), a large repository of conceptual metaphors, condensing 197 spatial entries into sixteen top-level categories of motion frames. Using naturally occurring instances of English push , and expansions of MN frames, we demonstrate that literal and metaphorical extensions exhibit patterns predicted and represented by the LCS model.

pdf bib abs
The Case for Systematically Derived Spatial Language Usage
Bonnie Dorr | Clare Voss
Proceedings of the First International Workshop on Spatial Language Understanding

This position paper argues that, while prior work in spatial language understanding for tasks such as robot navigation focuses on mapping natural language into deep conceptual or non-linguistic representations, it is possible to systematically derive regular patterns of spatial language usage from existing lexical-semantic resources. Furthermore, even with access to such resources, effective solutions to many application areas such as robot navigation and narrative generation also require additional knowledge at the syntax-semantics interface to cover the wide range of spatial expressions observed and available to natural language speakers. We ground our insights in, and present our extensions to, an existing lexico-semantic resource, covering 500 semantic classes of verbs, of which 219 fall within a spatial subset. We demonstrate that these extensions enable systematic derivation of regular patterns of spatial language without requiring manual annotation.

pdf bib abs
STYLUS: A Resource for Systematically Derived Language Usage
Bonnie Dorr | Clare Voss
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

We describe a resource derived through extraction of a set of argument realizations from an existing lexical-conceptual structure (LCS) Verb Database of 500 verb classes (containing a total of 9525 verb entries) to include information about realization of arguments for a range of different verb classes. We demonstrate that our extended resource, called STYLUS (SysTematicallY Derived Language USe), enables systematic derivation of regular patterns of language usage without requiring manual annotation. We posit that both spatially oriented applications such as robot navigation and more general applications such as narrative generation require a layered representation scheme where a set of primitives (often grounded in space/motion such as GO) is coupled with a representation of constraints at the syntax-semantics interface. We demonstrate that the resulting resource covers three cases of lexico-semantic operations applicable to both language understanding and language generation.

2017

pdf bib abs
Characterization of Divergence in Impaired Speech of ALS Patients
Archna Bhatia | Bonnie Dorr | Kristy Hollingshead | Samuel L. Phillips | Barbara McKenzie
Proceedings of the 16th BioNLP Workshop

Approximately 80% to 95% of patients with Amyotrophic Lateral Sclerosis (ALS) eventually develop speech impairments, such as defective articulation, slow laborious speech and hypernasality. The relationship between impaired speech and asymptomatic speech may be seen as a divergence from a baseline. This relationship can be characterized in terms of measurable combinations of phonological characteristics that are indicative of the degree to which the two diverge. We demonstrate that divergence measurements based on phonological characteristics of speech correlate with physiological assessments of ALS. Speech-based assessments offer benefits over commonly-used physiological assessments in that they are inexpensive, non-intrusive, and do not require trained clinical personnel for administering and interpreting the results.

2013

pdf bib
Computing Lexical Contrast
Saif M. Mohammad | Bonnie J. Dorr | Graeme Hirst | Peter D. Turney
Computational Linguistics, Volume 39, Issue 3 - September 2013

2012

pdf bib
Language Research at DARPA-Machine Translation and Beyond
Bonnie J. Dorr
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Keynote Presentations

2010

We describe a unified and coherent syntactic framework for supporting a semantically-informed syntactic approach to statistical machine translation. Semantically enriched syntactic tags assigned to the target-language training texts improved translation quality. The resulting system significantly outperformed a linguistically naive baseline model (Hiero), and reached the highest scores yet reported on the NIST 2009 Urdu-English translation task. This finding supports the hypothesis (posed by many researchers in the MT community, e.g., in DARPA GALE) that both syntactic and semantic information are critical for improving translation quality—and further demonstrates that large gains can be achieved for low-resource languages with different word order than English.

pdf bib
Generating Phrasal and Sentential Paraphrases: A Survey of Data-Driven Methods
Nitin Madnani | Bonnie J. Dorr
Computational Linguistics, Volume 36, Issue 3 - September 2010

pdf bib abs
A Modality Lexicon and its use in Automatic Tagging
Kathryn Baker | Michael Bloodgood | Bonnie Dorr | Nathaniel W. Filardo | Lori Levin | Christine Piatko
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes our resource-building results for an eight-week JHU Human Language Technology Center of Excellence Summer Camp for Applied Language Exploration (SCALE-2009) on Semantically-Informed Machine Translation. Specifically, we describe the construction of a modality annotation scheme, a modality lexicon, and two automated modality taggers that were built using the lexicon and annotation scheme. Our annotation scheme is based on identifying three components of modality: a trigger, a target and a holder. We describe how our modality lexicon was produced semi-automatically, expanding from an initial hand-selected list of modality trigger words and phrases. The resulting expanded modality lexicon is being made publicly available. We demonstrate that one tagger―a structure-based tagger―results in precision around 86% (depending on genre) for tagging of a standard LDC data set. In a machine translation application, using the structure-based tagger to annotate English modalities on an English-Urdu training corpus improved the translation quality score for Urdu by 0.3 Bleu points in the face of sparse training data.

pdf bib
Putting the User in the Loop: Interactive Maximal Marginal Relevance for Query-Focused Summarization
Jimmy Lin | Nitin Madnani | Bonnie Dorr
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

pdf bib
Generating High-Coverage Semantic Orientation Lexicons From Overtly Marked Words and a Thesaurus
Saif Mohammad | Cody Dunne | Bonnie Dorr
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Using Citations to Generate surveys of Scientific Paradigms
Saif Mohammad | Bonnie Dorr | Melissa Egan | Ahmed Hassan | Pradeep Muthukrishan | Vahed Qazvinian | Dragomir Radev | David Zajic
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Fluency, Adequacy, or HTER? Exploring Different Human Judgments with a Tunable MT Metric
Matthew Snover | Nitin Madnani | Bonnie Dorr | Richard Schwartz
Proceedings of the Fourth Workshop on Statistical Machine Translation

2008

pdf bib abs
Are Multiple Reference Translations Necessary? Investigating the Value of Paraphrased Reference Translations in Parameter Optimization
Nitin Madnani | Philip Resnik | Bonnie J. Dorr | Richard Schwartz
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers

Most state-of-the-art statistical machine translation systems use log-linear models, which are defined in terms of hypothesis features and weights for those features. It is standard to tune the feature weights in order to maximize a translation quality metric, using held-out test sentences and their corresponding reference translations. However, obtaining reference translations is expensive. In our earlier work (Madnani et al., 2007), we introduced a new full-sentence paraphrase technique, based on English-to-English decoding with an MT system, and demonstrated that the resulting paraphrases can be used to cut the number of human reference translations needed in half. In this paper, we take the idea a step further, asking how far it is possible to get with just a single good reference translation for each item in the development set. Our analysis suggests that it is necessary to invest in four or more human translations in order to significantly improve on a single translation augmented by monolingual paraphrases.

pdf bib
Language and Translation Model Adaptation using Comparable Corpora
Matthew Snover | Bonnie Dorr | Richard Schwartz
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
Computing Word-Pair Antonymy
Saif Mohammad | Bonnie Dorr | Graeme Hirst
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Anthology that can be used for research in scholarly document processing. This corpus, which we call the ACL Anthology Reference Corpus (ACL ARC), brings together the recent activities of a number of research groups around the world. Our goal is to make the corpus widely available, and to encourage other researchers to use it as a standard testbed for experiments in both bibliographic and bibliometric research.

pdf bib
Combining Open-Source with Research to Re-engineer a Hands-on Introductory NLP Course
Nitin Madnani | Bonnie J. Dorr
Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics

2007

pdf bib
Combining Outputs from Multiple Machine Translation Systems
Antti-Veikko Rosti | Necip Fazil Ayan | Bing Xiang | Spyros Matsoukas | Richard Schwartz | Bonnie Dorr
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

pdf bib
Using Paraphrases for Parameter Tuning in Statistical Machine Translation
Nitin Madnani | Necip Fazil Ayan | Philip Resnik | Bonnie Dorr
Proceedings of the Second Workshop on Statistical Machine Translation

2006

pdf bib abs
Challenges in Building an Arabic-English GHMT System with SMT Components
Nizar Habash | Bonnie Dorr | Christof Monz
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

The research context of this paper is developing hybrid machine translation (MT) systems that exploit the advantages of linguistic rule-based and statistical MT systems. Arabic, as a morphologically rich language, is especially challenging even without addressing the hybridization question. In this paper, we describe the challenges in building an Arabic-English generation-heavy machine translation (GHMT) system and boosting it with statistical machine translation (SMT) components. We present an extensive evaluation of multiple system variants and report positive results on the advantages of hybridization.

pdf bib abs
A Study of Translation Edit Rate with Targeted Human Annotation
Matthew Snover | Bonnie Dorr | Rich Schwartz | Linnea Micciulla | John Makhoul
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

We examine a new, intuitive measure for evaluating machine-translation output that avoids the knowledge intensiveness of more meaning-based approaches, and the labor-intensiveness of human judgments. Translation Edit Rate (TER) measures the amount of editing that a human would have to perform to change a system output so it exactly matches a reference translation. We show that the single-reference variant of TER correlates as well with human judgments of MT quality as the four-reference variant of BLEU. We also define a human-targeted TER (or HTER) and show that it yields higher correlations with human judgments than BLEU—even when BLEU is given human-targeted references. Our results indicate that HTER correlates with human judgments better than HMETEOR and that the four-reference variants of TER and HTER correlate with human judgments as well as—or better than—a second human judgment does.

pdf bib
Leveraging Recurrent Phrase Structure in Large-scale Ontology Translation
G. Craig Murray | Bonnie J. Dorr | Jimmy Lin | Jan Hajič | Pavel Pecina
Proceedings of the 11th Annual Conference of the European Association for Machine Translation

While both spoken and written language processing stand to benefit from parsing, the standard Parseval metrics (Black et al., 1991) and their canonical implementation (Sekine and Collins, 1997) are only useful for text. The Parseval metrics are undefined when the words input to the parser do not match the words in the gold standard parse tree exactly, and word errors are unavoidable with automatic speech recognition (ASR) systems. To fill this gap, we have developed a publicly available tool for scoring parses that implements a variety of metrics which can handle mismatches in words and segmentations, including: alignment-based bracket evaluation, alignment-based dependency evaluation, and a dependency evaluation that does not require alignment. We describe the different metrics, how to use the tool, and the outcome of an extensive set of experiments on the sensitivity.

This paper describes an effort to investigate the incrementally deepening development of an interlingua notation, validated by human annotation of texts in English plus six languages. We begin with deep syntactic annotation, and in this paper present a series of annotation manuals for six different languages at the deep-syntactic level of representation. Many syntactic differences between languages are removed in the proposed syntactic annotation, making them useful resources for multilingual NLP projects with semantic components.

pdf bib
A Maximum Entropy Approach to Combining Word Alignments
Necip Fazil Ayan | Bonnie J. Dorr
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

pdf bib
Going Beyond AER: An Extensive Analysis of Word Alignments and Their Impact on MT
Necip Fazil Ayan | Bonnie J. Dorr
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Leveraging Reusability: Cost-Effective Lexical Acquisition for Large-Scale Ontology Translation
G. Craig Murray | Bonnie J. Dorr | Jimmy Lin | Jan Hajič | Pavel Pecina
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2005

pdf bib
NeurAlign: Combining Word Alignments Using Neural Networks
Necip Fazil Ayan | Bonnie J. Dorr | Christof Monz
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Alignment Link Projection Using Transformation-Based Learning
Necip Fazil Ayan | Bonnie J. Dorr | Christof Monz
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
A Methodology for Extrinsic Evaluation of Text Summarization: Does ROUGE Correlate?
Bonnie Dorr | Christof Monz | Stacy President | Richard Schwartz | David Zajic
Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization

pdf bib
Frame Semantic Enhancement of Lexical-Semantic Resources
Rebecca Green | Bonnie Dorr
Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition

2004

pdf bib abs
Multi-Align: combining linguistic and statistical techniques to improve alignments for adaptable MT
Necip Fazil Ayan | Bonnie Dorr | Nizar Habash
Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers

An adaptable statistical or hybrid MT system relies heavily on the quality of word-level alignments of real-world data. Statistical alignment approaches provide a reasonable initial estimate for word alignment. However, they cannot handle certain types of linguistic phenomena such as long-distance dependencies and structural differences between languages. We address this issue in Multi-Align, a new framework for incremental testing of different alignment algorithms and their combinations. Our design allows users to tune their systems to the properties of a particular genre/domain while still benefiting from general linguistic knowledge associated with a language pair. We demonstrate that a combination of statistical and linguistically-informed alignments can resolve translation divergences during the alignment process.

MT systems that use only superficial representations, including the current generation of statistical MT systems, have been successful and useful. However, they will experience a plateau in quality, much like other “silver bullet” approaches to MT. We pursue work on the development of interlingual representations for use in symbolic or hybrid MT systems. In this paper, we describe the creation of an interlingua and the development of a corpus of semantically annotated text, to be validated in six languages and evaluated in several ways. We have established a distributed, well-functioning research methodology, designed a preliminary interlingua notation, created annotation manuals and tools, developed a test collection in six languages with associated English translations, annotated some 150 translations, and designed and applied various annotation metrics. We describe the data sets being annotated and the interlingual (IL) representation language which uses two ontologies and a systematic theta-role list. We present the annotation tools built and outline the annotation process. Following this, we describe our evaluation methodology and conclude with a summary of issues that have arisen.

pdf bib
Identification of Confusable Drug Names: A New Approach and Evaluation Methodology
Grzegorz Kondrak | Bonnie Dorr
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
A Lexically-Driven Algorithm for Disfluency Detection
Matthew Snover | Bonnie Dorr | Richard Schwartz
Proceedings of HLT-NAACL 2004: Short Papers

pdf bib
Inducing Frame Semantic Verb Classes from WordNet and LDOCE
Rebecca Green | Bonnie J. Dorr | Philip Resnik
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

pdf bib
Inducing a semantic frame lexicon from WordNet data
Rebecca Green | Bonnie Dorr
Proceedings of the 2nd Workshop on Text Meaning and Interpretation

2003

pdf bib abs
Acquisition of bilingual MT lexicons from OCRed dictionaries
Burcu Karagol-Ayan | David Doermann | Bonnie J. Dorr
Proceedings of Machine Translation Summit IX: Papers

This paper describes an approach to analyzing the lexical structure of OCRed bilingual dictionaries to construct resources suited for machine translation of low-density languages, where online resources are limited. A rule-based, an HMM-based, and a post-processed HMM-based method are used for rapid construction of MT lexicons based on systematic structural clues provided in the original dictionary. We evaluate the effectiveness of our techniques, concluding that: (1) the rule-based method performs better with dictionaries where the font is not an important distinguishing feature for determining information types; (2) the post-processed stochastic method improves the results of the stochastic method for phrasal entries; and (3) Our resulting bilingual lexicons are comprehensive enough to provide the basis for reasonable translation results when compared to human translations.

pdf bib abs
CatVar: a database of categorial variations for English
Nizar Habash | Bonnie Dorr
Proceedings of Machine Translation Summit IX: System Presentations

We present a new large-scale database called “CatVar” (Habash and Dorr, 2003) which contains categorial variations of English lexemes. Due to the prevalence of cross-language categorial variation in multilingual applications, our categorial-variation resource may serve as an integral part of a diverse range of natural language applications. Thus, the research reported herein overlaps heavily with that of the machine-translation, lexicon-construction, and information-retrieval communities. We demonstrate this database, embedded in a graphical interface; we also show a GUI for user input of corrections to the database.

pdf bib
Evaluation techniques applied to domain tuning of MT lexicons
Necip Fazil Ayan | Bonnie J. Dorr | Okan Kolak
Workshop on Systemizing MT Evaluation

pdf bib
A Categorial Variation Database for English
Nizar Habash | Bonnie Dorr
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Hedge Trimmer: A Parse-and-Trim Approach to Headline Generation
Bonnie Dorr | David Zajic | Richard Schwartz
Proceedings of the HLT-NAACL 03 Text Summarization Workshop

2002

pdf bib abs
DUSTer: a method for unraveling cross-language divergences for statistical word-level alignment
Bonnie Dorr | Lisa Pearl | Rebecca Hwa | Nizar Habash
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers

The frequent occurrence of divergenceS—structural differences between languages—presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.

pdf bib abs
Handling translation divergences: combining statistical and symbolic techniques in generation-heavy machine translation
Nizar Habash | Bonnie Dorr
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers

This paper describes a novel approach to handling translation divergences in a Generation-Heavy Hybrid Machine Translation (GHMT) system. The translation divergence problem is usually reserved for Transfer and Interlingual MT because it requires a large combination of complex lexical and structural mappings. A major requirement of these approaches is the accessibility of large amounts of explicit symmetric knowledge for both source and target languages. This limitation renders Transfer and Interlingual approaches ineffective in the face of structurally-divergent language pairs with asymmetric resources. GHMT addresses the more common form of this problem, source-poor/targetrich, by fully exploiting symbolic and statistical target-language resources. This non-interlingual non-transfer approach is accomplished by using target-language lexical semantics, categorial variations and subcategorization frames to overgenerate multiple lexico-structural variations from a target-glossed syntactic dependency of the source-language sentence. The symbolic overgeneration, which accounts for different possible translation divergences, is constrained by a statistical target-language model.

2001

pdf bib
Large scale language independent generation using thematic hierarchies
Nizar Habash | Bonnie Dorr
Proceedings of Machine Translation Summit VIII

pdf bib
Mapping Lexical Entries in a Verbs Database to WordNet Senses
Rebecca Green | Lisa Pearl | Bonnie J. Dorr | Philip Resnik
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

pdf bib abs
Building a Chinese-English mapping between verb concepts for multilingual applications
Bonnie J. Dorr | Gina-Anne Levow | Dekang Lin
Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: Technical Papers

This paper addresses the problem of building conceptual resources for multilingual applications. We describe new techniques for large-scale construction of a Chinese-English lexicon for verbs, using thematic-role information to create links between Chinese and English conceptual information. We then present an approach to compensating for gaps in the existing resources. The resulting lexicon is used for multilingual applications such as machine translation and cross-language information retrieval.

pdf bib
Chinese-English Semantic Resource Construction
Bonnie J. Dorr | Gina-Anne Levow | Dekang Lin | Scott Thomas
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1998

pdf bib abs
Enhancing automatic acquisition of the thematic structure in a large-scale lexicon for Mandarin Chinese
Mari Broman Olsen | Bonnie J. Dorr | Scott C. Thomas
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers

This paper describes a refinement to our procedure for porting lexical conceptual structure (LCS) into new languages. Specifically we describe a two-step process for creating candidate thematic grids for Mandarin Chinese verbs, using the English verb heading the VP in the subde_nitions to separate senses, and roughly parsing the verb complement structure to match thematic structure templates. We accomplished a substantial reduction in manual effort, without substantive loss. The procedure is part of a larger process of creating a usable lexicon for interlingual machine translation from a large on-line resource with both too much and too little information.

pdf bib abs
A thematic hierarchy for efficient generation from lexical-conceptual structure
Bonnie Dorr | Nizar Habash | David Traum
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers

This paper describes an implemented algorithm for syntactic realization of a target-language sentence from an interlingual representation called Lexical Conceptual Structure (LCS). We provide a mapping between LCS thematic roles and Abstract Meaning Representation (AMR) relations; these relations serve as input to an off-the-shelf generator (Nitrogen). There are two contributions of this work: (1) the development of a thematic hierarchy that provides ordering information for realization of arguments in their surface positions; (2) the provision of a diagnostic tool for detecting inconsistencies in an existing online LCS-based lexicon that allows us to enhance principles for thematic-role assignment.

pdf bib abs
Lexical selection for cross-language applications: combining LCS with WordNet
Bonnie Dorr | Maria Katsova
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers

This paper describes experiments for testing the power of large-scale resources for lexical selection in machine translation (MT) and cross-language information retrieval (CLIR). We adopt the view that verbs with similar argument structure share certain meaning components, but that those meaning components are more relevant to argument realization than to idiosyncratic verb meaning. We verify this by demonstrating that verbs with similar argument structure as encoded in Lexical Conceptual Structure (LCS) are rarely synonymous in WordNet. We then use the results of this work to guide our implementation of an algorithm for cross-language selection of lexical items, exploiting the strengths of each resource: LCS for semantic structure and WordNet for semantic content. We use the Parka Knowledge-Based System to encode LCS representations and WordNet synonym sets and we implement our lexical-selection algorithm as Parka-based queries into a knowledge base containing both information types.

1997

pdf bib abs
Spanish EuroWordNet and LCS-based interlingual MT
Bonnie J. Dorr | M. Antonia Martí | Irene Castellón
AMTA/SIG-IL First Workshop on Interlinguas

We present a machine translation framework in which the interlingua— Lexical Conceptual Structure (LCS)—is coupled with a definitional component that includes bilingual (EuroWordNet) links between words in the source and target languages. While the links between individual words are language-specific, the LCS is designed to be a language-independent, compositional representation. We take the view that the two types of information—shallower, transfer-like knowledge as well as deeper, compositional knowledge—can be reconciled in interlingual machine translation, the former for overcoming the intractability of LCS-based lexical selec- tion, and the latter for relating the underlying semantics of two words cross-linguistically. We describe the acquisition process for these two information types and present results of hand-verification of the acquired lexicon. Finally, we demonstrate the utility of the two information types in interlingual MT.

pdf bib abs
Toward compact monotonically compositional interlingua using lexical aspect
Bonnie J. Dorr | Mari Broman Olsen | Scott C. Thomas
AMTA/SIG-IL First Workshop on Interlinguas

We describe a theoretical investigation into the semantic space described by our interlingua (IL), which currently has 191 main verb classes divided into 434 subclasses, represented by 237 distinct Lexical Conceptual Structures (LCSs). Using the model of aspect in Olsen (1994; 1997)—monotonic aspectual composition—we have identified 71 aspectually basic subclasses that are associated with one or more of 68 aspectually non-basic classes via some lexical (“type-shifting”) rule (Bresnan, 1982; Pinker, 1984; Levin and Rappaport Hovav, 1995). This allows us to refine the IL and address certain computational and theoretical issues at the same time. (1) From a linguistic viewpoint, the expected benefits include a refinement of the aspectual model in (Olsen, 1994; Olsen, 1997) (which provides necessary but not sufficient conditions for aspectual com- position), and a refinement of the verb classifications in (Levin, 1993); we also expect our approach to eventually produce a systematic definition (in terms of LCSs and compositional operations) of the precise meaning components responsible for Levin's classification. (2) Computationally, the lexicon is made more compact.

pdf bib abs
Using WordNet to posit hierarchical structure in Levin’s verb classes
Mari Broman Olsen | Bonnie J. Dorr | David J. Clark
AMTA/SIG-IL First Workshop on Interlinguas

In this paper we report on experiments using WordNet synset tags to evaluate the semantic properties of the verb classes cataloged by Levin (1993). This paper represents ongoing research begun at the University of Pennsylvania (Rosenzweig and Dang, 1997; Palmer, Rosenzweig, and Dang, 1997) and the University of Maryland (Dorr and Jones, 1996b; Dorr and Jones, 1996a; Dorr and Jones, 1996c). Using WordNet sense tags to constrain the intersection of Levin classes, we avoid spurious class intersections introduced by homonymy and polysemy (run a bath, run a mile). By adding class intersections based on a single shared sense-tagged word, we minimize the impact of the non-exhaustiveness of Levin’s database (Dorr and Olsen, 1996; Dorr, To appear). By examining the syntactic properties of the intersective classes, we provide a clearer picture of the relationship between WordNet/EuroWordNet and the LCS interlingua for machine translation and other NLP applications.

pdf bib
Large-Scale Acquisition of LCS-Based Lexicons for Foreign Language Tutoring
Bonnie J. Dorr
Fifth Conference on Applied Natural Language Processing

pdf bib
Deriving Verbal and Compositonal Lexical Aspect for NLP Applications
Bonnie J. Dorr | Mari Broman Olsen
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Structured Lexicons and Semantic Tagging
Bonnie J. Dorr | Mari Broman Olsen
Tagging Text with Lexical Semantics: Why, What, and How?