Douglas W. Oard

Also published as: Doug Oard, Douglas Oard


2022

Constrained Regeneration for Cross-Lingual Query-Focused Extractive Summarization
Elsbeth Turcan | David Wan | Faisal Ladhak | Petra Galuscakova | Sukanta Sen | Svetlana Tchistiakova | Weijia Xu | Marine Carpuat | Kenneth Heafield | Douglas Oard | Kathleen McKeown
Proceedings of the 29th International Conference on Computational Linguistics

Query-focused summaries of foreign-language, retrieved documents can help a user understand whether a document is actually relevant to the query term. A standard approach to this problem is to first translate the source documents and then perform extractive summarization to find relevant snippets. However, in a cross-lingual setting, the query term does not necessarily appear in the translations of relevant documents. In this work, we show that constrained machine translation and constrained post-editing can improve human relevance judgments by including a query term in a summary when its translation appears in the source document. We also present several strategies for selecting only certain documents for regeneration which yield further improvements.

2021

Syntopical Graphs for Computational Argumentation Tasks
Joe Barrow | Rajiv Jain | Nedim Lipka | Franck Dernoncourt | Vlad Morariu | Varun Manjunatha | Douglas Oard | Philip Resnik | Henning Wachsmuth
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Approaches to computational argumentation tasks such as stance detection and aspect detection have largely focused on the text of independent claims, losing out on potentially valuable context provided by the rest of the collection. We introduce a general approach to these tasks motivated by syntopical reading, a reading process that emphasizes comparing and contrasting viewpoints in order to improve topic understanding. To capture collection-level context, we introduce the syntopical graph, a data structure for linking claims within a collection. A syntopical graph is a typed multi-graph where nodes represent claims and edges represent different possible pairwise relationships, such as entailment, paraphrase, or support. Experiments applying syntopical graphs to the problems of detecting stance and aspects demonstrate state-of-the-art performance in each domain, significantly outperforming approaches that do not utilize collection-level information.

Cross-language Sentence Selection via Data Augmentation and Rationale Training
Yanda Chen | Chris Kedzie | Suraj Nair | Petra Galuscakova | Rui Zhang | Douglas Oard | Kathleen McKeown
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper proposes an approach to cross-language sentence selection in a low-resource setting. It uses data augmentation and negative sampling techniques on noisy parallel sentence data to directly learn a cross-lingual embedding-based query relevance model. Results show that this approach performs as well as or better than multiple state-of-the-art machine translation + monolingual retrieval systems trained on the same parallel data. Moreover, when a rationale training secondary objective is applied to encourage the model to match word alignment hints from a phrase-based statistical machine translation model, consistent improvements are seen across three language pairs (English-Somali, English-Swahili and English-Tagalog) over a variety of state-of-the-art baselines.

2020

On the Evaluation of Machine Translation n-best Lists
Jacob Bremerman | Huda Khayrallah | Douglas Oard | Matt Post
Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems

The standard machine translation evaluation framework measures the single-best output of machine translation systems. There are, however, many situations where n-best lists are needed, yet there is no established way of evaluating them. This paper establishes a framework for addressing n-best evaluation by outlining three different questions one could consider when determining how one would define a ‘good’ n-best list and proposing evaluation measures for each question. The first and principal contribution is an evaluation measure that characterizes the translation quality of an entire n-best list by asking whether many of the valid translations are placed near the top of the list. The second is a measure that uses gold translations with preference annotations to ask to what degree systems can produce ranked lists in preference order. The third is a measure that rewards partial matches, evaluating the closeness of the many items in an n-best list to a set of many valid references. These three perspectives make clear that having access to many references can be useful when n-best evaluation is the goal.

Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)
Kathy McKeown | Douglas W. Oard | Elizabeth | Richard Schwartz
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)

MATERIALizing Cross-Language Information Retrieval: A Snapshot
Petra Galuscakova | Douglas Oard | Joe Barrow | Suraj Nair | Han-Chin Shing | Elena Zotkina | Ramy Eskander | Rui Zhang
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)

At about the midpoint of the IARPA MATERIAL program in October 2019, an evaluation was conducted on systems’ abilities to find Lithuanian documents based on English queries. Subsequently, both the Lithuanian test collection and results from all three teams were made available for detailed analysis. This paper capitalizes on that opportunity to begin to look at what’s working well at this stage of the program, and to identify some promising directions for future work.

A Joint Model for Document Segmentation and Segment Labeling
Joe Barrow | Rajiv Jain | Vlad Morariu | Varun Manjunatha | Douglas Oard | Philip Resnik
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Text segmentation aims to uncover latent structure by dividing text from a document into coherent sections. Where previous work on text segmentation considers the tasks of document segmentation and segment labeling separately, we show that the tasks contain complementary information and are best addressed jointly. We introduce Segment Pooling LSTM (S-LSTM), which is capable of jointly segmenting a document and labeling segments. In support of joint training, we develop a method for teaching the model to recover from errors by aligning the predicted and ground truth segments. We show that S-LSTM reduces segmentation error by 30% on average, while also improving segment labeling.

A Prioritization Model for Suicidality Risk Assessment
Han-Chin Shing | Philip Resnik | Douglas Oard
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We reframe suicide risk assessment from social media as a ranking problem whose goal is maximizing detection of severely at-risk individuals given the time available. Building on measures developed for resource-bounded document retrieval, we introduce a well-founded evaluation paradigm, and demonstrate using an expert-annotated test collection that meaningful improvements over plausible cascade model baselines can be achieved using an approach that jointly ranks individuals and their social media posts.

2018

An Initial Test Collection for Ranked Retrieval of SMS Conversations
Rashmi Sankepally | Douglas W. Oard
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Knowledge Base Population for Organization Mentions in Email
Ning Gao | Mark Dredze | Douglas Oard
Proceedings of the 5th Workshop on Automated Knowledge Base Construction

2015

Using Zero-Resource Spoken Term Discovery for Ranked Retrieval
Jerome White | Douglas Oard | Aren Jansen | Jiaul Paik | Rashmi Sankepally
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2013

Simulating Early-Termination Search for Verbose Spoken Queries
Jerome White | Douglas W. Oard | Nitendra Rajput | Marion Zalk
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

KELVIN: a tool for automated knowledge base construction
Paul McNamee | James Mayfield | Tim Finin | Tim Oates | Dawn Lawrie | Tan Xu | Douglas Oard
Proceedings of the 2013 NAACL HLT Demonstration Session

2012

Encouraging Consistent Translation Choices
Ferhan Ture | Douglas W. Oard | Philip Resnik
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Combining Statistical Translation Techniques for Cross-Language Information Retrieval
Ferhan Ture | Jimmy Lin | Douglas Oard
Proceedings of COLING 2012

Leveraging Statistical Transliteration for Dictionary-Based English-Bengali CLIR of OCR'd Text
Utpal Garain | Arjun Das | David Doermann | Douglas Oard
Proceedings of COLING 2012: Posters

A Context-Aware Approach to Entity Linking
Veselin Stoyanov | James Mayfield | Tan Xu | Douglas Oard | Dawn Lawrie | Tim Oates | Tim Finin
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

Creating and Curating a Cross-Language Person-Entity Linking Collection
Dawn Lawrie | James Mayfield | Paul McNamee | Douglas Oard
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

To stimulate research in cross-language entity linking, we present a new test collection for evaluating the accuracy of cross-language entity linking in twenty-one languages. This paper describes an efficient way to create and curate such a collection, judiciously exploiting existing language resources. Queries are created by semi-automatically identifying person names on the English side of a parallel corpus, using judgments obtained through crowdsourcing to identify the entity corresponding to the name, and projecting the English name onto the non-English document using word alignments. Name projections are then curated, again through crowdsourcing. This technique resulted in the first publicly available multilingual cross-language entity linking collection. The collection includes approximately 55,000 queries, comprising between 875 and 4,329 queries for each of twenty-one non-English languages.

2011

Cross-Language Entity Linking
Paul McNamee | James Mayfield | Dawn Lawrie | Douglas Oard | David Doermann
Proceedings of 5th International Joint Conference on Natural Language Processing

2009

Cross-Language Information Access: Looking Backward, Looking Forward
Douglas W. Oard
Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies (CLIAWS3)

Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Mari Ostendorf | Michael Collins | Shri Narayanan | Douglas W. Oard | Lucy Vanderwende
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Phrase-Based Query Degradation Modeling for Vocabulary-Independent Ranked Utterance Retrieval
J. Scott Olsson | Douglas W. Oard
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Context-based Message Expansion for Disentanglement of Interleaved Text Conversations
Lidan Wang | Douglas W. Oard
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Mari Ostendorf | Michael Collins | Shri Narayanan | Douglas W. Oard | Lucy Vanderwende
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

Arabic Cross-Document Coreference Resolution
Asad Sayeed | Tamer Elsayed | Nikesh Garera | David Alexander | Tan Xu | Doug Oard | David Yarowsky | Christine Piatko
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

Combining Speech Retrieval Results with Generalized Additive Models
J. Scott Olsson | Douglas W. Oard
Proceedings of ACL-08: HLT

Resolving Personal Names in Email Using Context Expansion
Tamer Elsayed | Douglas W. Oard | Galileo Namata
Proceedings of ACL-08: HLT

Pairwise Document Similarity in Large Collections with MapReduce
Tamer Elsayed | Jimmy Lin | Douglas Oard
Proceedings of ACL-08: HLT, Short Papers

2007

Invited Talk: Lessons from the MALACH Project: Applying New Technologies to Improve Intellectual Access to Large Oral History Collections
Douglas W. Oard
Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007)

2006

Investigating Cross-Language Speech Retrieval for a Spontaneous Conversational Speech Collection
Diana Inkpen | Muath Alzghool | Gareth Jones | Douglas Oard
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts
Chris Manning | Doug Oard | Jim Glass
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts

2003

Rapid-response machine translation for unexpected languages
Douglas W. Oard | Franz Josef Och
Proceedings of Machine Translation Summit IX: Papers

Statistical techniques for machine translation offer promise for rapid development in response to unexpected requirements, but realizing that potential requires rapid acquisition of required resources as well. This paper reports the results of experiments with resources collected in ten days: about 1.3 million words of parallel text from five types of sources and a bilingual term list with about 20,000 term pairs. Systems were trained with resources individually and in combination, using an approach based on alignment templates. The use of all available resources was found to yield the best results in an automatic evaluation using the BLEU measure, but a single resource (the Bible) coupled with a small amount of in-domain manual translation (less than 6,000 words) achieved more than 85% of that upper baseline. With a concerted effort, such a system could be built in a single day.

Multilingual Access to Large Spoken Archives (Invited talk)
Doug Oard
10th Conference of the European Chapter of the Association for Computational Linguistics

Desparately Seeking Cebuano
Douglas W. Oard | David Doermann | Bonnie Dorr | Daqing He | Philip Resnik | Amy Weinberg | William Byrne | Sanjeev Khudanpur | David Yarowsky | Anton Leuski | Philipp Koehn | Kevin Knight
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

Information Retrieval Systems as Integration Platforms for Language Technologies
Douglas W. Oard
Companion Volume of the Proceedings of HLT-NAACL 2003 - Tutorial Abstracts

2001

Improved Cross-Language Retrieval using Backoff Translation
Philip Resnik | Douglas Oard | Gina Levow
Proceedings of the First International Conference on Human Language Technology Research

Mandarin-English Information: Investigating Translingual Speech Retrieval
Helen Meng | Berlin Chen | Sanjeev Khudanpur | Gina-Anne Levow | Wai-Kit Lo | Douglas Oard | Patrick Shone | Karen Tang | Hsin-Min Wang | Jianqiang Wang
Proceedings of the First International Conference on Human Language Technology Research

Rapidly Retargetable Interactive Translingual Retrieval
Gina-Anne Levow | Douglas W. Oard | Philip Resnik
Proceedings of the First International Conference on Human Language Technology Research

2000

Mandarin-English Information (MEI): Investigating Translingual Speech Retrieval
Helen Meng | Sanjeev Khudanpur | Gina Levow | Douglas W. Oard | Hsin-Min Wang
ANLP-NAACL 2000 Workshop: Embedded Machine Translation Systems

1998

A comparative study of query and document translation for cross-language information retrieval
Douglas W. Oard
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers

Cross-language retrieval systems use queries in one natural language to guide retrieval of documents that might be written in another. Acquisition and representation of translation knowledge plays a central role in this process. This paper explores the utility of two sources of translation knowledge for cross-language retrieval. We have implemented six query translation techniques that use bilingual term lists and one based on direct use of the translation output from an existing machine translation system; these are compared with a document translation technique that uses output from the same machine translation system. Average precision measures on a TREC collection suggest that arbitrarily selecting a single dictionary translation is typically no less effective than using every translation in the dictionary, that query translation using a machine translation system can achieve somewhat better effectiveness than simpler techniques, and that document translation may result in further improvements in retrieval effectiveness under some conditions.