Richard Schwartz

Also published as: Rich Schwartz, R. Schwartz

2020

pdf bib
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)
Kathy McKeown | Douglas W. Oard | Elizabeth | Richard Schwartz
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)

pdf bib abs
Reformulating Information Retrieval from Speech and Text as a Detection Problem
Damianos Karakos | Rabih Zbib | William Hartmann | Richard Schwartz | John Makhoul
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)

In the IARPA MATERIAL program, information retrieval (IR) is treated as a hard detection problem; the system has to output a single global ranking over all queries, and apply a hard threshold on this global list to come up with all the hypothesized relevant documents. This means that how queries are ranked relative to each other can have a dramatic impact on performance. In this paper, we study such a performance measure, the Average Query Weighted Value (AQWV), which is a combination of miss and false alarm rates. AQWV requires that the same detection threshold is applied to all queries. Hence, detection scores of different queries should be comparable, and, to do that, a score normalization technique (commonly used in keyword spotting from speech) should be used. We describe unsupervised methods for score normalization, which are borrowed from the speech field and adapted accordingly for IR, and demonstrate that they greatly improve AQWV on the task of cross-language information retrieval (CLIR), on three low-resource languages used in MATERIAL. We also present a novel supervised score normalization approach which gives additional gains.

In this paper, we describe a cross-lingual information retrieval (CLIR) system that, given a query in English, and a set of audio and text documents in a foreign language, can return a scored list of relevant documents, and present findings in a summary form in English. Foreign audio documents are first transcribed by a state-of-the-art pretrained multilingual speech recognition model that is finetuned to the target language. For text documents, we use multiple multilingual neural machine translation (MT) models to achieve good translation results, especially for low/medium resource languages. The processed documents and queries are then scored using a probabilistic CLIR model that makes use of the probability of translation from GIZA translation tables and scores from a Neural Network Lexical Translation Model (NNLTM). Additionally, advanced score normalization, combination, and thresholding schemes are employed to maximize the Average Query Weighted Value (AQWV) scores. The CLIR output, together with multiple translation renderings, are selected and translated into English snippets via a summarization model. Our turnkey system is language agnostic and can be quickly trained for a new low-resource language in few days.

pdf bib abs
What Set of Documents to Present to an Analyst?
Richard Schwartz | John Makhoul | Lee Tarlin | Damianos Karakos
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)

We describe the human triage scenario envisioned in the Cross-Lingual Information Retrieval (CLIR) problem of the [REDUCT] Program. The overall goal is to maximize the quality of the set of documents that is given to a bilingual analyst, as measured by the AQWV score. The initial set of source documents that are retrieved by the CLIR system is summarized in English and presented to human judges who attempt to remove the irrelevant documents (false alarms); the resulting documents are then presented to the analyst. First, we describe the AQWV performance measure and show that, in our experience, if the acceptance threshold of the CLIR component has been optimized to maximize AQWV, the loss in AQWV due to false alarms is relatively constant across many conditions, which also limits the possible gain that can be achieved by any post filter (such as human judgments) that removes false alarms. Second, we analyze the likely benefits for the triage operation as a function of the initial CLIR AQWV score and the ability of the human judges to remove false alarms without removing relevant documents. Third, we demonstrate that we can increase the benefit for human judgments by combining the human judgment scores with the original document scores returned by the automatic CLIR system.

2012

2011

pdf bib
Improving Low-Resource Statistical Machine Translation with a Novel Semantic Word Clustering Algorithm
Jeff Ma | Spyros Matsoukas | Richard Schwartz
Proceedings of Machine Translation Summit XIII: Papers

pdf bib
Expected BLEU Training for Graphs: BBN System Description for WMT11 System Combination Task
Antti-Veikko Rosti | Bing Zhang | Spyros Matsoukas | Richard Schwartz
Proceedings of the Sixth Workshop on Statistical Machine Translation

2010

pdf bib
BBN System Description for WMT10 System Combination Task
Antti-Veikko Rosti | Bing Zhang | Spyros Matsoukas | Richard Schwartz
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
Decision Trees for Lexical Smoothing in Statistical Machine Translation
Rabih Zbib | Spyros Matsoukas | Richard Schwartz | John Makhoul
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

2009

pdf bib
Incremental Hypothesis Alignment with Flexible Matching for Building Confusion Networks: BBN System Description for WMT09 System Combination Task
Antti-Veikko Rosti | Bing Zhang | Spyros Matsoukas | Richard Schwartz
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
Fluency, Adequacy, or HTER? Exploring Different Human Judgments with a Tunable MT Metric
Matthew Snover | Nitin Madnani | Bonnie Dorr | Richard Schwartz
Proceedings of the Fourth Workshop on Statistical Machine Translation

2008

pdf bib abs
Are Multiple Reference Translations Necessary? Investigating the Value of Paraphrased Reference Translations in Parameter Optimization
Nitin Madnani | Philip Resnik | Bonnie J. Dorr | Richard Schwartz
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers

Most state-of-the-art statistical machine translation systems use log-linear models, which are defined in terms of hypothesis features and weights for those features. It is standard to tune the feature weights in order to maximize a translation quality metric, using held-out test sentences and their corresponding reference translations. However, obtaining reference translations is expensive. In our earlier work (Madnani et al., 2007), we introduced a new full-sentence paraphrase technique, based on English-to-English decoding with an MT system, and demonstrated that the resulting paraphrases can be used to cut the number of human reference translations needed in half. In this paper, we take the idea a step further, asking how far it is possible to get with just a single good reference translation for each item in the development set. Our analysis suggests that it is necessary to invest in four or more human translations in order to significantly improve on a single translation augmented by monolingual paraphrases.

pdf bib
Language and Translation Model Adaptation using Comparable Corpora
Matthew Snover | Bonnie Dorr | Richard Schwartz
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
Incremental Hypothesis Alignment for Building Confusion Networks with Application to Machine Translation System Combination
Antti-Veikko Rosti | Bing Zhang | Spyros Matsoukas | Richard Schwartz
Proceedings of the Third Workshop on Statistical Machine Translation

2007

pdf bib
Combining Outputs from Multiple Machine Translation Systems
Antti-Veikko Rosti | Necip Fazil Ayan | Bing Xiang | Spyros Matsoukas | Richard Schwartz | Bonnie Dorr
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

pdf bib
Improved Word-Level System Combination for Machine Translation
Antti-Veikko Rosti | Spyros Matsoukas | Richard Schwartz
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

pdf bib abs
A Study of Translation Edit Rate with Targeted Human Annotation
Matthew Snover | Bonnie Dorr | Rich Schwartz | Linnea Micciulla | John Makhoul
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

We examine a new, intuitive measure for evaluating machine-translation output that avoids the knowledge intensiveness of more meaning-based approaches, and the labor-intensiveness of human judgments. Translation Edit Rate (TER) measures the amount of editing that a human would have to perform to change a system output so it exactly matches a reference translation. We show that the single-reference variant of TER correlates as well with human judgments of MT quality as the four-reference variant of BLEU. We also define a human-targeted TER (or HTER) and show that it yields higher correlations with human judgments than BLEU—even when BLEU is given human-targeted references. Our results indicate that HTER correlates with human judgments better than HMETEOR and that the four-reference variants of TER and HTER correlate with human judgments as well as—or better than—a second human judgment does.