Rebecca Knowles

2025

MSLC25: Metric Performance on Low-Quality Machine Translation, Empty Strings, and Language Variants
Rebecca Knowles | Samuel Larkin | Chi-Kiu Lo
Proceedings of the Tenth Conference on Machine Translation

In this challenge set, we examine how automatic metrics for machine translation perform on a wide variety of machine translation output, covering a wider range of quality than the WMT submissions. We also explore metric results on specific types of corner cases, such as empty strings, wrong- or mixed-language text, and more. We primarily focus on Japanese–Chinese data, with some work on English and Czech.

pdf bib abs

NRC Systems for the WMT2025-LRSL Shared Task
Samuel Larkin | Chi-Kiu Lo | Rebecca Knowles
Proceedings of the Tenth Conference on Machine Translation

We describe the NRC team systems for the WMT25 Shared Tasks on Large Language Models (LLMs) with Limited Resources for Slavic Languages. We participate in the Lower Sorbian and Upper Sorbian Machine Translation and Question Answering tasks. On the machine translation tasks, our primary focus, our systems rank first according to the automatic MT evaluation metric (chrF). Our systems underperform on the QA tasks.

2024

pdf bib abs

Evaluation Briefs: Drawing on Translation Studies for Human Evaluation of MT
Ting Liu | Chi-kiu Lo | Elizabeth Marshman | Rebecca Knowles
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

In this position paper, we examine ways in which researchers in machine translation and translation studies have approached the problem of evaluating the output of machine translation systems and, more broadly, the questions of what it means to define translation quality. We explore their similarities and differences, highlighting the role that the purpose and context of translation plays in translation studies approaches. We argue that evaluation of machine translation (e.g., in shared tasks) would benefit from additional insights from translation studies, and we suggest the introduction of an ‘evaluation brief” (analogous to the ‘translation brief’) which could help set out useful context for annotators tasked with evaluating machine translation.

pdf bib

Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)
Rebecca Knowles | Akiko Eriguchi | Shivali Goel
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

pdf bib abs

Some Tradeoffs in Continual Learning for Parliamentary Neural Machine Translation Systems
Rebecca Knowles | Samuel Larkin | Michel Simard | Marc A Tessier | Gabriel Bernier-Colborne | Cyril Goutte | Chi-kiu Lo
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

In long-term translation projects, like Parliamentary text, there is a desire to build machine translation systems that can adapt to changes over time. We implement and examine a simple approach to continual learning for neural machine translation, exploring tradeoffs between consistency, the model’s ability to learn from incoming data, and the time a client would need to wait to obtain a newly trained translation system.

pdf bib abs

MSLC24: Further Challenges for Metrics on a Wide Landscape of Translation Quality
Rebecca Knowles | Samuel Larkin | Chi-Kiu Lo
Proceedings of the Ninth Conference on Machine Translation

In this second edition of the Metric Score Landscape Challenge (MSLC), we examine how automatic metrics for machine translation perform on a wide variety of machine translation output, ranging from very low quality systems to the types of high-quality systems submitted to the General MT shared task at WMT. We also explore metric results on specific types of data, such as empty strings, wrong- or mixed-language text, and more. We raise several alarms about inconsistencies in metric scores, some of which can be resolved by increasingly explicit instructions for metric use, while others highlight technical flaws.

pdf bib abs

MSLC24 Submissions to the General Machine Translation Task
Samuel Larkin | Chi-Kiu Lo | Rebecca Knowles
Proceedings of the Ninth Conference on Machine Translation

The MSLC (Metric Score Landscape Challenge) submissions for English-German, English-Spanish, and Japanese-Chinese are constrained systems built using Transformer models for the purpose of better evaluating metric performance in the WMT24 Metrics Task. They are intended to be representative of the performance of systems that can be built relatively simply using constrained data and with minimal modifications to the translation training pipeline.

2023

pdf bib abs

Metric Score Landscape Challenge (MSLC23): Understanding Metrics’ Performance on a Wider Landscape of Translation Quality
Chi-kiu Lo | Samuel Larkin | Rebecca Knowles
Proceedings of the Eighth Conference on Machine Translation

The Metric Score Landscape Challenge (MSLC23) dataset aims to gain insight into metric scores on a broader/wider landscape of machine translation (MT) quality. It provides a collection of low- to medium-quality MT output on the WMT23 general task test set. Together with the high quality systems submitted to the general task, this will enable better interpretation of metric scores across a range of different levels of translation quality. With this wider range of MT quality, we also visualize and analyze metric characteristics beyond just correlation.

pdf bib abs

Terminology in Neural Machine Translation: A Case Study of the Canadian Hansard
Rebecca Knowles | Samuel Larkin | Marc Tessier | Michel Simard
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

Incorporating terminology into a neural machine translation (NMT) system is a feature of interest for many users of machine translation. In this case study of English-French Canadian Parliamentary text, we examine the performance of standard NMT systems at handling terminology and consider the tradeoffs between potential performance improvements and the efforts required to maintain terminological resources specifically for NMT.

pdf bib abs

We develop an interactive web-based user interface for performing textspeech alignment and creating digital interactive “read-along audio books that highlight words as they are spoken and allow users to replay individual words when clicked. We build on an existing Python library for zero-shot multilingual textspeech alignment (Littell et al., 2022), extend it by exposing its functionality through a RESTful API, and rewrite the underlying speech recognition engine to run in the browser. The ReadAlong Studio Web App is open-source, user-friendly, prioritizes privacy and data sovereignty, allows for a variety of standard export formats, and is designed to work for the majority of the world’s languages.

pdf bib abs

Long to reign over us: A Case Study of Machine Translation and a New Monarch
Rebecca Knowles | Samuel Larkin
Findings of the Association for Computational Linguistics: ACL 2023

Novel terminology and changes in terminology are often a challenge for machine translation systems. The passing of Queen Elizabeth II and the accession of King Charles III provide a striking example of translation shift in the real world, particularly in translation contexts that have ambiguity. Examining translation between French and English, we present a focused case-study of translations about King Charles III as produced both by publicly-available MT systems and by a neural machine translation system trained specifically on Canadian parliamentary text. We find that even in cases where human translators would have adequate context to disambiguate terms from the source language, machine translation systems do not always produce the expected output. Where we are able to analyze the training data, we note that this may represent artifacts in the data, raising important questions about machine translation updates in light of real world events.

pdf bib abs

Beyond Correlation: Making Sense of the Score Differences of New MT Evaluation Metrics
Chi-kiu Lo | Rebecca Knowles | Cyril Goutte
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track

While many new automatic metrics for machine translation evaluation have been proposed in recent years, BLEU scores are still used as the primary metric in the vast majority of MT research papers. There are many reasons that researchers may be reluctant to switch to new metrics, from external pressures (reviewers, prior work) to the ease of use of metric toolkits. Another reason is a lack of intuition about the meaning of novel metric scores. In this work, we examine “rules of thumb” about metric score differences and how they do (and do not) correspond to human judgments of statistically significant differences between systems. In particular, we show that common rules of thumb about BLEU score differences do not in fact guarantee that human annotators will find significant differences between systems. We also show ways in which these rules of thumb fail to generalize across translation directions or domains.

pdf bib abs

Data Sampling and (In)stability in Machine Translation Evaluation
Chi-kiu Lo | Rebecca Knowles
Findings of the Association for Computational Linguistics: ACL 2023

We analyze the different data sampling approaches used in selecting data for human evaluation and ranking of machine translation systems at the highly influential Conference on Machine Translation (WMT). By using automatic evaluation metrics, we are able to focus on the impact of the data sampling procedure as separate from questions about human annotator consistency. We provide evidence that the latest data sampling approach used at WMT skews the annotated data toward shorter documents, not necessarily representative of the full test set. Lastly, we examine a new data sampling method that uses the available labour budget to sample data in a more representative manner, with the goals of improving representation of various document lengths in the sample and producing more stable rankings of system translation quality.

2022

pdf bib abs

Test Set Sampling Affects System Rankings: Expanded Human Evaluation of WMT20 English-Inuktitut Systems
Rebecca Knowles | Chi-kiu Lo
Proceedings of the Seventh Conference on Machine Translation (WMT)

We present a collection of expanded human annotations of the WMT20 English-Inuktitut machine translation shared task, covering the Nunavut Hansard portion of the dataset. Additionally, we recompute News rankings to take into account the completed set of human annotations and certain irregularities in the annotation task construction. We show the effect of these changes on the downstream task of the evaluation of automatic metrics. Finally, we demonstrate that character-level metrics correlate well with human judgments for the task of automatically evaluating translation into this polysynthetic language.

This paper presents the results of the General Machine Translation Task organised as part of the Conference on Machine Translation (WMT) 2022. In the general MT task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting of four different domains. We evaluate system outputs with human annotators using two different techniques: reference-based direct assessment and (DA) and a combination of DA and scalar quality metric (DA+SQM).

pdf bib abs

Translation Memories as Baselines for Low-Resource Machine Translation
Rebecca Knowles | Patrick Littell
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Low-resource machine translation research often requires building baselines to benchmark estimates of progress in translation quality. Neural and statistical phrase-based systems are often used with out-of-the-box settings to build these initial baselines before analyzing more sophisticated approaches, implicitly comparing the first machine translation system to the absence of any translation assistance. We argue that this approach overlooks a basic resource: if you have parallel text, you have a translation memory. In this work, we show that using available text as a translation memory baseline against which to compare machine translation systems is simple, effective, and can shed light on additional translation challenges.

2021

pdf bib abs

NRC-CNRC Systems for Upper Sorbian-German and Lower Sorbian-German Machine Translation 2021
Rebecca Knowles | Samuel Larkin
Proceedings of the Sixth Conference on Machine Translation

We describe our neural machine translation systems for the 2021 shared task on Unsupervised and Very Low Resource Supervised MT, translating between Upper Sorbian and German (low-resource) and between Lower Sorbian and German (unsupervised). The systems incorporated data filtering, backtranslation, BPE-dropout, ensembling, and transfer learning from high(er)-resource languages. As measured by automatic metrics, our systems showed strong performance, consistently placing first or tied for first across most metrics and translation directions.

pdf bib abs

NRC-CNRC Machine Translation Systems for the 2021 AmericasNLP Shared Task
Rebecca Knowles | Darlene Stewart | Samuel Larkin | Patrick Littell
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas

We describe the NRC-CNRC systems submitted to the AmericasNLP shared task on machine translation. We submitted systems translating from Spanish into Wixárika, Nahuatl, Rarámuri, and Guaraní. Our best neural machine translation systems used multilingual pretraining, ensembling, finetuning, training on parts of the development data, and subword regularization. We also submitted translation memory systems as a strong baseline.

pdf bib abs

On the Stability of System Rankings at WMT
Rebecca Knowles
Proceedings of the Sixth Conference on Machine Translation

The current approach to collecting human judgments of machine translation quality for the news translation task at WMT – segment rating with document context – is the most recent in a sequence of changes to WMT human annotation protocol. As these annotation protocols have changed over time, they have drifted away from some of the initial statistical assumptions underpinning them, with consequences that call the validity of WMT news task system rankings into question. In simulations based on real data, we show that the rankings can be influenced by the presence of outliers (high- or low-quality systems), resulting in different system rankings and clusterings. We also examine questions of annotation task composition and how ease or difficulty of translating different documents may influence system rankings. We provide discussion of ways to analyze these issues when considering future changes to annotation protocols.

pdf bib abs

Like Chalk and Cheese? On the Effects of Translationese in MT Training
Samuel Larkin | Michel Simard | Rebecca Knowles
Proceedings of Machine Translation Summit XVIII: Research Track

We revisit the topic of translation direction in the data used for training neural machine translation systems and focusing on a real-world scenario with known translation direction and imbalances in translation direction: the Canadian Hansard. According to automatic metrics and we observe that using parallel data that was produced in the “matching” translation direction (Authentic source and translationese target) improves translation quality. In cases of data imbalance in terms of translation direction and we find that tagging of translation direction can close the performance gap. We perform a human evaluation that differs slightly from the automatic metrics and but nevertheless confirms that for this French-English dataset that is known to contain high-quality translations and authentic or tagged mixed source improves over translationese source for training.

2020

This paper surveys the first, three-year phase of a project at the National Research Council of Canada that is developing software to assist Indigenous communities in Canada in preserving their languages and extending their use. The project aimed to work within the empowerment paradigm, where collaboration with communities and fulfillment of their goals is central. Since many of the technologies we developed were in response to community needs, the project ended up as a collection of diverse subprojects, including the creation of a sophisticated framework for building verb conjugators for highly inflectional polysynthetic languages (such as Kanyen’kéha, in the Iroquoian language family), release of what is probably the largest available corpus of sentences in a polysynthetic language (Inuktut) aligned with English sentences and experiments with machine translation (MT) systems trained on this corpus, free online services based on automatic speech recognition (ASR) for easing the transcription bottleneck for recordings of speech in Indigenous languages (and other languages), software for implementing text prediction and read-along audiobooks for Indigenous languages, and several other subprojects.

pdf bib abs

The Inuktitut language, a member of the Inuit-Yupik-Unangan language family, is spoken across Arctic Canada and noted for its morphological complexity. It is an official language of two territories, Nunavut and the Northwest Territories, and has recognition in additional regions. This paper describes a newly released sentence-aligned Inuktitut–English corpus based on the proceedings of the Legislative Assembly of Nunavut, covering sessions from April 1999 to June 2017. With approximately 1.3 million aligned sentence pairs, this is, to our knowledge, the largest parallel corpus of a polysynthetic language or an Indigenous language of the Americas released to date. The paper describes the alignment methodology used, the evaluation of the alignments, and preliminary experiments on statistical and neural machine translation (SMT and NMT) between Inuktitut and English, in both directions.

pdf bib abs

NRC Systems for Low Resource German-Upper Sorbian Machine Translation 2020: Transfer Learning with Lexical Modifications
Rebecca Knowles | Samuel Larkin | Darlene Stewart | Patrick Littell
Proceedings of the Fifth Conference on Machine Translation

We describe the National Research Council of Canada (NRC) neural machine translation systems for the German-Upper Sorbian supervised track of the 2020 shared task on Unsupervised MT and Very Low Resource Supervised MT. Our models are ensembles of Transformer models, built using combinations of BPE-dropout, lexical modifications, and backtranslation.

pdf bib abs

NRC Systems for the 2020 Inuktitut-English News Translation Task
Rebecca Knowles | Darlene Stewart | Samuel Larkin | Patrick Littell
Proceedings of the Fifth Conference on Machine Translation

We describe the National Research Council of Canada (NRC) submissions for the 2020 Inuktitut-English shared task on news translation at the Fifth Conference on Machine Translation (WMT20). Our submissions consist of ensembled domain-specific finetuned transformer models, trained using the Nunavut Hansard and news data and, in the case of Inuktitut-English, backtranslated news and parliamentary data. In this work we explore challenges related to the relatively small amount of parallel data, morphological complexity, and domain shifts.

Despite recent advances in natural language processing and other language technology, the application of such technology to language documentation and conservation has been limited. In August 2019, a workshop was held at Carnegie Mellon University in Pittsburgh, PA, USA to attempt to bring together language community members, documentary linguists, and technologists to discuss how to bridge this gap and create prototypes of novel and practical language revitalization technologies. The workshop focused on developing technologies to aid language documentation and revitalization in four areas: 1) spoken language (speech transcription, phone to orthography decoding, text-to-speech and text-speech forced alignment), 2) dictionary extraction and management, 3) search tools for corpora, and 4) social media (language learning bots and social media analysis). This paper reports the results of this workshop, including issues discussed, and various conceived and implemented technologies for nine languages: Arapaho, Cayuga, Inuktitut, Irish Gaelic, Kidaw’ida, Kwak’wala, Ojibwe, San Juan Quiahije Chatino, and Seneca.

2019

pdf bib abs

HABLex: Human Annotated Bilingual Lexicons for Experiments in Machine Translation
Brian Thompson | Rebecca Knowles | Xuan Zhang | Huda Khayrallah | Kevin Duh | Philipp Koehn
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Bilingual lexicons are valuable resources used by professional human translators. While these resources can be easily incorporated in statistical machine translation, it is unclear how to best do so in the neural framework. In this work, we present the HABLex dataset, designed to test methods for bilingual lexicon integration into neural machine translation. Our data consists of human generated alignments of words and phrases in machine translation test sets in three language pairs (Russian-English, Chinese-English, and Korean-English), resulting in clean bilingual lexicons which are well matched to the reference. We also present two simple baselines - constrained decoding and continued training - and an improvement to continued training to address overfitting.

2018

pdf bib

A Comparison of Machine Translation Paradigms for Use in Black-Box Fuzzy-Match Repair
Rebecca Knowles | John Ortega | Philipp Koehn
Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing

pdf bib abs

Document-Level Adaptation for Neural Machine Translation
Sachith Sri Ram Kothur | Rebecca Knowles | Philipp Koehn
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

It is common practice to adapt machine translation systems to novel domains, but even a well-adapted system may be able to perform better on a particular document if it were to learn from a translator’s corrections within the document itself. We focus on adaptation within a single document – appropriate for an interactive translation scenario where a model adapts to a human translator’s input over the course of a document. We propose two methods: single-sentence adaptation (which performs online adaptation one sentence at a time) and dictionary adaptation (which specifically addresses the issue of translating novel words). Combining the two models results in improvements over both approaches individually, and over baseline systems, even on short documents. On WMT news test data, we observe an improvement of +1.8 BLEU points and +23.3% novel word translation accuracy and on EMEA data (descriptions of medications) we observe an improvement of +2.7 BLEU points and +49.2% novel word translation accuracy.

pdf bib abs

Context and Copying in Neural Machine Translation
Rebecca Knowles | Philipp Koehn
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Neural machine translation systems with subword vocabularies are capable of translating or copying unknown words. In this work, we show that they learn to copy words based on both the context in which the words appear as well as features of the words themselves. In contexts that are particularly copy-prone, they even copy words that they have already learned they should translate. We examine the influence of context and subword features on this and other types of copying behavior.

pdf bib

Lightweight Word-Level Confidence Estimation for Neural Interactive Translation Prediction
Rebecca Knowles | Philipp Koehn
Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing

2017

pdf bib abs

A Rich Morphological Tagger for English: Exploring the Cross-Linguistic Tradeoff Between Morphology and Syntax
Christo Kirov | John Sylak-Glassman | Rebecca Knowles | Ryan Cotterell | Matt Post
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

A traditional claim in linguistics is that all human languages are equally expressive—able to convey the same wide range of meanings. Morphologically rich languages, such as Czech, rely on overt inflectional and derivational morphology to convey many semantic distinctions. Languages with comparatively limited morphology, such as English, should be able to accomplish the same using a combination of syntactic and contextual cues. We capitalize on this idea by training a tagger for English that uses syntactic features obtained by automatic parsing to recover complex morphological tags projected from Czech. The high accuracy of the resulting model provides quantitative confirmation of the underlying linguistic hypothesis of equal expressivity, and bodes well for future improvements in downstream HLT tasks including machine translation.

pdf bib abs

Six Challenges for Neural Machine Translation
Philipp Koehn | Rebecca Knowles
Proceedings of the First Workshop on Neural Machine Translation

We explore six challenges for neural machine translation: domain mismatch, amount of training data, rare words, long sentences, word alignment, and beam search. We show both deficiencies and improvements over the quality of phrase-based statistical machine translation.

2016

pdf bib

User Modeling in Language Learning with Macaronic Texts
Adithya Renduchintala | Rebecca Knowles | Philipp Koehn | Jason Eisner
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib

Analyzing Learner Understanding of Novel L2 Vocabulary
Rebecca Knowles | Adithya Renduchintala | Philipp Koehn | Jason Eisner
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning

pdf bib

Creating Interactive Macaronic Interfaces for Language Learning
Adithya Renduchintala | Rebecca Knowles | Philipp Koehn | Jason Eisner
Proceedings of ACL-2016 System Demonstrations

pdf bib abs

Neural Interactive Translation Prediction
Rebecca Knowles | Philipp Koehn
Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track

We present an interactive translation prediction method based on neural machine translation. Even with the same translation quality of the underlying machine translation systems, the neural prediction method yields much higher word prediction accuracy (61.6% vs. 43.3%) than the traditional method based on search graphs, mainly due to better recovery from errors. We also develop efficient means to enable practical deployment.

pdf bib

Demographer: Extremely Simple Name Demographics
Rebecca Knowles | Josh Carroll | Mark Dredze
Proceedings of the First Workshop on NLP and Computational Social Science

Rebecca Knowles

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2014

2013

Co-authors

Venues