Maria Nǎdejde

Also published as: Maria Nadejde, Maria Nădejde


2024

M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation
Benjamin Hsu | Xiaoyu Liu | Huayang Li | Yoshinari Fujinuma | Maria Nadejde | Xing Niu | Ron Litman | Yair Kittenplon | Raghavendra Pappagari
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

Document translation poses a challenge for Neural Machine Translation (NMT) systems. Most document-level NMT systems rely on meticulously curated sentence-level parallel data, assuming flawless extraction of text from documents along with their precise reading order. These systems also tend to disregard additional visual cues such as the document layout, deeming it irrelevant. However, real-world documents often possess intricate text layouts that defy these assumptions. Extracting information from Optical Character Recognition (OCR) or heuristic rules can result in errors, and the layout (e.g., paragraphs, headers) may convey relationships between distant sections of text. This complexity is particularly evident in widely used PDF documents, which represent information visually. This paper addresses this gap by introducing M3T, a novel benchmark dataset tailored to evaluate NMT systems on the comprehensive task of translating semi-structured documents. This dataset aims to bridge the evaluation gap in document-level NMT systems, acknowledging the challenges posed by rich text layouts in real-world applications.

2023

RAMP: Retrieval and Attribute-Marking Enhanced Prompting for Attribute-Controlled Translation
Gabriele Sarti | Phu Mon Htut | Xing Niu | Benjamin Hsu | Anna Currey | Georgiana Dinu | Maria Nadejde
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Attribute-controlled translation (ACT) is a subtask of machine translation that involves controlling stylistic or linguistic attributes (like formality and gender) of translation outputs. While ACT has garnered attention in recent years due to its usefulness in real-world applications, progress in the task is currently limited by dataset availability, since most prior approaches rely on supervised methods. To address this limitation, we propose Retrieval and Attribute-Marking enhanced Prompting (RAMP), which leverages large multilingual language models to perform ACT in few-shot and zero-shot settings. RAMP improves generation accuracy over the standard prompting approach by (1) incorporating a semantic similarity retrieval component for selecting similar in-context examples, and (2) marking in-context examples with attribute annotations. Our comprehensive experiments show that RAMP is a viable approach in both zero-shot and few-shot settings.

FINDINGS OF THE IWSLT 2023 EVALUATION CAMPAIGN
Milind Agarwal | Sweta Agrawal | Antonios Anastasopoulos | Luisa Bentivogli | Ondřej Bojar | Claudia Borg | Marine Carpuat | Roldano Cattoni | Mauro Cettolo | Mingda Chen | William Chen | Khalid Choukri | Alexandra Chronopoulou | Anna Currey | Thierry Declerck | Qianqian Dong | Kevin Duh | Yannick Estève | Marcello Federico | Souhir Gahbiche | Barry Haddow | Benjamin Hsu | Phu Mon Htut | Hirofumi Inaguma | Dávid Javorský | John Judge | Yasumasa Kano | Tom Ko | Rishu Kumar | Pengwei Li | Xutai Ma | Prashant Mathur | Evgeny Matusov | Paul McNamee | John P. McCrae | Kenton Murray | Maria Nadejde | Satoshi Nakamura | Matteo Negri | Ha Nguyen | Jan Niehues | Xing Niu | Atul Kr. Ojha | John E. Ortega | Proyag Pal | Juan Pino | Lonneke van der Plas | Peter Polák | Elijah Rippeth | Elizabeth Salesky | Jiatong Shi | Matthias Sperber | Sebastian Stüker | Katsuhito Sudoh | Yun Tang | Brian Thompson | Kevin Tran | Marco Turchi | Alex Waibel | Mingxuan Wang | Shinji Watanabe | Rodolfo Zevallos
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This paper reports on the shared tasks organized by the 20th IWSLT Conference. The shared tasks address 9 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, multilingual, dialect and low-resource speech translation, and formality control. The shared tasks attracted a total of 38 submissions by 31 teams. The growing interest in spoken language translation is also witnessed by the constantly increasing number of shared task organizers and contributors to the overview paper, almost evenly distributed across industry and academia.

2022

MT-GenEval: A Counterfactual and Contextual Dataset for Evaluating Gender Accuracy in Machine Translation
Anna Currey | Maria Nadejde | Raghavendra Reddy Pappagari | Mia Mayer | Stanislas Lauly | Xing Niu | Benjamin Hsu | Georgiana Dinu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

As generic machine translation (MT) quality has improved, the need for targeted benchmarks that explore fine-grained aspects of quality has increased. In particular, gender accuracy in translation can have implications in terms of output fluency, translation accuracy, and ethics. In this paper, we introduce MT-GenEval, a benchmark for evaluating gender accuracy in translation from English into eight widely-spoken languages. MT-GenEval complements existing benchmarks by providing realistic, gender-balanced, counterfactual data in eight language pairs where the gender of individuals is unambiguous in the input segment, including multi-sentence segments requiring inter-sentential gender agreement. Our data and code are publicly available under a CC BY-SA 3.0 license.

CoCoA-MT: A Dataset and Benchmark for Contrastive Controlled MT with Application to Formality
Maria Nadejde | Anna Currey | Benjamin Hsu | Xing Niu | Marcello Federico | Georgiana Dinu
Findings of the Association for Computational Linguistics: NAACL 2022

The machine translation (MT) task is typically formulated as that of returning a single translation for an input segment. However, in many cases, multiple different translations are valid and the appropriate translation may depend on the intended target audience, characteristics of the speaker, or even the relationship between speakers. Specific problems arise when dealing with honorifics, particularly translating from English into languages with formality markers. For example, the sentence “Are you sure?” can be translated in German as “Sind Sie sich sicher?” (formal register) or “Bist du dir sicher?” (informal). Using wrong or inconsistent tone may be perceived as inappropriate or jarring for users of certain cultures and demographics. This work addresses the problem of learning to control target language attributes, in this case formality, from a small amount of labeled contrastive data. We introduce an annotated dataset (CoCoA-MT) and an associated evaluation metric for training and evaluating formality-controlled MT models for six diverse target languages. We show that we can train formality-controlled models by fine-tuning on labeled contrastive data, achieving high accuracy (82% in-domain and 73% out-of-domain) while maintaining overall quality.

Findings of the IWSLT 2022 Evaluation Campaign
Antonios Anastasopoulos | Loïc Barrault | Luisa Bentivogli | Marcely Zanon Boito | Ondřej Bojar | Roldano Cattoni | Anna Currey | Georgiana Dinu | Kevin Duh | Maha Elbayad | Clara Emmanuel | Yannick Estève | Marcello Federico | Christian Federmann | Souhir Gahbiche | Hongyu Gong | Roman Grundkiewicz | Barry Haddow | Benjamin Hsu | Dávid Javorský | Věra Kloudová | Surafel Lakew | Xutai Ma | Prashant Mathur | Paul McNamee | Kenton Murray | Maria Nǎdejde | Satoshi Nakamura | Matteo Negri | Jan Niehues | Xing Niu | John Ortega | Juan Pino | Elizabeth Salesky | Jiatong Shi | Matthias Sperber | Sebastian Stüker | Katsuhito Sudoh | Marco Turchi | Yogesh Virkar | Alexander Waibel | Changhan Wang | Shinji Watanabe
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation. A total of 27 teams participated in at least one of the shared tasks. This paper details, for each shared task, the purpose of the task, the data that were released, the evaluation metrics that were applied, the submissions that were received and the results that were achieved.

2019

Enabling Robust Grammatical Error Correction in New Domains: Data Sets, Metrics, and Analyses
Courtney Napoles | Maria Nădejde | Joel Tetreault
Transactions of the Association for Computational Linguistics, Volume 7

Until now, grammatical error correction (GEC) has been primarily evaluated on text written by non-native English speakers, with a focus on student essays. This paper enables GEC development on text written by native speakers by providing a new data set and metric. We present a multiple-reference test corpus for GEC that includes 4,000 sentences in two new domains (formal and informal writing by native English speakers) and 2,000 sentences from a diverse set of non-native student writing. We also collect human judgments of several GEC systems on this new test set and perform a meta-evaluation, assessing how reliable automatic metrics are across these domains. We find that commonly used GEC metrics have inconsistent performance across domains, and therefore we propose a new ensemble metric that is robust on all three domains of text.

Personalizing Grammatical Error Correction: Adaptation to Proficiency Level and L1
Maria Nadejde | Joel Tetreault
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

Grammar error correction (GEC) systems have become ubiquitous in a variety of software applications, and have started to approach human-level performance for some datasets. However, very little is known about how to efficiently personalize these systems to the user’s characteristics, such as their proficiency level and first language, or to emerging domains of text. We present the first results on adapting a general purpose neural GEC system to both the proficiency level and the first language of a writer, using only a few thousand annotated sentences. Our study is the broadest of its kind, covering five proficiency levels and twelve different languages, and comparing three different adaptation scenarios: adapting to the proficiency level only, to the first language only, or to both aspects simultaneously. We show that tailoring to both scenarios achieves the largest performance improvement (3.6 F0.5) relative to a strong baseline.

2017

Nematus: a Toolkit for Neural Machine Translation
Rico Sennrich | Orhan Firat | Kyunghyun Cho | Alexandra Birch | Barry Haddow | Julian Hitschler | Marcin Junczys-Dowmunt | Samuel Läubli | Antonio Valerio Miceli Barone | Jozef Mokry | Maria Nădejde
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

We present Nematus, a toolkit for Neural Machine Translation. The toolkit prioritizes high translation accuracy, usability, and extensibility. Nematus has been used to build top-performing submissions to shared translation tasks at WMT and IWSLT, and has been used to train systems for production environments.

Predicting Target Language CCG Supertags Improves Neural Machine Translation
Maria Nădejde | Siva Reddy | Rico Sennrich | Tomasz Dwojak | Marcin Junczys-Dowmunt | Philipp Koehn | Alexandra Birch
Proceedings of the Second Conference on Machine Translation

2016

Modeling Selectional Preferences of Verbs and Nouns in String-to-Tree Machine Translation
Maria Nădejde | Alexandra Birch | Philipp Koehn
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

Edinburgh’s Statistical Machine Translation Systems for WMT16
Philip Williams | Rico Sennrich | Maria Nădejde | Matthias Huck | Barry Haddow | Ondřej Bojar
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

A Neural Verb Lexicon Model with Source-side Syntactic Context for String-to-Tree Machine Translation
Maria Nădejde | Alexandra Birch | Philipp Koehn
Proceedings of the 13th International Conference on Spoken Language Translation

String-to-tree MT systems translate verbs without lexical or syntactic context on the source side and with limited target-side context. The lack of context is one reason why verb translation recall is as low as 45.5%. We propose a verb lexicon model trained with a feed-forward neural network that predicts the target verb conditioned on a wide source-side context. We show that a syntactic context extracted from the dependency parse of the source sentence improves the model’s accuracy by 1.5% over a baseline trained on a window context. When used as an extra feature for re-ranking the n-best list produced by the string-to-tree MT system, the verb lexicon model improves verb translation recall by more than 7%.

2015

Edinburgh’s Syntax-Based Systems at WMT 2015
Philip Williams | Rico Sennrich | Maria Nadejde | Matthias Huck | Philipp Koehn
Proceedings of the Tenth Workshop on Statistical Machine Translation

2014

EU-BRIDGE MT: Combined Machine Translation
Markus Freitag | Stephan Peitz | Joern Wuebker | Hermann Ney | Matthias Huck | Rico Sennrich | Nadir Durrani | Maria Nadejde | Philip Williams | Philipp Koehn | Teresa Herrmann | Eunah Cho | Alex Waibel
Proceedings of the Ninth Workshop on Statistical Machine Translation

Edinburgh’s Syntax-Based Systems at WMT 2014
Philip Williams | Rico Sennrich | Maria Nadejde | Matthias Huck | Eva Hasler | Philipp Koehn
Proceedings of the Ninth Workshop on Statistical Machine Translation

2013

The Feasibility of HMEANT as a Human MT Evaluation Metric
Alexandra Birch | Barry Haddow | Ulrich Germann | Maria Nadejde | Christian Buck | Philipp Koehn
Proceedings of the Eighth Workshop on Statistical Machine Translation

Edinburgh’s Syntax-Based Machine Translation Systems
Maria Nadejde | Philip Williams | Philipp Koehn
Proceedings of the Eighth Workshop on Statistical Machine Translation
