Marianna Martindale

Also published as: Marianna J. Martindale

2025

As Machine Translation (MT) becomes increasingly commonplace, understanding how the general public perceives and relies on imperfect MT is crucial for contextualizing MT research in real-world applications. We present a human study conducted in a public museum (n=452), investigating how fluency and adequacy errors impact bilingual and non-bilingual users’ reliance on MT during casual use. Our findings reveal that non-bilingual users often over-rely on MT due to a lack of evaluation strategies and alternatives, while experiencing the impact of errors can prompt users to reassess future reliance. This highlights the need for MT evaluation and NLP explanation techniques to promote not only MT quality, but also MT literacy among its users.

pdf bib abs
Improving MT-enabled Triage Performance with Multiple MT Outputs
Marianna J. Martindale | Marine Carpuat
Proceedings of Machine Translation Summit XX: Volume 1

Recent advances in Machine Translation (MT) quality may motivate adoption in a variety of use cases, but the success of MT deployment depends not only on intrinsic model quality but on how well the model, as deployed, helps users meet the objectives of their use case. This work focuses on a specific triage use case, MT-enabled scanning in intelligence analysis. After describing the use case with its objectives and failure modes, we present a user study to establish a baseline performance level and measure the mitigating effects of a simple intervention, providing additional MT outputs. We find significant improvements in relevance judgment accuracy with outputs from two distinct neural MT models and significant improvements in relevant entity identification with the addition of a rule-based MT. Users also like seeing multiple MT outputs, making it an appealing way to improve MT-enabled scanning performance.

2024

pdf bib
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 2: Presentations)
Marianna Martindale | Janice Campbell | Konstantin Savenkov | Shivali Goel
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 2: Presentations)

2023

pdf bib abs
Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection
Weijia Xu | Sweta Agrawal | Eleftheria Briakou | Marianna J. Martindale | Marine Carpuat
Transactions of the Association for Computational Linguistics, Volume 11

Neural sequence generation models are known to “hallucinate”, by producing outputs that are unrelated to the source text. These hallucinations are potentially harmful, yet it remains unclear in what conditions they arise and how to mitigate their impact. In this work, we first identify internal model symptoms of hallucinations by analyzing the relative token contributions to the generation in contrastive hallucinated vs. non-hallucinated outputs generated via source perturbations. We then show that these symptoms are reliable indicators of natural hallucinations, by using them to design a lightweight hallucination detector which outperforms both model-free baselines and strong classifiers based on quality estimation or large pre-trained models on manually annotated English-Chinese and German-English translation test beds.

2021

pdf bib abs
Machine Translation Believability
Marianna Martindale | Kevin Duh | Marine Carpuat
Proceedings of the First Workshop on Bridging Human–Computer Interaction and Natural Language Processing

Successful Machine Translation (MT) deployment requires understanding not only the intrinsic qualities of MT output, such as fluency and adequacy, but also user perceptions. Users who do not understand the source language respond to MT output based on their perception of the likelihood that the meaning of the MT output matches the meaning of the source text. We refer to this as believability. Output that is not believable may be off-putting to users, but believable MT output with incorrect meaning may mislead them. In this work, we study the relationship of believability to fluency and adequacy by applying traditional MT direct assessment protocols to annotate all three features on the output of neural MT systems. Quantitative analysis of these annotations shows that believability is closely related to but distinct from fluency, and initial qualitative analysis suggests that semantic features may account for the difference.

Stylistic variations of language, such as formality, carry speakers’ intention beyond literal meaning and should be conveyed adequately in translation. We propose to use lexical formality models to control the formality level of machine translation output. We demonstrate the effectiveness of our approach in empirical evaluations, as measured by automatic metrics and human assessments.

2016

bib
MoJo: Bringing Hybrid MT to the Center for Applied Machine Translation
Marianna Martindale
Conferences of the Association for Machine Translation in the Americas: MT Users' Track

2015

pdf bib
Class-based N-gram language difference models for data selection
Amittai Axelrod | Yogarshi Vyas | Marianna Martindale | Marine Carpuat
Proceedings of the 12th International Workshop on Spoken Language Translation: Papers

2012

pdf bib abs
Can Statistical Post-Editing with a Small Parallel Corpus Save a Weak MT Engine?
Marianna J. Martindale
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Statistical post-editing has been shown in several studies to increase BLEU score for rule-based MT systems. However, previous studies have relied solely on BLEU and have not conducted further study to determine whether those gains indicated an increase in quality or in score alone. In this work we conduct a human evaluation of statistical post-edited output from a weak rule-based MT system, comparing the results with the output of the original rule-based system and a phrase-based statistical MT system trained on the same data. We show that for this weak rule-based system, despite significant BLEU score increases, human evaluators prefer the output of the original system. While this is not a generally conclusive condemnation of statistical post-editing, this result does cast doubt on the efficacy of statistical post-editing for weak MT systems and on the reliability of BLEU score for comparison between weak rule-based and hybrid systems built from them.