2024
pdf
bib
abs
Evaluating End-to-End Speech-to-Speech Translation for Dubbing: Challenges and New Metrics
Fred Bane
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 2: Presentations)
The advent of end-to-end speech-to-speech translation (S2ST) systems in recent years marks a significant advancement over traditional cascaded approaches. These novel systems represent a direct translation pathway from spoken input to spoken output without relying on intermediate text forms. However, evaluation methods for this task, such as ASR-BLEU, are often still compartmentalized and text-based. We suggest that the quality of the resulting speech must also be measured: naturalness, similarity of the target voice to the original, reflection of accents, and rhythm are all important. We argue that new evaluation metrics are needed in response to this watershed change. Our presentation approaches this topic through the lens of dubbing, with a particular focus on voice-over. We begin with a critical examination of existing metrics, then discuss key features of S2ST that they inadequately capture, and finally propose new directions for the evaluation of S2ST systems.
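For reference, a minimal sketch of how the ASR-BLEU baseline discussed here is typically computed: the system's output speech is transcribed with an ASR model and the transcript is scored against a text reference with BLEU. The snippet is an illustrative assumption, not the authors' tooling; it uses the sacrebleu package, and transcribe stands in for any ASR system.

import sacrebleu

def asr_bleu(output_wavs, reference_texts, transcribe):
    # Transcribe each S2ST output utterance, then score the transcripts
    # against the text references with corpus-level BLEU.
    hypotheses = [transcribe(wav) for wav in output_wavs]
    return sacrebleu.corpus_bleu(hypotheses, [reference_texts]).score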
pdf
bib
abs
LLMs in Post-Translation Workflows: Comparing Performance in Post-Editing and Error Analysis
Celia Uguet
|
Fred Bane
|
Mahmoud Aymo
|
João Torres
|
Anna Zaretskaya
|
Tània Blanch Miró
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
This study presents a comprehensive comparison of three leading LLMs (GPT-4, Claude 3, and Gemini) on two translation-related tasks, automatic post-editing and MQM error annotation, across four languages. We use the pharmaceutical EMEA corpus to maintain domain specificity and minimize data contamination. Our findings reveal the nuanced capabilities of LLMs in handling machine translation post-editing (MTPE) and MQM annotation, hinting at the potential of these models to streamline and optimize translation workflows. Future directions include fine-tuning LLMs for task-specific improvements and exploring the integration of style guides for enhanced translation quality.
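To make the compared setup concrete, below is a minimal sketch of an automatic post-editing prompt of the kind evaluated in the study. The prompt wording, model name, and function are illustrative assumptions rather than the authors' exact configuration; it uses the openai Python package.

from openai import OpenAI

client = OpenAI()  # assumes an API key in the environment

def post_edit(source, mt_output, src_lang="English", tgt_lang="Spanish"):
    # Ask the LLM for a minimal post-edit of an MT output in the pharma domain.
    prompt = (
        f"You are a professional {src_lang}-to-{tgt_lang} post-editor of pharmaceutical texts.\n"
        f"Source: {source}\n"
        f"Machine translation: {mt_output}\n"
        "Return only the minimally post-edited translation."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()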
2023
pdf
bib
abs
Coming to Terms with Glossary Enforcement: A Study of Three Approaches to Enforcing Terminology in NMT
Fred Bane
|
Anna Zaretskaya
|
Tània Blanch Miró
|
Celia Soler Uguet
|
João Torres
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
Enforcing terminology constraints is less straightforward in neural machine translation (NMT) than in statistical machine translation. Current methods, such as alignment-based insertion or the use of factors or special tokens, each have their strengths and drawbacks. We describe the current state of research on terminology enforcement in transformer-based NMT models and present the results of our investigation into the performance of three different approaches. In addition to reference-based quality metrics, we also evaluate the linguistic quality of the resulting translations. Our results show that each approach is effective, though a negative impact on translation fluency remains evident.
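As an illustration of the special-token approach mentioned above, the sketch below marks glossary terms in the source and appends the required target term so the model can learn to copy it. The tag names and function are assumptions for illustration, not the exact scheme used in the paper.

def annotate_terms(source, glossary):
    # glossary maps source terms to the target terms that must be enforced.
    for src_term, tgt_term in glossary.items():
        source = source.replace(
            src_term, f"<term> {src_term} <trans> {tgt_term} </term>"
        )
    return source

print(annotate_terms(
    "Store the vial at room temperature.",
    {"room temperature": "temperatura ambiente"},
))
# Store the vial at <term> room temperature <trans> temperatura ambiente </term>.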
pdf
bib
abs
Enhancing Gender Representation in Neural Machine Translation: A Comparative Analysis of Annotating Strategies for English-Spanish and English-Polish Language Pairs
Celia Soler Uguet
|
Fred Bane
|
Mahmoud Aymo
|
João Pedro Fernandes Torres
|
Anna Zaretskaya
|
Tània Blanch Miró
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track
Machine translation systems have been shown to demonstrate gender bias (Savoldi et al., 2021; Stafanovičs et al., 2020; Stanovsky et al., 2020) and contribute to it with systematically unfair translations. In this presentation, we explore a method of enforcing gender in NMT. We generalize the method proposed by Vincent et al. (2022) to create training data that does not require a first-person speaker. Drawing from other works that use special tokens to pass additional information to NMT systems (e.g. Ailem et al., 2021), we annotate the training data with special tokens that mark the gender of a given noun in the text, which enables the NMT system to produce the correct gender during translation. The same tokens are used to mark gender in a source sentence at inference time. However, in production scenarios gender is often unknown at inference time, so we propose two methods of leveraging language models to obtain these labels.
Our experiment is set up in a fine-tuning scenario, adapting an existing translation model with gender-annotated data. We focus on the English-Spanish and English-Polish language pairs. Without guidance, NMT systems often ignore signals that indicate the correct gender for translation. To address this, we consider two methods of annotating the source English sentence for gender, such as the noun developer in the following sentence: The developer argued with the designer because she did not like the design.
a) We use a coreference resolution model based on SpanBERT (Joshi et al., 2020) to connect any gender-indicating pronouns to their head nouns.
b) We use the GPT-3.5 model, prompted to identify the gender of each person in the sentence based on the context within the sentence.
For test data, we use a collection of sentences from Stanovsky et al. that include two professions and one pronoun that can refer to only one of them. We annotate the source sentences with each of the two methods, produce translations with our fine-tuned model, and compare the accuracy of the gender translation in both cases. The correctness of the gender was evaluated by professional linguists. Overall, we observed a significant improvement in gender translations compared to the baseline (a 7% improvement for Spanish and a 50% improvement for Polish), with SpanBERT outperforming GPT on this task. The Polish MT model still struggles to produce the correct gender (even the translations produced with the ‘gold truth’ gender markings are correct in only 56% of cases). We discuss the limitations of this method. Our research is intended as a reference for fellow MT practitioners, as it offers a comparative analysis of two practical implementations that show the potential to enhance the accuracy of gender in translation, thereby elevating overall translation quality and mitigating gender bias.
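To make the annotation scheme concrete, here is a minimal sketch of the special-token marking described above, applied to the example sentence. The token inventory (<F>/<M>) and function are illustrative assumptions; in practice the gender label would come from the SpanBERT coreference model or the GPT-3.5 prompt at inference time.

GENDER_TOKENS = {"female": "<F>", "male": "<M>"}

def annotate_gender(sentence, noun, gender):
    # Prefix the target noun with a special token marking its gender.
    return sentence.replace(noun, f"{GENDER_TOKENS[gender]} {noun}", 1)

src = "The developer argued with the designer because she did not like the design."
print(annotate_gender(src, "developer", "female"))
# The <F> developer argued with the designer because she did not like the design.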
2022
pdf
bib
abs
A Comparison of Data Filtering Methods for Neural Machine Translation
Fred Bane
|
Celia Soler Uguet
|
Wiktor Stribiżew
|
Anna Zaretskaya
Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)
With the increasing availability of large-scale parallel corpora derived from web crawling and bilingual text mining, data filtering is becoming an increasingly important step in neural machine translation (NMT) pipelines. This paper applies several available tools to the task of data filtering and compares their performance in filtering out different types of noisy data. We also study the effect of filtering with each tool on model performance in the downstream NMT task by creating a dataset containing a combination of clean and noisy data, filtering the data with each tool, and training NMT engines on the resulting filtered corpora. We evaluate the performance of each engine with a combination of direct assessment (DA) and automated metrics. Our best results are obtained by training for a short time on all available data, then filtering the corpus with cross-entropy filtering and training until convergence.
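A minimal sketch of the cross-entropy filtering step in that recipe, assuming a scorer is available: each sentence pair is scored with the partially trained model's per-token cross-entropy and only the best-scoring fraction is kept. per_token_cross_entropy is a placeholder, not a function from the paper or any specific toolkit.

def cross_entropy_filter(pairs, per_token_cross_entropy, keep_fraction=0.8):
    # pairs: list of (source, target) tuples; lower loss = cleaner pair.
    scored = sorted(pairs, key=lambda p: per_token_cross_entropy(p[0], p[1]))
    return scored[:int(len(scored) * keep_fraction)]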
pdf
bib
abs
Comparing Multilingual NMT Models and Pivoting
Celia Soler Uguet
|
Fred Bane
|
Anna Zaretskaya
|
Tània Blanch Miró
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
Following recent advancements in multilingual machine translation at scale, our team carried out tests to compare the performance of multilingual models (M2M from Facebook and multilingual models from Helsinki-NLP) with a two-step translation process using English as a pivot language. Direct assessment by linguists rated translations produced by pivoting as consistently better than those obtained from multilingual models of similar size, while automated evaluation with COMET suggested relative performance was strongly impacted by domain and language family.
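For reference, the two-step pivot setup compared here can be sketched with off-the-shelf Helsinki-NLP OPUS-MT models via Hugging Face transformers, as below. The Spanish-to-French direction is an illustrative choice, not necessarily one of the pairs the team tested.

from transformers import pipeline

src_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")
en_to_tgt = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

def pivot_translate(text):
    # Step 1: source -> English; step 2: English -> target.
    english = src_to_en(text)[0]["translation_text"]
    return en_to_tgt(english)[0]["translation_text"]

print(pivot_translate("La reunión se pospuso hasta el lunes."))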
2021
pdf
bib
abs
Selecting the best data filtering method for NMT training
Fred Bane
|
Anna Zaretskaya
Proceedings of Machine Translation Summit XVIII: Users and Providers Track
The performance of NMT systems has been shown to depend on the quality of the training data. In this paper we explore different open-source tools that can be used to score the quality of translation pairs, with the goal of obtaining clean corpora for training NMT models. We measure the performance of these tools by correlating their scores with human scores, and by ranking models trained on the resulting filtered datasets in terms of their performance on different test sets and MT performance metrics.
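The correlation step described above can be sketched as follows, assuming each tool's scores and the human scores are available for the same translation pairs; the snippet uses scipy and illustrative variable names.

from scipy.stats import pearsonr, spearmanr

def correlate(tool_scores, human_scores):
    # Both arguments are equal-length lists of scores for the same pairs.
    pearson_r, _ = pearsonr(tool_scores, human_scores)
    spearman_r, _ = spearmanr(tool_scores, human_scores)
    return {"pearson": pearson_r, "spearman": spearman_r}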
pdf
bib
abs
System Description for Transperfect
Wiktor Stribiżew
|
Fred Bane
|
José Conceição
|
Anna Zaretskaya
Proceedings of the 8th Workshop on Asian Translation (WAT2021)
In this paper, we describe our participation in the 2021 Workshop on Asian Translation (team ID: tpt_wat). We submitted results for all six directions of the JPC2 patent task. As a first-time participant in the task, we attempted to identify a single configuration that provided the best overall results across all language pairs. All our submissions were created with single base transformer models, trained only on the task-specific data with a consistent configuration of hyperparameters. In contrast to the uniformity of our methods, our results vary widely across the six language pairs.
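The abstract does not list the hyperparameter values behind the consistent configuration, so the dictionary below gives only the standard Transformer-base settings (Vaswani et al., 2017) as an assumed reference point for what a single base transformer configuration typically looks like.

TRANSFORMER_BASE = {
    "encoder_layers": 6,
    "decoder_layers": 6,
    "hidden_size": 512,
    "ffn_size": 2048,
    "attention_heads": 8,
    "dropout": 0.1,
    "label_smoothing": 0.1,
    "optimizer": "adam",
    "lr_scheduler": "inverse_sqrt",
    "warmup_steps": 4000,
}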