2024
RCnum: A Semantic and Multilingual Online Edition of the Geneva Council Registers from 1545 to 1550
Pierrette Bouillon | Christophe Chazalon | Sandra Coram-Mekkey | Gilles Falquet | Johanna Gerlach | Stephane Marchand-Maillet | Laurent Moccozet | Jonathan Mutal | Raphael Rubino | Marco Sorbi
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 2)
The RCnum project is funded by the Swiss National Science Foundation and aims at producing a multilingual and semantically rich online edition of the Registers of Geneva Council from 1545 to 1550. Combining multilingual NLP, history and paleography, this collaborative project will clear hurdles inherent to texts manually written in 16th century Middle French while allowing for easy access and interactive consultation of these archives.
Normalizing without Modernizing: Keeping Historical Wordforms of Middle French while Reducing Spelling Variants
Raphael Rubino | Johanna Gerlach | Jonathan Mutal | Pierrette Bouillon
Findings of the Association for Computational Linguistics: NAACL 2024
Conservation of historical documents benefits from computational methods by alleviating the manual labor related to digitization and modernization of textual content. Languages usually evolve over time and keeping historical wordforms is crucial for diachronic studies and digital humanities. However, spelling conventions did not necessarily exist when texts were originally written and orthographic variations are commonly observed depending on scribes and time periods. In this study, we propose to automatically normalize orthographic wordforms found in historical archives written in Middle French during the 16th century without fully modernizing textual content. We leverage pre-trained models in a low resource setting based on a manually curated parallel corpus and produce additional resources with artificial data generation approaches. Results show that causal language models and knowledge distillation improve over a strong baseline, thus validating the proposed methods.
A Concept Based Approach for Translation of Medical Dialogues into Pictographs
Johanna Gerlach | Pierrette Bouillon | Jonathan Mutal | Hervé Spechbach
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Pictographs have been found to improve patient comprehension of medical information or instructions. However, tools to produce pictograph representations from natural language are still scarce. In this contribution we describe a system that automatically translates French speech into pictographs to enable diagnostic interviews in emergency settings, thereby providing a tool to overcome the language barrier or provide support in Augmentative and Alternative Communication (AAC) contexts. Our approach is based on a semantic gloss that serves as pivot between spontaneous language and pictographs, with medical concepts represented using the UMLS ontology. In this study we evaluate different available pre-trained models fine-tuned on artificial data to translate French into this semantic gloss. On unseen data collected in real settings, consisting of questions and instructions by physicians, the best model achieves an F0.5 score of 86.7. A complementary human evaluation of the semantic glosses differing from the reference shows that 71% of these would be usable to transmit the intended meaning. Finally, a human evaluation of the pictograph sequences derived from the gloss reveals very few additions, omissions or order issues (<3%), suggesting that the gloss as designed is well suited as a pivot for translation into pictographs.
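For readers unfamiliar with the F0.5 score reported above, the following is a minimal sketch of the general F-beta formula, which weights precision more heavily than recall when beta is below 1. How precision and recall are computed over the semantic glosses is not stated in this abstract; the function name `f_beta` and the example values are illustrative assumptions, not figures from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """General F-beta score; beta < 1 favours precision over recall.

    Illustrative helper only: the precision/recall inputs are assumed to be
    computed elsewhere, since the abstract does not describe that step.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    beta_sq = beta ** 2
    denominator = beta_sq * precision + recall
    if denominator == 0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Hypothetical values: high precision, lower recall.
print(round(f_beta(0.9, 0.6), 3))  # 0.818
```

With beta set to 0.5, a system is penalised more for producing wrong concepts than for omitting some, which is a plausible priority in a medical setting.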
TIM-UNIGE Translation into Low-Resource Languages of Spain for WMT24
Jonathan Mutal | Lucía Ormaechea
Proceedings of the Ninth Conference on Machine Translation
We present the results of our constrained submission to the WMT 2024 shared task, which focuses on translating from Spanish into two low-resource languages of Spain: Aranese (spa-arn) and Aragonese (spa-arg). Our system integrates real and synthetic data generated by large language models (e.g., BLOOMZ) and rule-based Apertium translation systems. Built upon the pre-trained NLLB system, our translation model utilizes a multistage approach, progressively refining the initial model through the sequential use of different datasets, starting with large-scale synthetic or crawled data and advancing to smaller, high-quality parallel corpora. This approach resulted in BLEU scores of 30.1 for Spanish to Aranese and 61.9 for Spanish to Aragonese.
2023
PROPICTO: Developing Speech-to-Pictograph Translation Systems to Enhance Communication Accessibility
Lucía Ormaechea | Pierrette Bouillon | Maximin Coavoux | Emmanuelle Esperança-Rodier | Johanna Gerlach | Jerôme Goulian | Benjamin Lecouteux | Cécile Macaire | Jonathan Mutal | Magali Norré | Adrien Pupier | Didier Schwab
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
PROPICTO is a project funded by the French National Research Agency and the Swiss National Science Foundation that aims to create Speech-to-Pictograph translation systems, with a special focus on French as the input language. By developing such technologies, we intend to enhance communication access for non-French-speaking patients and people with cognitive impairments.
Evaluating a Multilingual Pre-trained Model for the Automatic Standard German captioning of Swiss German TV
Johanna Gerlach | Pierrette Bouillon | Silvia Rodríguez Vázquez | Jonathan Mutal | Marianne Starlander
Proceedings of the 8th edition of the Swiss Text Analytics Conference
2022
A Neural Machine Translation Approach to Translate Text to Pictographs in a Medical Speech Translation System - The BabelDr Use Case
Jonathan Mutal | Pierrette Bouillon | Magali Norré | Johanna Gerlach | Lucia Ormaechea Grijalba
Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)
The use of images has been shown to positively affect patient comprehension in medical settings, in particular to deliver specific medical instructions. However, tools that automatically translate sentences into pictographs are still scarce due to the lack of resources. Previous studies have focused on the translation of sentences into pictographs by using WordNet combined with rule-based approaches and deep learning methods. In this work, we showed how we leveraged the BabelDr system, a speech to speech translator for medical triage, to build a speech to pictograph translator using UMLS and neural machine translation approaches. We showed that the translation from French sentences to a UMLS gloss can be viewed as a machine translation task and that a Multilingual Neural Machine Translation system achieved the best results.
The PASSAGE project : Standard German Subtitling of Swiss German TV content
Pierrette Bouillon | Johanna Gerlach | Jonathan Mutal | Marianne Starlander
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
We present the PASSAGE project, which aims at automatic Standard German subtitling of Swiss German TV content. This is achieved in a two step process, beginning with ASR to produce a normalised transcription, followed by translation into Standard German. We focus on the second step, for which we explore different approaches and contribute aligned corpora for future research.
Producing Standard German Subtitles for Swiss German TV Content
Johanna Gerlach | Jonathan Mutal | Pierrette Bouillon
Ninth Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022)
In this study we compare two approaches (neural machine translation and edit-based) and the use of synthetic data for the task of translating normalised Swiss German ASR output into correct written Standard German for subtitles, with a special focus on syntactic differences. Results suggest that NMT is better suited to this task and that relatively simple rule-based generation of training data could be a valuable approach for cases where little training data is available and transformations are simple.
2021
A Speech-enabled Fixed-phrase Translator for Healthcare Accessibility
Pierrette Bouillon | Johanna Gerlach | Jonathan Mutal | Nikos Tsourakis | Hervé Spechbach
Proceedings of the 1st Workshop on NLP for Positive Impact
In this overview article we describe an application designed to enable communication between health practitioners and patients who do not share a common language, in situations where professional interpreters are not available. Built on the principle of a fixed phrase translator, the application implements different natural language processing (NLP) technologies, such as speech recognition, neural machine translation and text-to-speech to improve usability. Its design allows easy portability to new domains and integration of different types of output for multiple target audiences. Even though BabelDr is far from solving the problem of miscommunication between patients and doctors, it is a clear example of NLP in a real world application designed to help minority groups to communicate in a medical context. It also gives some insights into the relevant criteria for the development of such an application.
2020
COPECO: a Collaborative Post-Editing Corpus in Pedagogical Context
Jonathan Mutal | Pierrette Bouillon | Perrine Schumacher | Johanna Gerlach
Proceedings of 1st Workshop on Post-Editing in Modern-Day Translation
Ellipsis Translation for a Medical Speech to Speech Translation System
Jonathan Mutal | Johanna Gerlach | Pierrette Bouillon | Hervé Spechbach
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
In diagnostic interviews, elliptical utterances allow doctors to question patients in a more efficient and economical way. However, literal translation of such incomplete utterances is rarely possible without affecting communication. Previous studies have focused on automatic ellipsis detection and resolution, but only a few specifically address the problem of automatic translation of ellipsis. In this work, we evaluate four different approaches to translate ellipsis in medical dialogues in the context of the speech to speech translation system BabelDr. We also investigate the impact of training data, using an under-sampling method and data with elliptical utterances in context. Results show that the best model is able to translate 88% of elliptical utterances.
Re-design of the Machine Translation Training Tool (MT3)
Paula Estrella | Emiliano Cuenca | Laura Bruno | Jonathan Mutal | Sabrina Girletti | Lise Volkart | Pierrette Bouillon
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
We believe that machine translation (MT) must be introduced to translation students as part of their training, in preparation for their professional life. In this paper we present a new version of the tool called MT3, which builds on and extends a joint effort undertaken by the Faculty of Languages of the University of Córdoba and Faculty of Translation and Interpreting of the University of Geneva to develop an open-source web platform to teach MT to translation students. We also report on a pilot experiment with the goal of testing the viability of using MT3 in an MT course. The pilot let us identify areas for improvement and collect students’ feedback about the tool’s usability.
2019
Monolingual backtranslation in a medical speech translation system for diagnostic interviews - a NMT approach
Jonathan Mutal | Pierrette Bouillon | Johanna Gerlach | Paula Estrella | Hervé Spechbach
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks
Differences between SMT and NMT Output - a Translators’ Point of View
Jonathan Mutal | Lise Volkart | Pierrette Bouillon | Sabrina Girletti | Paula Estrella
Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019)
In this study, we compare the output quality of two MT systems, a statistical (SMT) and a neural (NMT) engine, customised for Swiss Post’s Language Service using the same training data. We focus on the point of view of professional translators and investigate how they perceive the differences between the MT output and a human reference (namely deletions, substitutions, insertions and word order). Our findings show that translators more frequently consider these differences to be errors in SMT than in NMT, and that deletions are the most serious errors in both architectures. We also observe lower agreement on differences to be corrected in NMT than in SMT, suggesting that errors are easier to identify in SMT. These findings confirm the ability of NMT to produce correct paraphrases, which could also explain why BLEU is often considered an inadequate metric to evaluate the performance of NMT systems.
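As an illustration of the difference types named in this abstract, the sketch below counts word-level deletions, substitutions and insertions between an MT output and a human reference using Python's standard `difflib`. This is not the authors' tooling: the function name `diff_counts`, the whitespace tokenisation, the convention that a "deletion" is a reference word missing from the MT output, and the example sentences are all assumptions. `SequenceMatcher` does not guarantee a minimal edit script, and word-order differences are not detected here.

```python
from difflib import SequenceMatcher


def diff_counts(mt_output: str, reference: str) -> dict[str, int]:
    """Count word-level differences between an MT output and a reference.

    Assumed convention: a deletion is a reference word absent from the MT
    output, an insertion is an extra MT word, and a substitution is a
    reference word replaced by a different MT word.
    """
    ref_tokens = reference.split()
    mt_tokens = mt_output.split()
    counts = {"deletion": 0, "substitution": 0, "insertion": 0}
    matcher = SequenceMatcher(a=ref_tokens, b=mt_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        ref_len, mt_len = i2 - i1, j2 - j1
        if tag == "delete":
            counts["deletion"] += ref_len
        elif tag == "insert":
            counts["insertion"] += mt_len
        elif tag == "replace":
            # Pair words one-to-one as substitutions; any surplus on either
            # side is counted as deletions or insertions.
            paired = min(ref_len, mt_len)
            counts["substitution"] += paired
            counts["deletion"] += ref_len - paired
            counts["insertion"] += mt_len - paired
    return counts


# Hypothetical sentence pair, not taken from the study's data.
print(diff_counts(
    mt_output="the package was delivered",
    reference="the parcel was delivered yesterday",
))
# {'deletion': 1, 'substitution': 1, 'insertion': 0}
```

Counts of this kind only locate differences; whether a given difference is an error or an acceptable paraphrase is the judgement the study asks translators to make.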
2018
Integrating MT at Swiss Post’s Language Service: preliminary results
Pierrette Bouillon | Sabrina Girletti | Paula Estrella | Jonathan Mutal | Martina Bellodi | Beatrice Bircher
Proceedings of the 21st Annual Conference of the European Association for Machine Translation
This paper presents the preliminary results of an ongoing academia-industry collaboration that aims to integrate MT into the workflow of Swiss Post’s Language Service. We describe the evaluations carried out to select an MT tool (commercial or open-source) and assess the suitability of machine translation for post-editing in Swiss Post’s various subject areas and language pairs. The goal of this first phase is to provide recommendations with regard to the tool, language pair and most suitable domain for implementing MT.