Pierrette Bouillon

Also published as: P. Bouillon

2025

Leveraging Large Language Models for Joint Linguistic and Technical Accessibility Improvement: A Case Study on University Webpages
Pierrette Bouillon | Johanna Gerlach | Raphael Rubino
Proceedings of the 1st Workshop on Artificial Intelligence and Easy and Plain Language in Institutional Contexts (AI & EL/PL)

The aim of the study presented in this paper is to investigate whether Large Language Models can be leveraged to translate French content from existing websites into their B1-level simplified versions and to integrate them into an accessible HTML structure. We design a CMS-agnostic approach to webpage accessibility improvement based on prompt engineering and apply it to Geneva University webpages. We conduct several automatic and manual evaluations to measure the accessibility improvement reached by several LLMs with various prompts in a zero-shot setting. Results show that not all LLMs are suitable for the task, and that results vary widely across prompts. Manual evaluation carried out by a crowd of dyslexic participants shows that some LLMs could produce more accessible websites and improve access to information.

PaSCo1: A Parallel Video-SiGML Swiss French Sign Language Corpus in Medical Domain
Bastien David | Pierrette Bouillon | Jonathan Mutal | Irene Strasly | Johanna Gerlach | Hervé Spechbach
Proceedings of the Third International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL)

This article introduces the parallel sign language translation corpus, PaSCo1, developed as part of the BabelDr project, an automatic speech translation system for medical triage. PaSCo1 aims to make a set of medical data available in Swiss French Sign Language (LSF-CH) in the form of both videos signed by a human and their description in G-SiGML mark-up language. We describe the beginnings of the corpus as part of the BabelDr project, as well as the methodology used to create the videos and generate the G-SiGML language using the SiGLA platform. The resulting FAIR corpus comprises 2,031 medical questions and instructions in the form of videos and G-SiGML code.

Arabizi vs LLMs: Can the Genie Understand the Language of Aladdin?
Perla Al Almaoui | Pierrette Bouillon | Simon Hengchen
Proceedings of Machine Translation Summit XX: Volume 2

In an era of rapid technological advancements, communication continues to evolve as new linguistic phenomena emerge. Among these is Arabizi, a hybrid form of Arabic that incorporates Latin characters and numbers to represent the spoken dialects of Arab communities. Arabizi is widely used on social media and allows people to communicate in an informal and dynamic way, but it poses significant challenges for machine translation due to its lack of formal structure and deeply embedded cultural nuances. This case study is motivated by a growing need to translate Arabizi for gisting purposes. It evaluates the capacity of different LLMs to decode and translate Arabizi, focusing on multiple Arabic dialects that have rarely been studied up until now. Using a combination of human evaluators and automatic metrics, this research project investigates the models’ performance in translating Arabizi into both Modern Standard Arabic and English. Key questions explored include which dialects are translated most effectively and whether translations into English surpass those into Arabic.

Factors Affecting Translation Quality in In-context Learning for Multilingual Medical Domain
Jonathan Mutal | Raphael Rubino | Pierrette Bouillon
Proceedings of the Tenth Conference on Machine Translation

Multilingual machine translation in the medical domain presents critical challenges due to limited parallel data, domain-specific terminology, and the high stakes associated with translation accuracy. In this paper, we explore the potential of in-context learning (ICL) with general-purpose large language models (LLMs) as an alternative to fine-tuning. Focusing on the medical domain and low-resource languages, we evaluate an instruction-tuned LLM on a translation task across 16 languages. We address four research questions centered on prompt design, examining the impact of the number of examples, the domain and register of examples, and the example selection strategy. Our results show that prompting with one to three examples from the same register and domain as the test input leads to the largest improvements in translation quality, as measured by automatic metrics, while translation quality gains plateau with an increased number of examples. Furthermore, we find that example selection methods (lexical and embedding-based) do not yield significant benefits over random selection if the register of selected examples does not match that of the test input.

2024

Improving Sign Language Production in the Healthcare Domain Using UMLS and Multi-task Learning
Jonathan David Mutal | Raphael Rubino | Pierrette Bouillon | Bastien David | Johanna Gerlach | Irene Strasly
Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024

This paper presents a study on Swiss-French sign language production in the medical domain. In emergency care settings, a lack of clear communication can interfere with accurate delivery of health-related services. For patients communicating with sign language, equal access to healthcare remains an issue. While previous work has explored producing sign language gloss from a source text, we propose to extend this approach to produce a multichannel sign language output given a written French input. Furthermore, we extend our approach with a multi-task framework allowing us to include the Unified Medical Language System (UMLS) in our model. Results show that the introduction of UMLS in the training data improves model accuracy by 13.64 points.

Simplification Strategies in French Spontaneous Speech
Lucía Ormaechea | Nikos Tsourakis | Didier Schwab | Pierrette Bouillon | Benjamin Lecouteux
Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024

Automatic Text Simplification (ATS) aims at rewriting texts into simpler variants while preserving their original meaning, so they can be more easily understood by different audiences. While ATS has been widely used for written texts, its application to spoken language remains unexplored, even if it is not exempt from difficulty. This study aims to characterize the edit operations performed in order to simplify French transcripts for non-native speakers. To do so, we relied on a data sample randomly extracted from the Orféo-CEFC French spontaneous speech dataset. In the absence of guidelines to direct this process, we adopted an intuitive simplification approach, so as to investigate the crafted simplifications based on expert linguists’ criteria, and to compare them with those produced by a generative AI (namely, ChatGPT). The results, analyzed quantitatively and qualitatively, reveal that the most common edits are deletions, and affect oral production aspects, like restarts or hesitations. Consequently, candidate simplifications are typically register-standardized sentences that solely include the propositional content of the input. The study also examines the alignment between human- and machine-based simplifications, revealing a moderate level of agreement, and highlighting the subjective nature of the task. The findings contribute to understanding the intricacies of simplifying spontaneous spoken language. In addition, the provision of a small-scale parallel dataset derived from such expert simplifications, Propicto-Orféo-Simple, can facilitate the evaluation of speech simplification solutions.

Post-editors as Gatekeepers of Lexical and Syntactic Diversity: Comparative Analysis of Human Translation and Post-editing in Professional Settings
Lise Volkart | Pierrette Bouillon
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)

This paper presents a comparative analysis between human translation (HT) and post-edited machine translation (PEMT) from a lexical and syntactic perspective to verify whether the tendency of neural machine translation (NMT) systems to produce lexically and syntactically poorer translations shines through after post-editing (PE). The analysis focuses on three datasets collected in professional contexts containing translations from English into French and German into French. Through a comparison of word translation entropy (HTRa) scores, we observe a lower degree of lexical diversity in PEMT compared to HT. Additionally, metrics of syntactic equivalence indicate that PEMT is more likely to mirror the syntactic structure of the source text in contrast to HT. By incorporating raw machine translation (MT) output into our analysis, we underline the important role post-editors play in adding lexical and syntactic diversity to MT output. Our findings provide relevant input for MT users and decision-makers in language services as well as for MT and PE trainers and advisers.
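As a rough illustration of the lexical diversity measure mentioned in the abstract above, word translation entropy can be read as the Shannon entropy of the translation choices observed for a single source word. The sketch below is an assumption about the general idea, not the authors' implementation: the function name `word_translation_entropy` and the toy French renderings are hypothetical.

```python
import math
from collections import Counter


def word_translation_entropy(translations):
    """Shannon entropy (in bits) of the translation choices seen for one source word.

    `translations` is a list with one target-side rendering per translator
    (hypothetical input format chosen for this sketch).
    """
    counts = Counter(translations)
    total = sum(counts.values())
    # Sum -p * log2(p) over each distinct translation choice.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Toy data: four renderings of the same source word in two conditions.
unanimous = ["maison", "maison", "maison", "maison"]   # everyone agrees
varied = ["maison", "maison", "domicile", "foyer"]     # more lexical variety

print(word_translation_entropy(unanimous))
print(word_translation_entropy(varied))
```

Under this reading, a lower average entropy for post-edited output than for human translation would correspond to the lower lexical diversity the abstract reports.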

The RCnum project is funded by the Swiss National Science Foundation and aims at producing a multilingual and semantically rich online edition of the Registers of Geneva Council from 1545 to 1550. Combining multilingual NLP, history and paleography, this collaborative project will clear hurdles inherent to texts manually written in 16th century Middle French while allowing for easy access and interactive consultation of these archives.

Normalizing without Modernizing: Keeping Historical Wordforms of Middle French while Reducing Spelling Variants
Raphael Rubino | Johanna Gerlach | Jonathan Mutal | Pierrette Bouillon
Findings of the Association for Computational Linguistics: NAACL 2024

Conservation of historical documents benefits from computational methods by alleviating the manual labor related to digitization and modernization of textual content. Languages usually evolve over time and keeping historical wordforms is crucial for diachronic studies and digital humanities. However, spelling conventions did not necessarily exist when texts were originally written and orthographic variations are commonly observed depending on scribes and time periods. In this study, we propose to automatically normalize orthographic wordforms found in historical archives written in Middle French during the 16th century without fully modernizing textual content. We leverage pre-trained models in a low resource setting based on a manually curated parallel corpus and produce additional resources with artificial data generation approaches. Results show that causal language models and knowledge distillation improve over a strong baseline, thus validating the proposed methods.

Pretrained language models (PLMs) are now the de facto backbone of most natural language processing systems. In this article, we present Jargon, a family of PLMs for specialized domains of French, focusing on three domains: transcribed speech, the clinical/biomedical domain, and the legal domain. We use a transformer architecture based on computationally efficient methods (LinFormer), since these domains often involve processing long documents. We evaluate and compare our models with state-of-the-art models on a varied set of tasks and evaluation corpora, some of which are introduced in our article. We gather the datasets into a new French-language evaluation benchmark for these three domains. We also compare various training configurations: continued pretraining with self-supervised learning on the specialized data, pretraining from scratch, and single- and multi-domain pretraining. Our extensive experiments in specialized domains show that competitive downstream performance can be reached even when pretraining with the approximate attention mechanism of LinFormer. For full reproducibility, we release the models and pretraining data, as well as the corpora used.

A Concept Based Approach for Translation of Medical Dialogues into Pictographs
Johanna Gerlach | Pierrette Bouillon | Jonathan Mutal | Hervé Spechbach
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Pictographs have been found to improve patient comprehension of medical information or instructions. However, tools to produce pictograph representations from natural language are still scarce. In this contribution we describe a system that automatically translates French speech into pictographs to enable diagnostic interviews in emergency settings, thereby providing a tool to overcome the language barrier or provide support in Augmentative and Alternative Communication (AAC) contexts. Our approach is based on a semantic gloss that serves as pivot between spontaneous language and pictographs, with medical concepts represented using the UMLS ontology. In this study we evaluate different available pre-trained models fine-tuned on artificial data to translate French into this semantic gloss. On unseen data collected in real settings, consisting of questions and instructions by physicians, the best model achieves an F0.5 score of 86.7. A complementary human evaluation of the semantic glosses differing from the reference shows that 71% of these would be usable to transmit the intended meaning. Finally, a human evaluation of the pictograph sequences derived from the gloss reveals very few additions, omissions or order issues (<3%), suggesting that the gloss as designed is well suited as a pivot for translation into pictographs.
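The F0.5 score reported in the abstract above is an instance of the general F-beta measure, which with beta = 0.5 weights precision more heavily than recall. The following minimal sketch shows the standard formula only; how precision and recall are computed over the semantic gloss is not stated in the abstract, so the input values here are purely illustrative.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 favours precision, beta > 1 favours recall."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both are zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative values only (not taken from the paper).
high_precision = f_beta(precision=1.0, recall=0.5)   # precise but incomplete
high_recall = f_beta(precision=0.5, recall=1.0)      # complete but noisy

print(round(high_precision, 4), round(high_recall, 4))
```

With beta = 0.5, the high-precision case scores higher than the high-recall case, which is the property that makes the measure suitable when spurious output is costlier than missing output.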

Pretrained Language Models (PLMs) are the de facto backbone of most state-of-the-art NLP systems. In this paper, we introduce a family of domain-specific PLMs for French, focusing on three important domains: transcribed speech, medicine, and law. We use a transformer architecture based on efficient methods (LinFormer) to maximise their utility, since these domains often involve processing long documents. We evaluate and compare our models to state-of-the-art models on a diverse set of tasks and datasets, some of which are introduced in this paper. We gather the datasets into a new French-language evaluation benchmark for these three domains. We also compare various training configurations: continued pretraining, pretraining from scratch, as well as single- and multi-domain pretraining. Extensive domain-specific experiments show that it is possible to attain competitive downstream performance even when pretraining with the approximate LinFormer attention mechanism. For full reproducibility, we release the models and pretraining data, as well as contributed datasets.

Automatic Normalisation of Middle French and Its Impact on Productivity
Raphael Rubino | Sandra Coram-Mekkey | Johanna Gerlach | Jonathan David Mutal | Pierrette Bouillon
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024

This paper presents a study on automatic normalisation of 16th century documents written in Middle French. These documents present a large variety of wordforms which require spelling normalisation to facilitate downstream linguistic and historical studies. We frame the normalisation process as a machine translation task starting with a strong baseline leveraging a pre-trained encoder–decoder model. We propose to improve this baseline by combining synthetic data generation methods and producing artificial training data, thus tackling the lack of parallel corpora relevant to our task. The evaluation of our approach is twofold: in addition to automatic metrics relying on gold references, we evaluate our models through post-editing of their outputs. This evaluation method directly measures the productivity gain brought by our models to experts conducting the normalisation task manually. Results show a productivity increase of more than 20 tokens per minute when using automatic normalisation compared to normalising text from scratch. The manually post-edited dataset resulting from our study is the first parallel corpus of normalised 16th century Middle French to be publicly released, along with the synthetic data and the automatic normalisation models used and trained in the presented work.

2023

PROPICTO is a project funded by the French National Research Agency and the Swiss National Science Foundation that aims at creating Speech-to-Pictograph translation systems, with a special focus on French as an input language. By developing such technologies, we intend to enhance communication access for non-French-speaking patients and people with cognitive impairments.

Simple, Simpler and Beyond: A Fine-Tuning BERT-Based Approach to Enhance Sentence Complexity Assessment for Text Simplification
Lucía Ormaechea | Nikos Tsourakis | Didier Schwab | Pierrette Bouillon | Benjamin Lecouteux
Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023)

Evaluating the Impact of Stereotypes and Language Combinations on Gender Bias Occurrence in NMT Generic Systems
Bertille Triboulet | Pierrette Bouillon
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

Machine translation, and more specifically neural machine translation (NMT), has been proven to be subject to gender bias in recent years. Many studies have focused on evaluating and reducing this phenomenon, mainly through the analysis of the translation of occupational nouns for the same type of language combinations. In this paper, we reproduce a test set similar to those used in previous studies to investigate the influence of stereotypes and of the nature of language combinations (formed with English, French and Italian) on gender bias occurrence in NMT. In line with previous studies, we confirm stereotypes as a major source of gender bias, especially in female contexts, while observing bias even in language combinations traditionally less examined.

Improving Standard German Captioning of Spoken Swiss German: Evaluating Multilingual Pre-trained Models
Jonathan David Mutal | Pierrette Bouillon | Johanna Gerlach | Marianne Starlander
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track

Multilingual pre-trained language models are often the best alternative in low-resource settings. In the context of a cascade architecture for automatic Standard German captioning of spoken Swiss German, we evaluate different models on the task of transforming normalised Swiss German ASR output into Standard German. Instead of training a large model from scratch, we fine-tuned publicly available pre-trained models, which reduces the cost of training high-quality neural machine translation models. Results show that pre-trained multilingual models achieve the highest scores, and that a higher number of languages included in pre-training improves the performance. We also observed that the type of source and target included in fine-tuning data impacts the results.

Evaluating a Multilingual Pre-trained Model for the Automatic Standard German captioning of Swiss German TV
Johanna Gerlach | Pierrette Bouillon | Silvia Rodríguez Vázquez | Jonathan Mutal | Marianne Starlander
Proceedings of the 8th edition of the Swiss Text Analytics Conference

2022

A Neural Machine Translation Approach to Translate Text to Pictographs in a Medical Speech Translation System - The BabelDr Use Case
Jonathan Mutal | Pierrette Bouillon | Magali Norré | Johanna Gerlach | Lucía Ormaechea Grijalba
Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

The use of images has been shown to positively affect patient comprehension in medical settings, in particular to deliver specific medical instructions. However, tools that automatically translate sentences into pictographs are still scarce due to the lack of resources. Previous studies have focused on the translation of sentences into pictographs by using WordNet combined with rule-based approaches and deep learning methods. In this work, we showed how we leveraged the BabelDr system, a speech to speech translator for medical triage, to build a speech to pictograph translator using UMLS and neural machine translation approaches. We showed that the translation from French sentences to a UMLS gloss can be viewed as a machine translation task and that a Multilingual Neural Machine Translation system achieved the best results.

Studying Post-Editese in a Professional Context: A Pilot Study
Lise Volkart | Pierrette Bouillon
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

The past few years have seen the multiplication of studies on post-editese, following the massive adoption of post-editing in professional translation workflows. These studies mainly rely on the comparison of post-edited machine translation and human translation on artificial parallel corpora. By contrast, we investigate here post-editese on comparable corpora of authentic translation jobs for the language direction English into French. We explore commonly used scores and also propose the use of a novel metric. Our analysis shows that post-edited machine translation is not only lexically poorer than human translation, but also less dense and less varied in terms of translation solutions. It also tends to be more prolific than human translation for our language direction. Finally, our study highlights some of the challenges of working with comparable corpora in post-editese research.

The PASSAGE project: Standard German Subtitling of Swiss German TV content
Pierrette Bouillon | Johanna Gerlach | Jonathan Mutal | Marianne Starlander
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

We present the PASSAGE project, which aims at automatic Standard German subtitling of Swiss German TV content. This is achieved in a two step process, beginning with ASR to produce a normalised transcription, followed by translation into Standard German. We focus on the second step, for which we explore different approaches and contribute aligned corpora for future research.

Standard German Subtitling of Swiss German TV content: the PASSAGE Project
Jonathan David Mutal | Pierrette Bouillon | Johanna Gerlach | Veronika Haberkorn
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In Switzerland, two thirds of the population speak Swiss German, a primarily spoken language with no standardised written form. It is widely used on Swiss TV, for example in news reports, interviews or talk shows, and subtitles are required for people who cannot understand this spoken language. This paper focuses on the task of automatic Standard German subtitling of spoken Swiss German, and more specifically on the translation of a normalised Swiss German speech recognition result into Standard German suitable for subtitles. Our contribution consists of a comparison of different statistical and deep learning MT systems for this task and an aligned corpus of normalised Swiss German and Standard German subtitles. Results of two evaluations, automatic and human, show that the systems succeed in improving the content, but are currently not capable of producing entirely correct Standard German.

Producing Standard German Subtitles for Swiss German TV Content
Johanna Gerlach | Jonathan Mutal | Pierrette Bouillon
Ninth Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022)

In this study we compare two approaches (neural machine translation and edit-based) and the use of synthetic data for the task of translating normalised Swiss German ASR output into correct written Standard German for subtitles, with a special focus on syntactic differences. Results suggest that NMT is better suited to this task and that relatively simple rule-based generation of training data could be a valuable approach for cases where little training data is available and transformations are simple.

Investigating the Medical Coverage of a Translation System into Pictographs for Patients with an Intellectual Disability
Magali Norré | Vincent Vandeghinste | Thomas François | Pierrette Bouillon
Ninth Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022)

Communication between physicians and patients can lead to misunderstandings, especially for disabled people. An automatic system that translates natural language into a pictographic language is one of the solutions that could help to overcome this issue. In this preliminary study, we present the French version of a translation system using the Arasaac pictographs and we investigate the strategies used by speech therapists to translate into pictographs. We also evaluate the medical coverage of this tool for translating physician questions and patient instructions.

2021

Using speech technology in the translation process workflow in international organizations: A quantitative and qualitative study
Pierrette Bouillon | Jeevanthi Liyanapathirana
Proceedings of Machine Translation Summit XVIII: Users and Providers Track

In international organizations, the growing demand for translations has increased the need for post-editing. Different studies show that automatic speech recognition systems have the potential to increase the productivity of the translation process as well as the quality. In this talk, we will explore the possibilities of using speech in the translation process by conducting a post-editing experiment with three professional translators in an international organization. Our experiment consisted of comparing three translation methods: speaking the translation with MT as an inspiration (respeaking, RES), post-editing the MT suggestions by typing (PE), and editing the MT suggestions using speech (SPE). BLEU and HTER scores were used to compare the three methods. Our study shows that translators made more edits under the RES condition, whereas in SPE the resulting translations were closer to the reference according to the BLEU score and required fewer edits. Translation time was shortest with SPE, followed by PE and then RES, and the translators preferred using speech to typing. These results show the potential of speech when it is coupled with post-editing. To the best of our knowledge, this is the first quantitative study conducted on using post-editing and speech together in large-scale international organizations.

A Speech-enabled Fixed-phrase Translator for Healthcare Accessibility
Pierrette Bouillon | Johanna Gerlach | Jonathan Mutal | Nikos Tsourakis | Hervé Spechbach
Proceedings of the 1st Workshop on NLP for Positive Impact

In this overview article we describe an application designed to enable communication between health practitioners and patients who do not share a common language, in situations where professional interpreters are not available. Built on the principle of a fixed phrase translator, the application implements different natural language processing (NLP) technologies, such as speech recognition, neural machine translation and text-to-speech to improve usability. Its design allows easy portability to new domains and integration of different types of output for multiple target audiences. Even though BabelDr is far from solving the problem of miscommunication between patients and doctors, it is a clear example of NLP in a real world application designed to help minority groups to communicate in a medical context. It also gives some insights into the relevant criteria for the development of such an application.

Extending a Text-to-Pictograph System to French and to Arasaac
Magali Norré | Vincent Vandeghinste | Pierrette Bouillon | Thomas François
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

We present an adaptation of the Text-to-Picto system, initially designed for Dutch, and extended to English and Spanish. The original system, aimed at people with an intellectual disability, automatically translates text into pictographs (Sclera and Beta). We extend it to French and add a large set of Arasaac pictographs linked to WordNet 3.1. To carry out this adaptation, we automatically link the pictographs and their metadata to synsets of two French WordNets and leverage this information to translate words into pictographs. We automatically and manually evaluate our system with different corpora corresponding to different use cases, including one for medical communication between doctors and patients. The system is also compared to similar systems in other languages.

2020

COPECO: a Collaborative Post-Editing Corpus in Pedagogical Context
Jonathan Mutal | Pierrette Bouillon | Perrine Schumacher | Johanna Gerlach
Proceedings of 1st Workshop on Post-Editing in Modern-Day Translation

Ellipsis Translation for a Medical Speech to Speech Translation System
Jonathan Mutal | Johanna Gerlach | Pierrette Bouillon | Hervé Spechbach
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

In diagnostic interviews, elliptical utterances allow doctors to question patients in a more efficient and economical way. However, literal translation of such incomplete utterances is rarely possible without affecting communication. Previous studies have focused on automatic ellipsis detection and resolution, but only a few specifically address the problem of automatic translation of ellipsis. In this work, we evaluate four different approaches to translate ellipsis in medical dialogues in the context of the speech to speech translation system BabelDr. We also investigate the impact of training data, using an under-sampling method and data with elliptical utterances in context. Results show that the best model is able to translate 88% of elliptical utterances.

We believe that machine translation (MT) must be introduced to translation students as part of their training, in preparation for their professional life. In this paper we present a new version of the tool called MT3, which builds on and extends a joint effort undertaken by the Faculty of Languages of the University of Córdoba and the Faculty of Translation and Interpreting of the University of Geneva to develop an open-source web platform to teach MT to translation students. We also report on a pilot experiment with the goal of testing the viability of using MT3 in an MT course. The pilot let us identify areas for improvement and collect students’ feedback about the tool’s usability.

2019

Surveying the potential of using speech technologies for post-editing purposes in the context of international organizations: What do professional translators think?
Jeevanthi Liyanapathirana | Pierrette Bouillon | Bartolomé Mesa-Lao
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

Monolingual backtranslation in a medical speech translation system for diagnostic interviews - a NMT approach
Jonathan Mutal | Pierrette Bouillon | Johanna Gerlach | Paula Estrella | Hervé Spechbach
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

Differences between SMT and NMT Output - a Translators’ Point of View
Jonathan Mutal | Lise Volkart | Pierrette Bouillon | Sabrina Girletti | Paula Estrella
Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019)

In this study, we compare the output quality of two MT systems, a statistical (SMT) and a neural (NMT) engine, customised for Swiss Post’s Language Service using the same training data. We focus on the point of view of professional translators and investigate how they perceive the differences between the MT output and a human reference (namely deletions, substitutions, insertions and word order). Our findings show that translators more frequently consider these differences to be errors in SMT than NMT, and that deletions are the most serious errors in both architectures. We also observe lower agreement on differences to be corrected in NMT than in SMT, suggesting that errors are easier to identify in SMT. These findings confirm the ability of NMT to produce correct paraphrases, which could also explain why BLEU is often considered as an inadequate metric to evaluate the performance of NMT systems.

2018

pdf bib abs
Integrating MT at Swiss Post’s Language Service: preliminary results
Pierrette Bouillon | Sabrina Girletti | Paula Estrella | Jonathan Mutal | Martina Bellodi | Beatrice Bircher
Proceedings of the 21st Annual Conference of the European Association for Machine Translation

This paper presents the preliminary results of an ongoing academia-industry collaboration that aims to integrate MT into the workflow of Swiss Post’s Language Service. We describe the evaluations carried out to select an MT tool (commercial or open-source) and assess the suitability of machine translation for post-editing in Swiss Post’s various subject areas and language pairs. The goal of this first phase is to provide recommendations with regard to the tool, language pair and most suitable domain for implementing MT.

pdf bib abs
Developing a New Swiss Research Centre for Barrier-Free Communication
Pierrette Bouillon | Silvia Rodríguez Vázquez | Irene Strasly
Proceedings of the 21st Annual Conference of the European Association for Machine Translation

The project ‘Proposal and Implementation of a Swiss Research Centre for Barrier-free Communication’ (BFC) is a four-year project (2017–2020) funded by the Rectors' Conference of Swiss Higher Education Institutions (swissuniversities). Its purpose is to ensure that individuals with a visual or hearing disability, people with a temporary cognitive impairment and speakers without sufficient knowledge of local languages can communicate and enjoy barrier-free access to information in all spheres of life, with a special focus on higher education.

We describe CALL-SLT, a speech-enabled Computer-Assisted Language Learning application where the central idea is to prompt the student with an abstract representation of what they are supposed to say, and then use a combination of grammar-based speech recognition and rule-based translation to rate their response. The system has been developed to the level of a mature prototype, freely deployed on the web, with versions for several languages. We present an overview of the core system architecture and the various types of content we have developed. Finally, we describe several evaluations, the last of which is a study carried out over about a week using 130 subjects recruited through the Amazon Mechanical Turk, in which CALL-SLT was contrasted against a control version where the speech recognition component was disabled. The improvement in student learning performance between the two groups was significant at p < 0.02.

pdf bib
Rule-based automatic post-processing of SMT output to reduce human post-editing effort
Victoria Porro | Johanna Gerlach | Pierrette Bouillon | Violeta Seretan
Proceedings of Translating and the Computer 36

pdf bib
A tool for building multilingual voice questionnaires
Alejandro Armando | Pierrette Bouillon | Manny Rayner | Nikos Tsourakis
Proceedings of Translating and the Computer 36

pdf bib abs
Applying Accessibility-Oriented Controlled Language (CL) Rules to Improve Appropriateness of Text Alternatives for Images: an Exploratory Study
Silvia Rodríguez Vázquez | Pierrette Bouillon | Anton Bolfing
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

At present, inappropriate text alternatives for images on the Web continue to pose web accessibility barriers for people with special needs. Although research efforts have been devoted to defining how to write text equivalents for visual content in websites, existing guidelines often lack direct linguistic-oriented recommendations. Similarly, most web accessibility evaluation tools just provide users with an automated functionality to check the presence of text alternatives within the element, rather than a platform to verify their content. This paper presents an overview of the findings from an exploratory study carried out to investigate if the appropriateness level of text alternatives for images in French can be improved when applying controlled language (CL) rules. Results gathered suggest that using accessibility-oriented alt style rules can have a significant impact on text alternatives appropriateness. Although more data would be needed to draw further conclusions about our proposal, this preliminary study already offers an interesting insight into the potential use of CL checkers such as Acrolinx for language-based web accessibility evaluation.

pdf bib abs
A Large-Scale Evaluation of Pre-editing Strategies for Improving User-Generated Content Translation
Violeta Seretan | Pierrette Bouillon | Johanna Gerlach
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

User-generated content represents an increasing share of the information available today. To make this type of content instantly accessible in another language, the ACCEPT project focuses on developing pre-editing technologies for correcting the source text in order to increase its translatability. Linguistically-informed pre-editing rules have been developed for English and French for the two domains considered by the project, namely, the technical domain and the healthcare domain. In this paper, we present the evaluation experiments carried out to assess the impact of the proposed pre-editing rules on translation quality. Results from a large-scale evaluation campaign show that pre-editing indeed helps attain a better translation quality for a high proportion of the data, the difference with the number of cases where the adverse effect is observed being statistically significant. The ACCEPT pre-editing technology is freely available online and can be used in any Web-based environment to enhance the translatability of user-generated content so that it reaches a broader audience.

pdf bib
The ACCEPT Portal: An Online Framework for the Pre-editing and Post-editing of User-Generated Content
Violeta Seretan | Johann Roturier | David Silva | Pierrette Bouillon
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation

2013

pdf bib
Comparing Forum Data Post-Editing Performance Using Translation Memory and Machine Translation Output: A Pilot Study
Lucia Morado Vazquez | Silvia Rodriguez Vazquez | Pierrette Bouillon
Proceedings of Machine Translation Summit XIV: Posters

pdf bib
Automated Community Content Editing PorTal (ACCEPT)
Pierrette Bouillon
Proceedings of Machine Translation Summit XIV: European projects

pdf bib
Combining pre-editing and post-editing to improve SMT of user-generated content
Johanna Gerlach | Victoria Porro | Pierrette Bouillon | Sabine Lehmann
Proceedings of the 2nd Workshop on Post-editing Technology and Practice

pdf bib
Can lightweight pre-editing rules improve statistical MT of forum content? (La préédition avec des règles peu coûteuses, utile pour la TA statistique des forums ?) [in French]
Johanna Gerlach | Victoria Porro | Pierrette Bouillon | Sabine Lehmann
Proceedings of TALN 2013 (Volume 2: Short Papers)

pdf bib
Two Approaches to Correcting Homophone Confusions in a Hybrid Machine Translation System
Pierrette Bouillon | Johanna Gerlach | Ulrich Germann | Barry Haddow | Manny Rayner
Proceedings of the Second Workshop on Hybrid Approaches to Translation

2012

pdf bib abs
Using Source-Language Transformations to Address Register Mismatches in SMT
Manny Rayner | Pierrette Bouillon | Barry Haddow
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

Mismatches between training and test data are a ubiquitous problem for real SMT applications. In this paper, we examine a type of mismatch that commonly arises when translating from French and similar languages: available training data is mostly formal register, but test data may well be informal register. We consider methods for defining surface transformations that map common informal language constructions into their formal language counterparts, or vice versa; we then describe two ways to use these mappings, either to create artificial training data or to pre-process source text at run-time. An initial evaluation performed using crowd-sourced comparisons of alternate translations produced by a French-to-English SMT system suggests that both methods can improve performance, with run-time pre-processing being the more effective of the two.

pdf bib abs
Evaluating Appropriateness Of System Responses In A Spoken CALL Game
Manny Rayner | Pierrette Bouillon | Johanna Gerlach
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We describe an experiment carried out using a French version of CALL-SLT, a web-enabled CALL game in which students at each turn are prompted to give a semi-free spoken response which the system then either accepts or rejects. The central question we investigate is whether the response is appropriate; we do this by extracting pairs of utterances where both members of the pair are responses by the same student to the same prompt, and where one response is accepted and one rejected. When the two spoken responses are presented in random order, native speakers show a reasonable degree of agreement in judging that the accepted utterance is better than the rejected one. We discuss the significance of the results and also present a small study supporting the claim that native speakers are nearly always recognised by the system, while non-native speakers are rejected a significant proportion of the time.

pdf bib abs
Annotating Qualia Relations in Italian and French Complex Nominals
Pierrette Bouillon | Elisabetta Jezek | Chiara Melloni | Aurélie Picton
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The goal of this paper is to provide an annotation scheme for compounds based on generative lexicon theory (GL, Pustejovsky, 1995; Bassac and Bouillon, 2001). This scheme has been tested on a set of compounds automatically extracted from the Europarl corpus (Koehn, 2005) both in Italian and French. The motivation is twofold. On the one hand, it should help refine existing compound classifications and better explain lexicalization in both languages. On the other hand, we hope that the extracted generalizations can be used in NLP, for example for improving MT systems or for query reformulation (Claveau, 2003). In this paper, we focus on the annotation scheme and its ongoing evaluation.

2011

pdf bib
Bootstrapping a statistical speech translator from a rule-based one
Manny Rayner | Paula Estrella | Pierrette Bouillon
Proceedings of the Second International Workshop on Free/Open-Source Rule-Based Machine Translation

pdf bib
Pour une interlangue utile en traduction automatique de la parole dans des domaines limités [Towards an interlingua for speech translation in limited domains]
Pierrette Bouillon | Manny Rayner | Paula Estrella | Johanna Gerlach | Maria Georgescul
Traitement Automatique des Langues, Volume 52, Numéro 1 : Varia [Varia]

2010

pdf bib
A Bootstrapped Interlingua-Based SMT Architecture
Manny Rayner | Paula Estrella | Pierrette Bouillon
Proceedings of the 14th Annual Conference of the European Association for Machine Translation

pdf bib abs
Examining the Effects of Rephrasing User Input on Two Mobile Spoken Language Systems
Nikos Tsourakis | Agnes Lisowska | Manny Rayner | Pierrette Bouillon
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

During the construction of a spoken dialogue system, much effort is spent on improving the quality of speech recognition as much as possible. However, even if an application perfectly recognizes the input, its understanding may be far from what the user originally meant. The user should be informed about what the system actually understood so that an error will not have a negative impact in the later stages of the dialogue. One important aspect that this work tries to address is the effect of presenting the system's understanding during interaction with users. We argue that for specific kinds of applications it is important to confirm the understanding of the system before obtaining the output. In this way the user can avoid misconceptions and problems occurring in the dialogue flow and can enhance his confidence in the system. Nevertheless, this has an impact on the interaction, as the mental workload increases, and the users' behavior may adapt to the system's coverage. We focus on two applications that implement the notion of rephrasing the user's input in different ways. Our study took place among 14 subjects that used both systems on a Nokia N810 Internet Tablet.

We describe a multilingual Open Source CALL game, CALL-SLT, which reuses speech translation technology developed using the Regulus platform to create an automatic conversation partner that allows intermediate-level language students to improve their fluency. We contrast CALL-SLT with Wang's and Seneff's "translation game" system, in particular focussing on three issues. First, we argue that the grammar-based recognition architecture offered by Regulus is more suitable for this type of application; second, that it is preferable to prompt the student in a language-neutral form, rather than in the L1; and third, that we can profitably record successful interactions by native speakers and store them to be reused as online help for students. The current system, which will be demoed at the conference, supports four L2s (English, French, Japanese and Swedish) and two L1s (English and French). We conclude by describing an evaluation exercise, where a version of CALL-SLT configured for English L2 and French L1 was used by several hundred high school students. About half of the subjects reported positive impressions of the system.

2009

pdf bib
Technology in Translator Training and tools for translators
Pierrette Bouillon | Marianne Starlander
Proceedings of Machine Translation Summit XII: Plenaries

pdf bib
Using Artificial Data to Compare the Difficulty of Using Statistical Machine Translation in Different Language-Pairs
Manny Rayner | Paula Estrella | Pierrette Bouillon | Yukie Nakao
Proceedings of Machine Translation Summit XII: Posters

pdf bib
Using Artificially Generated Data to Evaluate Statistical Machine Translation
Manny Rayner | Paula Estrella | Pierrette Bouillon | Beth Ann Hockey | Yukie Nakao
Proceedings of the 2009 Workshop on Grammar Engineering Across Frameworks (GEAF 2009)

2008

pdf bib
Comparing two different bidirectional versions of the limited-domain medical spoken language translator MedSLT
Marianne Starlander | Pierrette Bouillon | Glenn Flores | Manny Rayner | Nikos Tsourakis
Proceedings of the 12th Annual Conference of the European Association for Machine Translation

pdf bib
Almost Flat Functional Semantics for Speech Translation
Manny Rayner | Pierrette Bouillon | Beth Ann Hockey | Yukie Nakao
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

We describe recent work on MedSLT, a medium-vocabulary interlingua-based medical speech translation system, focussing on issues that arise when handling languages of which the grammar engineer has little or no knowledge. We show how we can systematically create and maintain multiple forms of grammars, lexica and interlingual representations, with some versions being used by language informants, and some by grammar engineers. In particular, we describe the advantages of structuring the interlingua definition as a simple semantic grammar, which includes a human-readable surface form. We show how this allows us to rationalise the process of evaluating translations between languages lacking common speakers, and also makes it possible to create a simple generic tool for debugging to-interlingua translation rules. Examples presented focus on the concrete case of translation between Japanese and Arabic in both directions.

pdf bib abs
Building Mobile Spoken Dialogue Applications Using Regulus
Nikos Tsourakis | Maria Georgescul | Pierrette Bouillon | Manny Rayner
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Regulus is an Open Source platform that supports construction of rule-based medium-vocabulary spoken dialogue applications. It has already been used to build several substantial speech-enabled applications, including NASA's Clarissa procedure navigator and Geneva University's MedSLT medical speech translator. Systems like these would be far more useful if they were available on a hand-held device, rather than, as with the present version, on a laptop. In this paper we describe the Open Source framework we have developed, which makes it possible to run Regulus applications on generally available mobile devices, using a distributed client-server architecture that offers transparent and reliable integration with different types of ASR systems. We describe the architecture, an implemented calendar application prototype hosted on a mobile device, and an evaluation. The evaluation shows that performance on the mobile device is as good as performance on a normal desktop PC.

pdf bib
Coling 2008: Proceedings of the workshop on Speech Processing for Safety Critical Translation and Pervasive Applications
Pierrette Bouillon | Farzad Ehsani | Robert Frederking | Michael McTear | Manny Rayner
Coling 2008: Proceedings of the workshop on Speech Processing for Safety Critical Translation and Pervasive Applications

pdf bib
Making Speech Look Like Text in the Regulus Development Environment
Elisabeth Kron | Manny Rayner | Marianne Santaholma | Pierrette Bouillon | Agnes Lisowska
Coling 2008: Proceedings of the workshop on Grammar Engineering Across Frameworks

2007

pdf bib abs
Les ellipses dans un système de traduction automatique de la parole
Pierrette Bouillon | Manny Rayner | Marianne Starlander | Marianne Santaholma
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

In any dialogue, elliptical sentences are very frequent. In this article, we evaluate their impact on recognition and translation in the MedSLT speech translation system. Ellipsis resolution is performed there by a robust and portable method borrowed from human-machine dialogue systems. This method exploits a flat semantic representation and combines linguistic techniques (to build the representation) with example-based techniques (to learn from a corpus what constitutes a well-formed ellipsis in a given sub-domain and how to resolve it).

pdf bib abs
Un Lexique Génératif de référence pour le français
Fiammetta Namer | Pierrette Bouillon | Évelyne Jacquey
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

This article proposes an original approach to building a reference semantic lexicon for French. Its main characteristic is that it can rely on the morphological properties of lexemes. The method combines results of morphological analysis (Namer, 2002; 2003), based on large lexical resources (TLF headword lists), with already proven methodologies for acquiring lexical information (Namer 2005; Sébillot 2002). The chosen representation format, within the Generative Lexicon framework, stands out for its expressiveness and economy. This approach therefore makes it possible to envisage building a reference lexicon for French characterised by strong homogeneity while guaranteeing broad coverage, both in terms of the headword list and in terms of semantic content. A first validation of the method provides a quantitative and qualitative projection of the expected results.

pdf bib
Adapting a Medical speech to speech translation system (MedSLT) to Arabic
Pierrette Bouillon | Sonia Halimi | Manny Rayner | Beth Ann Hockey
Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources

pdf bib
Proceedings of the Workshop on Grammar-Based Approaches to Spoken Language Processing
Pierrette Bouillon | Manny Rayner
Proceedings of the Workshop on Grammar-Based Approaches to Spoken Language Processing

pdf bib
A Development Environment for Building Grammar-Based Speech-Enabled Applications
Elisabeth Kron | Manny Rayner | Marianne Santaholma | Pierrette Bouillon
Proceedings of the Workshop on Grammar-Based Approaches to Spoken Language Processing

2006

pdf bib abs
Une grammaire multilingue partagée pour la traduction automatique de la parole
Pierrette Bouillon | Manny Rayner | Bruna Novellas | Yukie Nakao | Marianne Santaholma | Marianne Starlander | Nikos Chatzichrisafis
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Today, the most common approach in speech processing consists of combining a statistical recogniser with a robust parser. For many applications, however, grammar-based linguistic recognisers offer numerous advantages. In this article, we present a methodology and a set of open-source software (called Regulus) for rapidly deriving linguistically motivated recognisers from a general grammar shared between Catalan and French.

pdf bib
Une grammaire partagée multitâche pour le traitement de la parole : application aux langues romanes [A multitask shared grammar for speech processing: application to romance languages]
Pierrette Bouillon | Manny Rayner | Bruna Novellas | Marianne Starlander | Marianne Santaholma | Yukie Nakao | Nikos Chatzichrisafis
Traitement Automatique des Langues, Volume 47, Numéro 3 : Varia [Varia]

pdf bib abs
REGULUS: A Generic Multilingual Open Source Platform for Grammar-Based Speech Applications
Manny Rayner | Pierrette Bouillon | Beth Ann Hockey | Nikos Chatzichrisafis
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We present an overview of Regulus, an Open Source platform that supports corpus-based derivation of efficient domain-specific speech recognisers from general linguistically motivated unification grammars. We list available Open Source resources, which include compilers, resource grammars for various languages, documentation and a development environment. The greater part of the paper presents a series of experiments carried out using a medium-vocabulary medical speech translation application and a corpus of 801 recorded domain utterances, designed to investigate the impact on speech understanding performance of vocabulary size, grammatical coverage, presence or absence of various linguistic features, degree of generality of the grammar and use or otherwise of probabilistic weighting in the CFG language model. In terms of task accuracy, the most significant factors were the use of probabilistic weighting, the degree of generality of the grammar and the inclusion of features which model sortal restrictions.

pdf bib
Proceedings of the First International Workshop on Medical Speech Translation
Pierrette Bouillon | Farzad Ehsani | Robert Frederking | Manny Rayner
Proceedings of the First International Workshop on Medical Speech Translation

pdf bib
Evaluating Task Performance for a Unidirectional Controlled Language Medical Speech Translation System
Nikos Chatzichrisafis | Pierrette Bouillon | Manny Rayner | Marianne Santaholma | Marianne Starlander | Beth Ann Hockey
Proceedings of the First International Workshop on Medical Speech Translation

2005

pdf bib abs
Representational and architectural issues in a limited-domain medical speech translator
Manny Rayner | Pierrette Bouillon | Marianne Santaholma | Yukie Nakao
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

We present an overview of MedSLT, a medium-vocabulary medical speech translation system, focussing on the representational issues that arise when translating temporal and causal concepts. Although flat key/value structures are strongly preferred as semantic representations in speech understanding systems, we argue that it is infeasible to handle the necessary range of concepts using only flat structures. By exploiting the specific nature of the task, we show that it is possible to implement a solution which only slightly extends the representational complexity of the semantic representation language, by permitting an optional single nested level representing a subordinate clause construct. We sketch our solutions to the key problems of producing minimally nested representations using phrase-spotting methods, and writing cleanly structured rule-sets that map temporal and phrasal representations into a canonical interlingual form.

In this paper, we present evidence that providing users of a speech to speech translation system for emergency diagnosis (MedSLT) with a tool that helps them to learn the coverage greatly improves their success in using the system. In MedSLT, the system uses a grammar-based recogniser that provides more predictable results to the translation component. The help module aims at addressing the lack of robustness inherent in this type of approach. It takes as input the result of a robust statistical recogniser that performs better for out-of-coverage data and produces a list of in-coverage example sentences. These examples are selected from a defined list using a heuristic that prioritises sentences maximising the number of N-grams shared with those extracted from the recognition result.

2004

pdf bib
Comparing rule-based and statistical approaches to speech understanding in a limited domain speech translation system
Manny Rayner | Pierrette Bouillon | Beth Ann Hockey | Nikos Chatzichrisafis | Marianne Starlander
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

pdf bib abs
Automatisation of the Activity of Term Collection in Different Languages
Bruno Cartoni | Pierrette Bouillon | Yalina Alphonse | Sabine Lehmann
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

This article describes the use and development of a tool for grammar and terminology control (FLAG), for the purposes of automating the verification of terminology for a large-scale user of multilingual terminology. It describes the various advantages of the tool and shows a process for transforming a traditional terminology list into a list of inflected forms as well as patterns which can be used to find possible morpho-syntactic derivations of terms.

pdf bib
Methodology For Building Thematic Indexes In Medicine For French
Yalina Alphonse | Pierrette Bouillon
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Semi-Automatic Derivation of a French Lexicon from CLIPS
Nilda Ruimy | Pierrette Bouillon | Bruno Cartoni
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

pdf bib
A Limited-Domain English to Japanese Medical Speech Translator Built Using REGULUS 2
Manny Rayner | Pierrette Bouillon | Vol Van Dalsem III | Hitoshi Isahara | Kyoko Kanzaki | Beth Ann Hockey
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics

2002

pdf bib
Acquisition of Qualia Elements from Corpora - Evaluation of a Symbolic Learning Method
Pierrette Bouillon | Vincent Claveau | Cécile Fabre | Pascale Sébillot
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
A Flexible Speech to Speech Phrasebook Translator
Manny Rayner | Pierrette Bouillon
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems

2000

pdf bib
Minimal Commitment and Full Lexical Disambiguation: Balancing Rules and Hidden Markov Models
Patrick Ruch | Robert Baud | Pierrette Bouillon | Gilbert Robert
Fourth Conference on Computational Natural Language Learning and the Second Learning Language in Logic Workshop

pdf bib
Inductive Logic Programming for Corpus-Based Acquisition of Semantic Lexicons
Pascale Sébillot | Pierrette Bouillon | Cecile Fabre
Fourth Conference on Computational Natural Language Learning and the Second Learning Language in Logic Workshop

1996

pdf bib
Mental State Adjectives: the Perspective of Generative Lexicon
Pierrette Bouillon
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

1995

pdf bib
Using corpora to develop limited-domain speech translation systems
Manny Rayner | Pierrette Bouillon | David Carter
Proceedings of Translating and the Computer 17

1994

pdf bib
On the Proper Role of Coercion in Semantic Typing
James Pustejovsky | Pierrette Bouillon
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics

pdf bib
Semantic Lexicons: The Cornerstone for Lexical Choice in Natural Language Generation
Evelyne Viegas | Pierrette Bouillon
Proceedings of the Seventh International Workshop on Natural Language Generation

1992

pdf bib
Compound Nouns in a Unification-Based MT System
Pierrette Bouillon | Katharina Boesefeldt | Graham Russell
Third Conference on Applied Natural Language Processing

pdf bib
Une representation semantique et un systeme de transfert pour une traduction de haute qualite. Presentation de projet Avec demonstration sur machines SUN
K. Boesefeldt | P. Bouillon
COLING 1992 Volume 3: The 14th International Conference on Computational Linguistics

1991

pdf bib abs
Applying an Experimental MT System to a Realistic Problem
Pierrette Bouillon | Katharina Boesefeldt
Proceedings of Machine Translation Summit III: Papers

This presentation outlines the implementation of a machine translation system for avalanche warning bulletins in natural language, using a unification-based formalism developed at ISSCO, which will be introduced at the same occasion. Concrete examples taken from this project exemplify a modern approach to machine translation: a rich representation of the semantic content of a sentence, the use of a single grammar for parsing and generating as well as generation and transfer based exclusively on the semantic representation of a sentence. Simultaneously, the limits of bidirectional transfer are being tested.