Joachim Van Den Bogaert

Also published as: Joachim Van den Bogaert, Joachim van den Bogaert


2022

ELRC Action: Covering Confidentiality, Correctness and Cross-linguality
Tom Vanallemeersch | Arne Defauw | Sara Szoc | Alina Kramchaninova | Joachim Van den Bogaert | Andrea Lösch
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We describe the language technology (LT) assessments carried out in the ELRC action (European Language Resource Coordination) of the European Commission, which aims to minimise language barriers across the EU. We zoom in on the two most extensive assessments. These LT specifications involve not only experiments with tools and techniques but also an extensive consultation round with stakeholders from public organisations, academia and industry, in order to gather insights into scenarios and best practices. The LT specifications concern (1) the field of automated anonymisation, which is motivated by the need of public and other organisations to be able to store and share data, and (2) the field of multilingual fake news processing, which is motivated by the increasingly pressing problem of disinformation and the limited language coverage of systems for automatically detecting misleading articles. For each specification, we set up corresponding proof-of-concept software to demonstrate the opportunities and challenges involved in the field.

Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
Helena Moniz | Lieve Macken | Andrew Rufener | Loïc Barrault | Marta R. Costa-jussà | Christophe Declercq | Maarit Koponen | Ellie Kemp | Spyridon Pilos | Mikel L. Forcada | Carolina Scarton | Joachim Van den Bogaert | Joke Daems | Arda Tezcan | Bram Vanroy | Margot Fonteyne
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

Automatically extracting the semantic network out of public services to support cities becoming Smart Cities
Joachim Van den Bogaert | Laurens Meeus | Alina Kramchaninova | Arne Defauw | Sara Szoc | Frederic Everaert | Koen Van Winckel | Anna Bardadym | Tom Vanallemeersch
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

The CEFAT4Cities project aims at creating a multilingual semantic interoperability layer for Smart Cities that allows users from all EU Member States to interact with public services in their own language. The CEFAT4Cities processing pipeline transforms natural-language administrative procedures into machine-readable data using various multilingual Natural Language Processing techniques, such as semantic networks and machine translation, thus allowing for the development of more sophisticated and more user-friendly public services applications.

2020

Being Generous with Sub-Words towards Small NMT Children
Arne Defauw | Tom Vanallemeersch | Koen Van Winckel | Sara Szoc | Joachim Van den Bogaert
Proceedings of the Twelfth Language Resources and Evaluation Conference

In the context of under-resourced neural machine translation (NMT), transfer learning from an NMT model trained on a high resource language pair, or from a multilingual NMT (M-NMT) model, has been shown to boost performance to a large extent. In this paper, we focus on so-called cold start transfer learning from an M-NMT model, which means that the parent model is not trained on any of the child data. Such a set-up enables quick adaptation of M-NMT models to new languages. We investigate the effectiveness of cold start transfer learning from a many-to-many M-NMT model to an under-resourced child. We show that sufficiently large sub-word vocabularies should be used for transfer learning to be effective in such a scenario. When adopting relatively large sub-word vocabularies we observe increases in performance thanks to transfer learning from a parent M-NMT model, both when translating to and from the under-resourced language. Our proposed approach involving dynamic vocabularies is both practical and effective. We report results on two under-resourced language pairs, i.e. Icelandic-English and Irish-English.

A Post-Editing Dataset in the Legal Domain: Do we Underestimate Neural Machine Translation Quality?
Julia Ive | Lucia Specia | Sara Szoc | Tom Vanallemeersch | Joachim Van den Bogaert | Eduardo Farah | Christine Maroti | Artur Ventura | Maxim Khalilov
Proceedings of the Twelfth Language Resources and Evaluation Conference

We introduce a machine translation dataset for three pairs of languages in the legal domain with post-edited high-quality neural machine translation and independent human references. The data was collected as part of the EU APE-QUEST project and comprises crawled content from EU websites with translation from English into three European languages: Dutch, French and Portuguese. Altogether, the data consists of around 31K tuples including a source sentence, the respective machine translation by a neural machine translation system, a post-edited version of such translation by a professional translator, and - where available - the original reference translation crawled from parallel language websites. We describe the data collection process, provide an analysis of the resulting post-edits and benchmark the data using state-of-the-art quality estimation and automatic post-editing models. One interesting by-product of our post-editing analysis suggests that neural systems built with publicly available general domain data can provide high-quality translations, even though comparison to human references suggests that this quality is quite low. This makes our dataset a suitable candidate to test evaluation metrics. The data is freely available as an ELRC-SHARE resource.

APE-QUEST: an MT Quality Gate
Heidi Depraetere | Joachim Van den Bogaert | Sara Szoc | Tom Vanallemeersch
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

The APE-QUEST project (2018–2020) sets up a quality gate and crowdsourcing workflow for the eTranslation system of EC’s Connecting Europe Facility to improve translation quality in specific domains. It packages these services as a translation portal for machine-to-machine and machine-to-human scenarios.

MICE: a middleware layer for MT
Joachim Van den Bogaert | Tom Vanallemeersch | Heidi Depraetere
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

The MICE project (2018–2020) will deliver a middleware layer for improving the output quality of the eTranslation system of EC’s Connecting Europe Facility through additional services, such as domain adaptation and named entity recognition. It will also deliver a user portal, allowing for human post-editing.

OCR, Classification & Machine Translation (OCCAM)
Joachim Van den Bogaert | Arne Defauw | Frederic Everaert | Koen Van Winckel | Alina Kramchaninova | Anna Bardadym | Tom Vanallemeersch | Pavel Smrž | Michal Hradiš
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

The OCCAM project (Optical Character recognition, ClassificAtion & Machine Translation) aims at integrating the CEF (Connecting Europe Facility) Automated Translation service with image classification, Translation Memories (TMs), Optical Character Recognition (OCR), and Machine Translation (MT). It will support the automated translation of scanned business documents (a document format that, currently, cannot be processed by the CEF eTranslation service) and will also lead to a tool useful for the Digital Humanities domain.

CEFAT4Cities, a Natural Language Layer for the ISA2 Core Public Service Vocabulary
Joachim Van den Bogaert | Arne Defauw | Sara Szoc | Frederic Everaert | Koen Van Winckel | Alina Kramchaninova | Anna Bardadym | Tom Vanallemeersch
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

The CEFAT4Cities project (2020–2022) will create a “Smart Cities natural language context” (a software layer that facilitates the conversion of natural-language administrative procedures into machine-readable data sets) on top of the existing ISA2 interoperability layer for public services. Integration with the FIWARE/ORION “Smart City” Context Broker will make existing, paper-based public services discoverable through “Smart City” frameworks, thus allowing for the development of more sophisticated and more user-friendly public services applications. An automated translation component will be included to provide a solution that can be used by all EU Member States. As a result, the project will allow EU citizens and businesses to interact with public services at the city, regional, national and EU level in their own language.

2019

APE-QUEST
Joachim Van den Bogaert | Heidi Depraetere | Sara Szoc | Tom Vanallemeersch | Koen Van Winckel | Frederic Everaert | Lucia Specia | Julia Ive | Maxim Khalilov | Christine Maroti | Eduardo Farah | Artur Ventura
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

MICE
Joachim Van den Bogaert | Heidi Depraetere | Tom Vanallemeersch | Frederic Everaert | Koen Van Winckel | Katri Tammsaar | Ingmar Vali | Tambet Artma | Piret Saartee | Laura Katariina Teder | Artūrs Vasiļevskis | Valters Sics | Johan Haelterman | David Bienfait
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

Collecting domain specific data for MT: an evaluation of the ParaCrawl pipeline
Arne Defauw | Tom Vanallemeersch | Sara Szoc | Frederic Everaert | Koen Van Winckel | Kim Scholte | Joris Brabers | Joachim Van den Bogaert
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

Developing a Neural Machine Translation system for Irish
Arne Defauw | Sara Szoc | Tom Vanallemeersch | Anna Bardadym | Joris Brabers | Frederic Everaert | Kim Scholte | Koen Van Winckel | Joachim Van den Bogaert
Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages

2018

Proceedings of the 21st Annual Conference of the European Association for Machine Translation
Juan Antonio Pérez-Ortiz | Felipe Sánchez-Martínez | Miquel Esplà-Gomis | Maja Popović | Celia Rico | André Martins | Joachim Van den Bogaert | Mikel L. Forcada
Proceedings of the 21st Annual Conference of the European Association for Machine Translation

2014

Moses SMT as an aid to translators in the production process
Falko Schaefer | Joeri van de Walle | Joachim van den Bogaert
Proceedings of the 17th Annual Conference of the European Association for Machine Translation

2013

Productivity or Quality? Let’s do both!
Joachim Van den Bogaert | Nathalie De Sutter
Proceedings of Machine Translation Summit XIV: User track

Bologna Translation Service (BOLOGNA)
Joachim Van den Bogaert | Heidi Depraetere | Joeri Van de Walle
Proceedings of Machine Translation Summit XIV: European projects

2011

Bologna Translation Service: Online translation of course syllabi and study programmes in English
Heidi Depraetere | Joachim Van Den Bogaert | Joeri Van De Walle
Proceedings of the 15th Annual Conference of the European Association for Machine Translation