Peggy van der Kreeft


pdf bib
plain X - AI Supported Multilingual Video Workflow Platform
Carlos Amaral | Peggy van der Kreeft
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

The plain X platform is a toolbox for multilingual adaptation, for video, audio, and text content. The software is a 4-in-1 tool, combining several steps in the adaptation process, i.e., transcription, translation, subtitling, and voice-over, all automatically generated, but with a high level of editorial control. Users can choose which translation engine is used (e.g., MS Azure, Google, DeepL) depending on best performance. As a result, plain X enables a smooth semi-automated production of subtitles or voice-over, much faster than with older, manual workflows. The software was developed out of EU research projects and has recently been rolled out for professional use. It brings Artificial Intelligence (AI) into the multilingual media production process, while keeping the human in the loop.

pdf bib
GoURMET – Machine Translation for Low-Resourced Languages
Peggy van der Kreeft | Alexandra Birch | Sevi Sariisik | Felipe Sánchez-Martínez | Wilker Aziz
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

The GoURMET project, funded by the European Commission’s H2020 program (under grant agreement 825299), develops models for machine translation, in particular for low-resourced languages. Data, models and software releases as well as the GoURMET Translate Tool are made available as open source.


pdf bib
Surprise Language Challenge: Developing a Neural Machine Translation System between Pashto and English in Two Months
Alexandra Birch | Barry Haddow | Antonio Valerio Miceli Barone | Jindrich Helcl | Jonas Waldendorf | Felipe Sánchez Martínez | Mikel Forcada | Víctor Sánchez Cartagena | Juan Antonio Pérez-Ortiz | Miquel Esplà-Gomis | Wilker Aziz | Lina Murady | Sevi Sariisik | Peggy van der Kreeft | Kay Macquarrie
Proceedings of Machine Translation Summit XVIII: Research Track

In the media industry and the focus of global reporting can shift overnight. There is a compelling need to be able to develop new machine translation systems in a short period of time and in order to more efficiently cover quickly developing stories. As part of the EU project GoURMET and which focusses on low-resource machine translation and our media partners selected a surprise language for which a machine translation system had to be built and evaluated in two months(February and March 2021). The language selected was Pashto and an Indo-Iranian language spoken in Afghanistan and Pakistan and India. In this period we completed the full pipeline of development of a neural machine translation system: data crawling and cleaning and aligning and creating test sets and developing and testing models and and delivering them to the user partners. In this paperwe describe rapid data creation and experiments with transfer learning and pretraining for this low-resource language pair. We find that starting from an existing large model pre-trained on 50languages leads to far better BLEU scores than pretraining on one high-resource language pair with a smaller model. We also present human evaluation of our systems and which indicates that the resulting systems perform better than a freely available commercial system when translating from English into Pashto direction and and similarly when translating from Pashto into English.


pdf bib
Global Under-Resourced Media Translation (GoURMET)
Alexandra Birch | Barry Haddow | Ivan Tito | Antonio Valerio Miceli Barone | Rachel Bawden | Felipe Sánchez-Martínez | Mikel L. Forcada | Miquel Esplà-Gomis | Víctor Sánchez-Cartagena | Juan Antonio Pérez-Ortiz | Wilker Aziz | Andrew Secker | Peggy van der Kreeft
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks


pdf bib
The SUMMA Platform: Scalable Understanding of Multilingual Media
Ulrich Germann | Peggy van der Kreeft | Guntis Barzdins | Alexandra Birch
Proceedings of the 21st Annual Conference of the European Association for Machine Translation

We present the latest version of the SUMMA platform, an open-source software platform for monitoring and interpreting multi-lingual media, from written news published on the internet to live media broadcasts via satellite or internet streaming.

pdf bib
news.bridge - Automated Transcription and Translation for News
Peggy van der Kreeft | Renars Liepins
Proceedings of the 21st Annual Conference of the European Association for Machine Translation

news.bridge provides a platform for multilingual video processing, including automated transcription and translation, subtitling, voice-over, and summarization, with post-editing facility of videos in a broad range of languages. The platform is currently in beta testing at Deutsche Welle for republishing of videos in other languages.


pdf bib
The SUMMA Platform Prototype
Renars Liepins | Ulrich Germann | Guntis Barzdins | Alexandra Birch | Steve Renals | Susanne Weber | Peggy van der Kreeft | Hervé Bourlard | João Prieto | Ondřej Klejch | Peter Bell | Alexandros Lazaridis | Alfonso Mendes | Sebastian Riedel | Mariana S. C. Almeida | Pedro Balage | Shay B. Cohen | Tomasz Dwojak | Philip N. Garner | Andreas Giefer | Marcin Junczys-Dowmunt | Hina Imran | David Nogueira | Ahmed Ali | Sebastião Miranda | Andrei Popescu-Belis | Lesly Miculicich Werlen | Nikos Papasarantopoulos | Abiola Obamuyide | Clive Jones | Fahim Dalvi | Andreas Vlachos | Yang Wang | Sibo Tong | Rico Sennrich | Nikolaos Pappas | Shashi Narayan | Marco Damonte | Nadir Durrani | Sameer Khurana | Ahmed Abdelali | Hassan Sajjad | Stephan Vogel | David Sheppey | Chris Hernon | Jeff Mitchell
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring. The platform contains a rich suite of low-level and high-level natural language processing technologies: automatic speech recognition of broadcast media, machine translation, automated tagging and classification of named entities, semantic parsing to detect relationships between entities, and automatic construction / augmentation of factual knowledge bases. Implemented on the Docker platform, it can easily be deployed, customised, and scaled to large volumes of incoming media streams.