2023
Proceedings of the Second International Workshop on Automatic Translation for Signed and Spoken Languages
Dimitar Shterionov
|
Mirella De Sisto
|
Mathias Muller
|
Davy Van Landuyt
|
Rehana Omardeen
|
Shaun Oboyle
|
Annelies Braffort
|
Floris Roelofsen
|
Fred Blain
|
Bram Vanroy
|
Eleftherios Avramidis
Proceedings of the Second International Workshop on Automatic Translation for Signed and Spoken Languages
A New English-Dutch-NGT Corpus for the Hospitality Domain
Mirella De Sisto
|
Vincent Vandeghinste
|
Dimitar Shterionov
Proceedings of the Second International Workshop on Automatic Translation for Signed and Spoken Languages
One of the major challenges hampering the development of language technology which targets sign languages is the extremely limited availability of good quality data geared towards machine learning and deep learning approaches. In this paper we introduce the NGT-Dutch Hotel Review Corpus (NGT-HoReCo), which addresses this issue by providing multimodal parallel data in English, Dutch and Sign Language of the Netherlands (NGT). The corpus contains 283 hotel reviews in written English, translated into written Dutch and into NGT videos. It will be made publicly available through CLARIN and through the ELG platform.
Tailoring Domain Adaptation for Machine Translation Quality Estimation
Javad Pourmostafa Roshan Sharami
|
Dimitar Shterionov
|
Frédéric Blain
|
Eva Vanmassenhove
|
Mirella De Sisto
|
Chris Emmery
|
Pieter Spronck
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
While quality estimation (QE) can play an important role in the translation process, its effectiveness relies on the availability and quality of training data. For QE in particular, high-quality labeled data is often lacking due to the high cost and effort associated with labeling such data. Aside from the data scarcity challenge, QE models should also be generalizable, i.e., they should be able to handle data from different domains, both generic and specific. To alleviate these two main issues, data scarcity and domain mismatch, this paper combines domain adaptation and data augmentation within a robust QE system. Our method is to first train a generic QE model and then fine-tune it on a specific domain while retaining generic knowledge. Our results show a significant improvement for all the language pairs investigated, better cross-lingual inference, and a superior performance in zero-shot learning scenarios as compared to state-of-the-art baselines.
SignON: Sign Language Translation. Progress and challenges.
Vincent Vandeghinste
|
Dimitar Shterionov
|
Mirella De Sisto
|
Aoife Brady
|
Mathieu De Coster
|
Lorraine Leeson
|
Josep Blat
|
Frankie Picron
|
Marcello Paolo Scipioni
|
Aditya Parikh
|
Louis ten Bosch
|
John O’Flaherty
|
Joni Dambre
|
Jorn Rijckaert
|
Bram Vanroy
|
Victor Ubieto Nogales
|
Santiago Egea Gómez
|
Ineke Schuurman
|
Gorka Labaka
|
Adrián Núñez-Marcos
|
Irene Murtagh
|
Euan McGill
|
Horacio Saggion
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
SignON (https://signon-project.eu/) is a Horizon 2020 project, running from 2021 until the end of 2023, which addresses the lack of technology and services for the automatic translation between sign languages (SLs) and spoken languages, through an inclusive, human-centric solution, hence contributing to the repertoire of communication media for deaf, hard of hearing (DHH) and hearing individuals. In this paper, we present an update of the status of the project, describing the approaches developed to address the challenges and peculiarities of SL machine translation (SLMT).
GoSt-ParC-Sign: Gold Standard Parallel Corpus of Sign and spoken language
Mirella De Sisto
|
Vincent Vandeghinste
|
Lien Soetemans
|
Caro Brosens
|
Dimitar Shterionov
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
Good quality training data for Sign Language Machine Translation (SLMT) is extremely scarce, and this is one of the challenges currently facing any Machine Translation (MT) project that also targets sign languages. The goal of this ongoing project is to create a parallel corpus of authentic Flemish Sign Language (VGT) and written Dutch which can be employed as a gold standard in automated sign language translation. The availability of a gold standard corpus like GoSt-ParC-Sign can facilitate advances in SLMT; consequently, it supports and promotes inclusiveness in MT and, on a more general level, in language technology.
2022
Sign Language Translation: Ongoing Development, Challenges and Innovations in the SignON Project
Dimitar Shterionov
|
Mirella De Sisto
|
Vincent Vandeghinste
|
Aoife Brady
|
Mathieu De Coster
|
Lorraine Leeson
|
Josep Blat
|
Frankie Picron
|
Marcello Paolo Scipioni
|
Aditya Parikh
|
Louis ten Bosch
|
John O’Flaherty
|
Joni Dambre
|
Jorn Rijckaert
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
The SignON project (www.signon-project.eu) focuses on the research and development of a Sign Language (SL) translation mobile application and an open communications framework. SignON rectifies the lack of technology and services for the automatic translation between signed and spoken languages, through an inclusive, human-centric solution which facilitates communication between deaf, hard of hearing (DHH) and hearing individuals. We present an overview of the current status of the project, describing the milestones reached to date and the approaches that are being developed to address the challenges and peculiarities of Sign Language Machine Translation (SLMT).
Challenges with Sign Language Datasets for Sign Language Recognition and Translation
Mirella De Sisto
|
Vincent Vandeghinste
|
Santiago Egea Gómez
|
Mathieu De Coster
|
Dimitar Shterionov
|
Horacio Saggion
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Sign Languages (SLs) are the primary means of communication for at least half a million people in Europe alone. However, the development of SL recognition and translation tools is slowed down by a series of obstacles concerning resource scarcity and standardization issues in the available data. The former challenge relates to the volume of data available for machine learning as well as the time required to collect and process new data. The latter obstacle is linked to the variety of the data, i.e., annotation formats are not unified and vary amongst different resources. The available data formats are often not suitable for machine learning, obstructing the provision of automatic tools based on neural models. In the present paper, we give an overview of these challenges by comparing various SL corpora and SL machine learning datasets. Furthermore, we propose a framework to address the lack of standardization at format level, unify the available resources and facilitate SL research for different languages. Our framework takes ELAN files as inputs and returns textual and visual data ready to train SL recognition and translation models. We present a proof of concept, training neural translation models on the data produced by the proposed framework.
2021
Defining meaningful units. Challenges in sign segmentation and segment-meaning mapping (short paper)
Mirella De Sisto
|
Dimitar Shterionov
|
Irene Murtagh
|
Myriam Vermeerbergen
|
Lorraine Leeson
Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL)
This paper addresses the tasks of sign segmentation and segment-meaning mapping in the context of sign language (SL) recognition. It aims to give an overview of the linguistic properties of SL, such as coarticulation and simultaneity, which make these tasks complex. A better understanding of SL structure is the necessary ground for the design and development of SL recognition and segmentation methodologies, which are fundamental for machine translation of these languages. Based on this preliminary exploration, a proposal for mapping segments to meaning in the form of an agglomerate of lexical and non-lexical information is introduced.