Mirella De Sisto


2025

User Involvement in the Research and Development Life Cycle of Sign Language Machine Translation Systems
Lisa Lepp | Dimitar Shterionov | Mirella De Sisto
Proceedings of the Third International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL)

Machine translation (MT) has evolved rapidly over the last 70 years thanks to advances in processing technology and methodologies, as well as ever-increasing volumes of data. This trend is observed in the context of MT for spoken languages. However, when it comes to sign language (SL) translation technologies, progress is much slower; SLMT is still in its infancy, with limited applications. One of the main factors behind this setback is the lack of effective, respectful and fair user involvement across the different phases of the research and development of SLMT. We present a meta-review of 111 articles on SLMT from the perspective of user involvement. Our analysis investigates which users are involved and which tasks they assume in the first four phases of MT research: (i) Problem and definition, (ii) Dataset construction, (iii) Model Design and Training, (iv) Model Validation and Evaluation. We find that users have primarily been involved as data creators and monitors as well as evaluators. We assess that effective co-creation, as defined in (Lepp et al., 2025), has not been performed and conclude with recommendations for improving the MT research and development landscape from a co-creative perspective.


2023

SignON (https://signon-project.eu/) is a Horizon 2020 project, running from 2021 until the end of 2023, which addresses the lack of technology and services for the automatic translation between sign languages (SLs) and spoken languages, through an inclusive, human-centric solution, hence contributing to the repertoire of communication media for deaf, hard of hearing (DHH) and hearing individuals. In this paper, we present an update of the status of the project, describing the approaches developed to address the challenges and peculiarities of SL machine translation (SLMT).


A New English-Dutch-NGT Corpus for the Hospitality Domain
Mirella De Sisto | Vincent Vandeghinste | Dimitar Shterionov
Proceedings of the Second International Workshop on Automatic Translation for Signed and Spoken Languages

One of the major challenges hampering the development of language technology which targets sign languages is the extremely limited availability of good quality data geared towards machine learning and deep learning approaches. In this paper we introduce the NGT-Dutch Hotel Review Corpus (NGT-HoReCo), which addresses this issue by providing multimodal parallel data in English, Dutch and Sign Language of the Netherlands (NGT). The corpus contains 283 hotel reviews in written English, translated into written Dutch and into NGT videos. It will be made publicly available through CLARIN and through the ELG platform.


Tailoring Domain Adaptation for Machine Translation Quality Estimation
Javad Pourmostafa Roshan Sharami | Dimitar Shterionov | Frédéric Blain | Eva Vanmassenhove | Mirella De Sisto | Chris Emmery | Pieter Spronck
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

While quality estimation (QE) can play an important role in the translation process, its effectiveness relies on the availability and quality of training data. For QE in particular, high-quality labeled data is often lacking due to the high cost and effort associated with labeling such data. Aside from the data scarcity challenge, QE models should also be generalizable, i.e., they should be able to handle data from different domains, both generic and specific. To alleviate these two main issues (data scarcity and domain mismatch), this paper combines domain adaptation and data augmentation within a robust QE system. Our method is to first train a generic QE model and then fine-tune it on a specific domain while retaining generic knowledge. Our results show a significant improvement for all the language pairs investigated, better cross-lingual inference, and superior performance in zero-shot learning scenarios as compared to state-of-the-art baselines.


GoSt-ParC-Sign: Gold Standard Parallel Corpus of Sign and spoken language
Mirella De Sisto | Vincent Vandeghinste | Lien Soetemans | Caro Brosens | Dimitar Shterionov
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

Good quality training data for Sign Language Machine Translation (SLMT) is extremely scarce, and this is one of the challenges currently facing any Machine Translation (MT) project that also targets sign languages. The goal of this ongoing project is to create a parallel corpus of authentic Flemish Sign Language (VGT) and written Dutch which can be employed as a gold standard in automated sign language translation. The availability of a gold standard corpus like GoSt-ParC-Sign can facilitate advances in SLMT; consequently, it supports and promotes inclusiveness in MT and, on a more general level, in language technology.


2022


The SignON project (www.signon-project.eu) focuses on the research and development of a Sign Language (SL) translation mobile application and an open communications framework. SignON addresses the lack of technology and services for the automatic translation between signed and spoken languages, through an inclusive, human-centric solution which facilitates communication between deaf, hard of hearing (DHH) and hearing individuals. We present an overview of the current status of the project, describing the milestones reached to date and the approaches that are being developed to address the challenges and peculiarities of Sign Language Machine Translation (SLMT).


Challenges with Sign Language Datasets for Sign Language Recognition and Translation
Mirella De Sisto | Vincent Vandeghinste | Santiago Egea Gómez | Mathieu De Coster | Dimitar Shterionov | Horacio Saggion
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Sign Languages (SLs) are the primary means of communication for at least half a million people in Europe alone. However, the development of SL recognition and translation tools is slowed down by a series of obstacles concerning resource scarcity and standardization issues in the available data. The former challenge relates to the volume of data available for machine learning as well as the time required to collect and process new data. The latter obstacle is linked to the variety of the data, i.e., annotation formats are not unified and vary amongst different resources. The available data formats are often not suitable for machine learning, obstructing the provision of automatic tools based on neural models. In the present paper, we give an overview of these challenges by comparing various SL corpora and SL machine learning datasets. Furthermore, we propose a framework to address the lack of standardization at format level, unify the available resources and facilitate SL research for different languages. Our framework takes ELAN files as inputs and returns textual and visual data ready to train SL recognition and translation models. We present a proof of concept, training neural translation models on the data produced by the proposed framework.

2021


Defining meaningful units. Challenges in sign segmentation and segment-meaning mapping (short paper)
Mirella De Sisto | Dimitar Shterionov | Irene Murtagh | Myriam Vermeerbergen | Lorraine Leeson
Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL)

This paper addresses the tasks of sign segmentation and segment-meaning mapping in the context of sign language (SL) recognition. It aims to give an overview of the linguistic properties of SL, such as coarticulation and simultaneity, which make these tasks complex. A better understanding of SL structure is the necessary ground for the design and development of SL recognition and segmentation methodologies, which are fundamental for machine translation of these languages. Based on this preliminary exploration, a proposal for mapping segments to meaning in the form of an agglomerate of lexical and non-lexical information is introduced.
