2024
Towards Privacy-Aware Sign Language Translation at Scale
Phillip Rust | Bowen Shi | Skyler Wang | Necati Cihan Camgoz | Jean Maillard
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
A major impediment to the advancement of sign language translation (SLT) is data scarcity. Much of the sign language data currently available on the web cannot be used for training supervised models due to the lack of aligned captions. Furthermore, scaling SLT using large-scale web-scraped datasets bears privacy risks due to the presence of biometric information, which the responsible development of SLT technologies should account for. In this work, we propose a two-stage framework for privacy-aware SLT at scale that addresses both of these issues. We introduce SSVP-SLT, which leverages self-supervised video pretraining on anonymized and unannotated videos, followed by supervised SLT finetuning on a curated parallel dataset. SSVP-SLT achieves state-of-the-art finetuned and zero-shot gloss-free SLT performance on the How2Sign dataset, outperforming the strongest respective baselines by over 3 BLEU-4. Based on controlled experiments, we further discuss the advantages and limitations of self-supervised pretraining and anonymization via facial obfuscation for SLT.
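The two-stage recipe described in this abstract can be sketched in code: a shared video encoder is first pretrained with a self-supervised objective on anonymized, caption-free clips, and then finetuned with a supervised translation loss on curated parallel data. The snippet below is a toy illustration only, not the SSVP-SLT implementation: the GRU encoder, the masked-reconstruction loss, the per-frame prediction head, and the random tensors standing in for video features and captions are all illustrative assumptions.

# Minimal two-stage sketch of the privacy-aware SLT recipe described above:
# (1) self-supervised pretraining on anonymized, caption-free video features,
# (2) supervised translation finetuning on a parallel dataset.
# NOT the SSVP-SLT implementation; objectives, sizes, and data are toy assumptions.
import torch
import torch.nn as nn

FEAT_DIM, HID_DIM, VOCAB = 128, 256, 1000

encoder = nn.GRU(FEAT_DIM, HID_DIM, batch_first=True)   # shared video encoder
recon_head = nn.Linear(HID_DIM, FEAT_DIM)                # stage-1 reconstruction head
translation_head = nn.Linear(HID_DIM, VOCAB)             # stage-2 prediction head

# --- Stage 1: self-supervised pretraining (no captions required) ---------
# Toy stand-in for features of anonymized (face-obfuscated) sign video clips.
clips = torch.randn(8, 32, FEAT_DIM)                     # (batch, frames, feat)
mask = torch.rand(8, 32, 1) < 0.5                        # mask roughly half the frames
opt = torch.optim.Adam(list(encoder.parameters()) + list(recon_head.parameters()))
for _ in range(3):
    hidden, _ = encoder(clips * (~mask))                 # encode the masked input
    loss = ((recon_head(hidden) - clips)[mask.expand_as(clips)] ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# --- Stage 2: supervised SLT finetuning on curated parallel data ---------
captions = torch.randint(0, VOCAB, (8, 32))              # toy aligned target tokens
opt = torch.optim.Adam(list(encoder.parameters()) + list(translation_head.parameters()))
for _ in range(3):
    hidden, _ = encoder(clips)
    loss = nn.functional.cross_entropy(
        translation_head(hidden).reshape(-1, VOCAB), captions.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()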
2023
First WMT Shared Task on Sign Language Translation (WMT-SLT22)
Mathias Müller | Sarah Ebling | Eleftherios Avramidis | Alessia Battisti | Michèle Berger | Richard Bowden | Annelies Braffort | Necati Cihan Camgoz | Cristina España-Bonet | Roman Grundkiewicz | Zifan Jiang | Oscar Koller | Amit Moryossef | Regula Perrollaz | Sabine Reinhard | Annette Rios Gonzales | Dimitar Shterionov | Sandra Sidler-Miserez | Katja Tissi | Davy Van Landuyt
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
This paper is a brief summary of the First WMT Shared Task on Sign Language Translation (WMT-SLT22), a project partly funded by EAMT. The focus of this shared task is automatic translation between signed and spoken languages. Details can be found on our website (https://www.wmt-slt.com/) or in the findings paper (Müller et al., 2022).
Findings of the Second WMT Shared Task on Sign Language Translation (WMT-SLT23)
Mathias Müller | Malihe Alikhani | Eleftherios Avramidis | Richard Bowden | Annelies Braffort | Necati Cihan Camgöz | Sarah Ebling | Cristina España-Bonet | Anne Göhring | Roman Grundkiewicz | Mert Inan | Zifan Jiang | Oscar Koller | Amit Moryossef | Annette Rios | Dimitar Shterionov | Sandra Sidler-Miserez | Katja Tissi | Davy Van Landuyt
Proceedings of the Eighth Conference on Machine Translation
This paper presents the results of the Second WMT Shared Task on Sign Language Translation (WMT-SLT23; https://www.wmt-slt.com/). This shared task is concerned with automatic translation between signed and spoken languages. The task is unusual in the sense that it requires processing visual information (such as video frames or human pose estimation) beyond the well-known paradigm of text-to-text machine translation (MT). The task offers four tracks involving the following languages: Swiss German Sign Language (DSGS), French Sign Language of Switzerland (LSF-CH), Italian Sign Language of Switzerland (LIS-CH), German, French and Italian. Four teams (including one working on a baseline submission) participated in this second edition of the task, all submitting to the DSGS-to-German track. Besides a system ranking and system papers describing state-of-the-art techniques, this shared task makes the following scientific contributions: novel corpora and reproducible baseline systems. Finally, the task also resulted in publicly available sets of system outputs and more human evaluation scores for sign language translation.
2022
Skeletal Graph Self-Attention: Embedding a Skeleton Inductive Bias into Sign Language Production
Ben Saunders | Necati Cihan Camgöz | Richard Bowden
Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives
Recent approaches to Sign Language Production (SLP) have adopted spoken language Neural Machine Translation (NMT) architectures, applied without sign-specific modifications. In addition, these works represent sign language as a sequence of skeleton pose vectors, projected to an abstract representation with no inherent skeletal structure. In this paper, we represent sign language sequences as a skeletal graph structure, with joints as nodes and both spatial and temporal connections as edges. To operate on this graphical structure, we propose Skeletal Graph Self-Attention (SGSA), a novel graphical attention layer that embeds a skeleton inductive bias into the SLP model. Retaining the skeletal feature representation throughout, we directly incorporate a spatio-temporal adjacency matrix into the self-attention formulation. This provides structure and context to each skeletal joint that is not possible when using a non-graphical abstract representation, enabling fluid and expressive sign language production. We evaluate our Skeletal Graph Self-Attention architecture on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset, achieving state-of-the-art back translation performance with an 8% and 7% improvement over competing methods for the dev and test sets.
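The central mechanism of SGSA, constraining self-attention with an adjacency matrix over skeletal joints, can be illustrated with a minimal sketch. The code below is not the authors' implementation; the toy joint graph, the spatial-only (single-frame) masking, and the layer dimensions are assumptions made purely for illustration.

# Minimal sketch of self-attention restricted by a skeletal adjacency matrix.
# NOT the authors' SGSA implementation; graph, masking scheme, and sizes are toy assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkeletalGraphSelfAttention(nn.Module):
    def __init__(self, dim: int, num_joints: int, adjacency: torch.Tensor):
        super().__init__()
        assert adjacency.shape == (num_joints, num_joints)
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # adjacency[i, j] = 1 where joints i and j are connected (self-loops
        # included); attention between unconnected joints is masked out.
        self.register_buffer("adjacency", adjacency.bool())
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_joints, dim) -- joint features for one frame
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale
        scores = scores.masked_fill(~self.adjacency, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        return torch.matmul(attn, v)

# Toy usage: a 4-joint chain graph with self-loops.
adj = torch.tensor([[1, 1, 0, 0],
                    [1, 1, 1, 0],
                    [0, 1, 1, 1],
                    [0, 0, 1, 1]])
layer = SkeletalGraphSelfAttention(dim=16, num_joints=4, adjacency=adj)
out = layer(torch.randn(2, 4, 16))  # -> (2, 4, 16)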
Findings of the First WMT Shared Task on Sign Language Translation (WMT-SLT22)
Mathias Müller | Sarah Ebling | Eleftherios Avramidis | Alessia Battisti | Michèle Berger | Richard Bowden | Annelies Braffort | Necati Cihan Camgöz | Cristina España-Bonet | Roman Grundkiewicz | Zifan Jiang | Oscar Koller | Amit Moryossef | Regula Perrollaz | Sabine Reinhard | Annette Rios | Dimitar Shterionov | Sandra Sidler-Miserez | Katja Tissi
Proceedings of the Seventh Conference on Machine Translation (WMT)
This paper presents the results of the First WMT Shared Task on Sign Language Translation (WMT-SLT22). This shared task is concerned with automatic translation between signed and spoken languages. The task is novel in the sense that it requires processing visual information (such as video frames or human pose estimation) beyond the well-known paradigm of text-to-text machine translation (MT). The task featured two tracks, translating from Swiss German Sign Language (DSGS) to German and vice versa. Seven teams participated in this first edition of the task, all submitting to the DSGS-to-German track. Besides a system ranking and system papers describing state-of-the-art techniques, this shared task makes the following scientific contributions: novel corpora, reproducible baseline systems and new protocols and software for human evaluation. Finally, the task also resulted in the first publicly available set of system outputs and human evaluation scores for sign language translation.
2020
BosphorusSign22k Sign Language Recognition Dataset
Oğulcan Özdemir | Ahmet Alp Kındıroğlu | Necati Cihan Camgöz | Lale Akarun
Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives
Sign Language Recognition is a challenging research domain. It has recently seen several advancements with the increased availability of data. In this paper, we introduce the BosphorusSign22k, a publicly available large-scale sign language dataset aimed at computer vision, video recognition and deep learning research communities. The primary objective of this dataset is to serve as a new benchmark in Turkish Sign Language Recognition for its vast lexicon, the high number of repetitions by native signers, high recording quality, and the unique syntactic properties of the signs it encompasses. We also provide state-of-the-art human pose estimates to encourage other tasks such as Sign Language Production. We survey other publicly available datasets and expand on how BosphorusSign22k can contribute to future research that is being made possible through the widespread availability of similar Sign Language resources. We have conducted extensive experiments and present baseline results to underpin future research on our dataset.
2018
SMILE Swiss German Sign Language Dataset
Sarah Ebling | Necati Cihan Camgöz | Penny Boyes Braem | Katja Tissi | Sandra Sidler-Miserez | Stephanie Stoll | Simon Hadfield | Tobias Haug | Richard Bowden | Sandrine Tornay | Marzieh Razavi | Mathew Magimai-Doss
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2016
BosphorusSign: A Turkish Sign Language Recognition Corpus in Health and Finance Domains
Necati Cihan Camgöz | Ahmet Alp Kındıroğlu | Serpil Karabüklü | Meltem Kelepir | Ayşe Sumru Özsoy | Lale Akarun
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
There are as many sign languages as there are deaf communities in the world. Linguists have been collecting corpora of different sign languages and annotating them extensively in order to study and understand their properties. On the other hand, the field of computer vision has approached the sign language recognition problem as a grand challenge and research efforts have intensified in the last 20 years. However, corpora collected for studying linguistic properties are often not suitable for sign language recognition as the statistical methods used in the field require large amounts of data. Recently, with the availability of inexpensive depth cameras, groups from the computer vision community have started collecting corpora with a large number of repetitions for sign language recognition research. In this paper, we present the BosphorusSign Turkish Sign Language corpus, which consists of 855 sign and phrase samples from the health, finance and everyday life domains. The corpus is collected using the state-of-the-art Microsoft Kinect v2 depth sensor, and will be the first of its kind in this field of sign language research. Furthermore, there will be annotations rendered by linguists so that the corpus will appeal to both the linguistic and sign language recognition research communities.