2024
SignCLIP: Connecting Text and Sign Language by Contrastive Learning
Zifan Jiang | Gerard Sant | Amit Moryossef | Mathias Müller | Rico Sennrich | Sarah Ebling
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
We present SignCLIP, which re-purposes CLIP (Contrastive Language-Image Pretraining) to project spoken language text and sign language videos, two classes of natural languages of distinct modalities, into the same space. SignCLIP is an efficient method of learning useful visual representations for sign language processing from large-scale, multilingual video-text pairs, without directly optimizing for a specific task or sign language, whose data is often of limited size. We pretrain SignCLIP on Spreadthesign, a prominent sign language dictionary consisting of ~500 thousand video clips in up to 44 sign languages, and evaluate it on various downstream datasets. SignCLIP discerns in-domain signing with notable text-to-video/video-to-text retrieval accuracy. It also performs competitively on out-of-domain downstream tasks such as isolated sign language recognition upon essential few-shot prompting or fine-tuning. We analyze the latent space formed by the spoken language text and sign language poses, which provides additional linguistic insights. Our code and models are openly available.
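The core mechanism here is a CLIP-style symmetric contrastive objective: matching text/video embedding pairs in a batch are pulled together while all other pairings act as negatives. The following is a minimal dependency-free sketch of that loss, not the paper's implementation; the function name `clip_style_loss` and the temperature value are illustrative assumptions.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def clip_style_loss(text_embs, video_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    text_embs[i] and video_embs[i] are a positive pair; every other
    pairing in the batch serves as a negative.
    """
    n = len(text_embs)
    # Temperature-scaled similarity matrix: rows = texts, cols = videos.
    sims = [[cosine(t, v) / temperature for v in video_embs] for t in text_embs]

    def xent(logits, target):
        # Numerically stable cross-entropy for one row of logits.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        return log_z - logits[target]

    # Text-to-video retrieval direction.
    loss_t2v = sum(xent(sims[i], i) for i in range(n)) / n
    # Video-to-text direction (transposed similarity matrix).
    loss_v2t = sum(xent([sims[j][i] for j in range(n)], i) for i in range(n)) / n
    return (loss_t2v + loss_v2t) / 2
```

When the paired embeddings align (identical vectors at matching indices), the loss approaches zero; mismatched pairings drive it up, which is what makes the objective usable for text-to-video and video-to-text retrieval.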
SwissSLi: The Multi-parallel Sign Language Corpus for Switzerland
Zifan Jiang | Anne Göhring | Amit Moryossef | Rico Sennrich | Sarah Ebling
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
In this work, we introduce SwissSLi, the first sign language corpus that contains parallel data of all three Swiss sign languages, namely Swiss German Sign Language (DSGS), French Sign Language of Switzerland (LSF-CH), and Italian Sign Language of Switzerland (LIS-CH). The data underlying this corpus originates from television programs in three spoken languages: German, French, and Italian. The programs have for the most part been translated into sign language by deaf translators, resulting in a unique, up to six-way multi-parallel dataset between spoken and sign languages. We describe and release the sign language videos and spoken language subtitles as well as the overall statistics and some derivatives of the raw material. These derived components include cropped videos, pose estimation, phrase/sign-segmented videos, and sentence-segmented subtitles, all of which facilitate downstream tasks such as sign language transcription (glossing) and machine translation. The corpus is publicly available on the SWISSUbase data platform for research purposes only under a CC BY-NC-SA 4.0 license.
2023
Considerations for meaningful sign language machine translation based on glosses
Mathias Müller | Zifan Jiang | Amit Moryossef | Annette Rios | Sarah Ebling
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Automatic sign language processing is gaining popularity in Natural Language Processing (NLP) research (Yin et al., 2021). In machine translation (MT) in particular, sign language translation based on glosses is a prominent approach. In this paper, we review recent works on neural gloss translation. We find that limitations of glosses in general and limitations of specific datasets are not discussed in a transparent manner and that there is no common standard for evaluation. To address these issues, we put forward concrete recommendations for future research on gloss translation. Our suggestions advocate awareness of the inherent limitations of gloss-based approaches, realistic datasets, stronger baselines and convincing evaluation.
Machine Translation between Spoken Languages and Signed Languages Represented in SignWriting
Zifan Jiang | Amit Moryossef | Mathias Müller | Sarah Ebling
Findings of the Association for Computational Linguistics: EACL 2023
This paper presents work on novel machine translation (MT) systems between spoken and signed languages, where signed languages are represented in SignWriting, a sign language writing system. Our work seeks to address the lack of out-of-the-box support for signed languages in current MT systems and is based on the SignBank dataset, which contains pairs of spoken language text and SignWriting content. We introduce novel methods to parse, factorize, decode, and evaluate SignWriting, leveraging ideas from neural factored MT. In a bilingual setup—translating from American Sign Language to (American) English—our method achieves over 30 BLEU, while in two multilingual setups—translating in both directions between spoken languages and signed languages—we achieve over 20 BLEU. We find that common MT techniques used to improve spoken language translation similarly affect the performance of sign language translation. These findings validate our use of an intermediate text representation for signed languages to include them in natural language processing research.
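The factored-MT idea above rests on splitting each SignWriting symbol into independent factors. As a rough illustration under the publicly documented Formal SignWriting (FSW) string format, the sketch below separates each symbol into base shape, fill, rotation, and x/y position; the regex and the `parse_fsw` helper are assumptions for illustration, not the paper's actual parser.

```python
import re

# One FSW symbol: "S" + 3-hex base shape + 1-hex fill + 1-hex rotation,
# followed by a 3-digit x and 3-digit y coordinate (e.g. "S14c20481x471").
FSW_SYMBOL = re.compile(r"S([0-9a-f]{3})([0-9a-f])([0-9a-f])(\d{3})x(\d{3})")

def parse_fsw(fsw):
    """Split a Formal SignWriting string into per-symbol factor dicts,
    the kind of decomposition a factored MT model can consume."""
    return [
        {"base": m.group(1), "fill": m.group(2), "rotation": m.group(3),
         "x": int(m.group(4)), "y": int(m.group(5))}
        for m in FSW_SYMBOL.finditer(fsw)
    ]
```

For example, a two-symbol sign like "M518x529S14c20481x471S27106503x489" yields two factor dicts, letting shape, orientation, and placement be modeled as separate factors rather than one opaque token.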
Linguistically Motivated Sign Language Segmentation
Amit Moryossef | Zifan Jiang | Mathias Müller | Sarah Ebling | Yoav Goldberg
Findings of the Association for Computational Linguistics: EMNLP 2023
Sign language segmentation is a crucial task in sign language processing systems. It enables downstream tasks such as sign recognition, transcription, and machine translation. In this work, we consider two kinds of segmentation: segmentation into individual signs and segmentation into phrases, larger units comprising several signs. We propose a novel approach to jointly model these two tasks. Our method is motivated by linguistic cues observed in sign language corpora. We replace the predominant IO tagging scheme with BIO tagging to account for continuous signing. Given that prosody plays a significant role in phrase boundaries, we explore the use of optical flow features. We also provide an extensive analysis of hand shapes and 3D hand normalization. We find that introducing BIO tagging is necessary to model sign boundaries. Explicitly encoding prosody by optical flow improves segmentation in shallow models, but its contribution is negligible in deeper models. Careful tuning of the decoding algorithm atop the models further improves the segmentation quality. We demonstrate that our final models generalize to out-of-domain video content in a different signed language, even under a zero-shot setting. We observe that including optical flow and 3D hand normalization enhances the robustness of the model in this context.
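The move from IO to BIO tagging described above is easy to see on per-frame labels: with IO, two back-to-back signs in continuous signing collapse into one undifferentiated run of "I" tags, while BIO marks each sign onset with "B". A minimal sketch (the span representation and helper name are illustrative assumptions, not the paper's code):

```python
def spans_to_bio(num_frames, spans):
    """Convert sign spans [(start, end)] (inclusive frame indices)
    into per-frame BIO tags.

    "B" marks the first frame of each sign, "I" its continuation,
    "O" non-signing frames. Unlike IO tagging, BIO keeps two
    adjacent signs distinguishable, which is why it is needed to
    model sign boundaries in continuous signing.
    """
    tags = ["O"] * num_frames
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end + 1):
            tags[i] = "I"
    return tags
```

Two adjacent signs over frames 0-2 and 3-5 become B I I B I I; an IO scheme would tag all six frames "I" and lose the boundary at frame 3.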
An Open-Source Gloss-Based Baseline for Spoken to Signed Language Translation
Amit Moryossef | Mathias Müller | Anne Göhring | Zifan Jiang | Yoav Goldberg | Sarah Ebling
Proceedings of the Second International Workshop on Automatic Translation for Signed and Spoken Languages
Sign language translation systems are complex and require many components. As a result, it is very hard to compare methods across publications. We present an open-source implementation of a text-to-gloss-to-pose-to-video pipeline approach, demonstrating conversion from German to Swiss German Sign Language, French to French Sign Language of Switzerland, and Italian to Italian Sign Language of Switzerland. We propose three different components for the text-to-gloss translation: a lemmatizer, a rule-based word reordering and dropping component, and a neural machine translation system. Gloss-to-pose conversion occurs using data from a lexicon for three different signed languages, with skeletal poses extracted from videos. To generate a sentence, the text-to-gloss system is first run, and the pose representations of the resulting signs are stitched together.
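The text-to-gloss-to-pose pipeline above composes simple stages: lemmatize and reorder/drop words to produce glosses, then look each gloss up in a pose lexicon and stitch the sequences together. The toy sketch below shows only that composition under assumed data structures (a lemma table, a stop-word set, and a gloss-to-pose-frames dict); all names and the drop list are hypothetical, and the real system also reorders words and uses skeletal poses extracted from videos.

```python
def text_to_gloss(text, lemma_table, drop=frozenset({"the", "a", "is"})):
    """Rule-based text-to-gloss step: lemmatize each word, drop
    function words, and uppercase the result as a gloss."""
    glosses = []
    for word in text.lower().split():
        if word in drop:
            continue  # function words are typically not signed
        glosses.append(lemma_table.get(word, word).upper())
    return glosses

def glosses_to_pose(glosses, lexicon):
    """Gloss-to-pose step: look up each gloss in a pose lexicon and
    stitch the pose frame sequences together; unknown glosses are
    skipped rather than failing the whole sentence."""
    frames = []
    for gloss in glosses:
        frames.extend(lexicon.get(gloss, []))
    return frames
```

Running a sentence through both stages yields one concatenated pose sequence, which a downstream pose-to-video component would then render.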
First WMT Shared Task on Sign Language Translation (WMT-SLT22)
Mathias Müller | Sarah Ebling | Eleftherios Avramidis | Alessia Battisti | Michèle Berger | Richard Bowden | Annelies Braffort | Necati Cihan Camgöz | Cristina España-Bonet | Roman Grundkiewicz | Zifan Jiang | Oscar Koller | Amit Moryossef | Regula Perrollaz | Sabine Reinhard | Annette Rios Gonzales | Dimitar Shterionov | Sandra Sidler-Miserez | Katja Tissi | Davy Van Landuyt
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
This paper is a brief summary of the First WMT Shared Task on Sign Language Translation (WMT-SLT22), a project partly funded by EAMT. The focus of this shared task is automatic translation between signed and spoken languages. Details can be found on our website (https://www.wmt-slt.com/) or in the findings paper (Müller et al., 2022).
Findings of the Second WMT Shared Task on Sign Language Translation (WMT-SLT23)
Mathias Müller | Malihe Alikhani | Eleftherios Avramidis | Richard Bowden | Annelies Braffort | Necati Cihan Camgöz | Sarah Ebling | Cristina España-Bonet | Anne Göhring | Roman Grundkiewicz | Mert Inan | Zifan Jiang | Oscar Koller | Amit Moryossef | Annette Rios | Dimitar Shterionov | Sandra Sidler-Miserez | Katja Tissi | Davy Van Landuyt
Proceedings of the Eighth Conference on Machine Translation
This paper presents the results of the Second WMT Shared Task on Sign Language Translation (WMT-SLT23; https://www.wmt-slt.com/). This shared task is concerned with automatic translation between signed and spoken languages. The task is unusual in the sense that it requires processing visual information (such as video frames or human pose estimation) beyond the well-known paradigm of text-to-text machine translation (MT). The task offers four tracks involving the following languages: Swiss German Sign Language (DSGS), French Sign Language of Switzerland (LSF-CH), Italian Sign Language of Switzerland (LIS-CH), German, French and Italian. Four teams (including one working on a baseline submission) participated in this second edition of the task, all submitting to the DSGS-to-German track. Besides a system ranking and system papers describing state-of-the-art techniques, this shared task makes the following scientific contributions: novel corpora and reproducible baseline systems. Finally, the task also resulted in publicly available sets of system outputs and more human evaluation scores for sign language translation.
2022
Findings of the First WMT Shared Task on Sign Language Translation (WMT-SLT22)
Mathias Müller | Sarah Ebling | Eleftherios Avramidis | Alessia Battisti | Michèle Berger | Richard Bowden | Annelies Braffort | Necati Cihan Camgöz | Cristina España-Bonet | Roman Grundkiewicz | Zifan Jiang | Oscar Koller | Amit Moryossef | Regula Perrollaz | Sabine Reinhard | Annette Rios | Dimitar Shterionov | Sandra Sidler-Miserez | Katja Tissi
Proceedings of the Seventh Conference on Machine Translation (WMT)
This paper presents the results of the First WMT Shared Task on Sign Language Translation (WMT-SLT22). This shared task is concerned with automatic translation between signed and spoken languages. The task is novel in the sense that it requires processing visual information (such as video frames or human pose estimation) beyond the well-known paradigm of text-to-text machine translation (MT). The task featured two tracks, translating from Swiss German Sign Language (DSGS) to German and vice versa. Seven teams participated in this first edition of the task, all submitting to the DSGS-to-German track. Besides a system ranking and system papers describing state-of-the-art techniques, this shared task makes the following scientific contributions: novel corpora, reproducible baseline systems and new protocols and software for human evaluation. Finally, the task also resulted in the first publicly available set of system outputs and human evaluation scores for sign language translation.