We present the acquisition process and the data of DGS-Fabeln-1, a parallel corpus of German text and videos containing German fairy tales interpreted into the German Sign Language (DGS) by a native DGS signer. The corpus contains 573 segments of videos with a total duration of 1 hour and 32 minutes, corresponding with 1428 written sentences. It is the first corpus of semi-naturally expressed DGS that has been filmed from 7 angles, and one of the few sign language (SL) corpora globally which have been filmed from more than 3 angles and where the listener has been simultaneously filmed. The corpus aims at aiding research at SL linguistics, SL machine translation and affective computing, and is freely available for research purposes at the following address: https://doi.org/10.5281/zenodo.10822097.
In this paper, we investigate the capability of convolutional neural networks to recognize in sign language video frames the six basic Ekman facial expressions for ‘fear’, ‘disgust’, ‘surprise’, ‘sadness’, ‘happiness’, ‘anger’ along with the ‘neutral’ class. Given the limited amount of annotated facial expression data for the sign language domain, we started from a model pre-trained on general-purpose facial expression datasets and we applied various machine learning techniques such as fine-tuning, data augmentation, class balancing, as well as image preprocessing to reach a better accuracy. The models were evaluated using K-fold cross-validation to get more accurate conclusions. It is experimentally demonstrated that fine-tuning a pre-trained model along with data augmentation by horizontally flipping images and image normalization, helps in providing the best accuracy on the sign language dataset. The best setting achieves satisfactory classification accuracy, comparable to state-of-the-art systems in generic facial expression recognition. Experiments were performed using different combinations of the above-mentioned techniques based on two different architectures, namely MobileNet and EfficientNet, and is deemed that both architectures seem equally suitable for the purpose of fine-tuning, whereas class balancing is discouraged.
We present the requirements, design guidelines, and the software architecture of an open-source toolkit dedicated to the pre-processing of sign language video material. The toolkit is a collection of functions and command-line tools designed to be integrated with build automation systems. Every pre-processing tool is dedicated to standard pre-processing operations (e.g., trimming, cropping, resizing) or feature extraction (e.g., identification of areas of interest, landmark detection) and can be used also as a standalone Python module. The UML diagrams of its architecture are presented together with a few working examples of its usage. The software is freely available with an open-source license on a public repository.
This paper presents an overview of AVASAG; an ongoing applied-research project developing a text-to-sign-language translation system for public services. We describe the scientific innovation points (geometry-based SL-description, 3D animation and video corpus, simplified annotation scheme, motion capture strategy) and the overall translation pipeline.