Skeletal Graph Self-Attention: Embedding a Skeleton Inductive Bias into Sign Language Production

Ben Saunders, Necati Cihan Camgöz, Richard Bowden


Abstract
Recent approaches to Sign Language Production (SLP) have adopted spoken language Neural Machine Translation (NMT) architectures, applied without sign-specific modifications. In addition, these works represent sign language as a sequence of skeleton pose vectors, projected to an abstract representation with no inherent skeletal structure. In this paper, we represent sign language sequences as a skeletal graph structure, with joints as nodes and both spatial and temporal connections as edges. To operate on this graphical structure, we propose Skeletal Graph Self-Attention (SGSA), a novel graphical attention layer that embeds a skeleton inductive bias into the SLP model. Retaining the skeletal feature representation throughout, we directly incorporate a spatio-temporal adjacency matrix into the self-attention formulation. This provides structure and context to each skeletal joint that is not possible when using a non-graphical abstract representation, enabling fluid and expressive sign language production. We evaluate our Skeletal Graph Self-Attention architecture on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset, achieving state-of-the-art back translation performance with improvements of 8% and 7% over competing methods on the dev and test sets, respectively.
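The idea of a skeletal graph with spatial and temporal edges feeding into self-attention can be illustrated with a small sketch. The abstract does not specify how the adjacency matrix enters the attention computation, so the code below makes an assumption: the matrix is used as a mask on the attention logits, so each joint attends only to its graph neighbours. The three-joint skeleton, the function names, and the identity query/key/value projections are all hypothetical simplifications for illustration, not the authors' implementation.

```python
import math

def spatio_temporal_adjacency(num_frames, num_joints, bones):
    """Adjacency over (frame, joint) nodes, flattened as t * num_joints + j."""
    n = num_frames * num_joints
    adj = [[0] * n for _ in range(n)]
    for t in range(num_frames):
        for j in range(num_joints):
            i = t * num_joints + j
            adj[i][i] = 1  # self-connection
            if t + 1 < num_frames:
                # Temporal edge: same joint in the next frame.
                k = (t + 1) * num_joints + j
                adj[i][k] = adj[k][i] = 1
        for a, b in bones:
            # Spatial edge: two joints linked by a bone within one frame.
            i, k = t * num_joints + a, t * num_joints + b
            adj[i][k] = adj[k][i] = 1
    return adj

def masked_self_attention(x, adj):
    """Scaled dot-product self-attention restricted to graph neighbours.

    Assumption: the adjacency matrix masks the logits. Queries, keys and
    values are the raw node features (no learned projections).
    """
    d = len(x[0])
    outputs, weights = [], []
    for i, q in enumerate(x):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            if adj[i][j] else -math.inf
            for j, k in enumerate(x)
        ]
        top = max(scores)  # finite, because every node has a self-connection
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        w = [e / total for e in exps]
        weights.append(w)
        outputs.append([sum(w[j] * x[j][c] for j in range(len(x)))
                        for c in range(d)])
    return outputs, weights

# Hypothetical toy skeleton: shoulder (0), elbow (1), wrist (2), two frames.
bones = [(0, 1), (1, 2)]
adj = spatio_temporal_adjacency(num_frames=2, num_joints=3, bones=bones)

# One 2D position per (frame, joint) node; values are made up.
x = [[0.0, 0.0], [0.5, 0.1], [1.0, 0.2],
     [0.0, 0.1], [0.5, 0.3], [1.0, 0.5]]
out, weights = masked_self_attention(x, adj)
```

In this toy setup the shoulder in frame 0 attends to itself, to the elbow in the same frame, and to the shoulder in frame 1, but receives zero weight from the wrist, which is not a direct neighbour. The features stay attached to individual joints throughout rather than being projected into a single abstract pose vector.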
Anthology ID:
2022.sltat-1.15
Volume:
Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Eleni Efthimiou, Stavroula-Evita Fotinea, Thomas Hanke, John C. McDonald, Dimitar Shterionov, Rosalee Wolfe
Venue:
SLTAT
Publisher:
European Language Resources Association
Pages:
95–102
URL:
https://aclanthology.org/2022.sltat-1.15
Cite (ACL):
Ben Saunders, Necati Cihan Camgöz, and Richard Bowden. 2022. Skeletal Graph Self-Attention: Embedding a Skeleton Inductive Bias into Sign Language Production. In Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives, pages 95–102, Marseille, France. European Language Resources Association.
Cite (Informal):
Skeletal Graph Self-Attention: Embedding a Skeleton Inductive Bias into Sign Language Production (Saunders et al., SLTAT 2022)
PDF:
https://aclanthology.org/2022.sltat-1.15.pdf
Data:
RWTH-PHOENIX-Weather 2014 T