Word-Conditioned 3D American Sign Language Motion Generation

Lu Dong, Xiao Wang, Ifeoma Nwogu


Abstract
Sign words are the building blocks of any sign language. In this work, we present wSignGen, a word-conditioned 3D American Sign Language (ASL) generation model dedicated to synthesizing realistic and grammatically accurate motion sequences for sign words. Our approach leverages a transformer-based diffusion model trained on a curated dataset of 3D motion meshes extracted from word-level ASL videos. By integrating CLIP, wSignGen offers two advantages: image-based generation, which is particularly useful for children who are learning sign language but cannot yet read, and generalization to unseen synonyms. Experiments demonstrate that wSignGen significantly outperforms the baseline model on the sign word generation task. Moreover, human evaluation shows that wSignGen generates high-quality, grammatically correct ASL signs that are effectively conveyed through 3D avatars.
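The abstract outlines the general recipe: a transformer denoiser run inside a diffusion loop, conditioned on a CLIP embedding of the target word or image. The sketch below illustrates that recipe in PyTorch. It is not the authors' implementation; every module name, dimension, and the plain DDPM sampler are assumptions made for illustration, and a random vector stands in for the CLIP embedding so the snippet runs offline.

```python
# Minimal sketch (not the paper's code) of word-conditioned diffusion
# sampling for motion sequences. Assumed setup: motion is a (T, D) tensor
# of 3D mesh/pose parameters; the condition is a 512-d CLIP-style embedding.
import torch
import torch.nn as nn

class CondMotionDenoiser(nn.Module):
    def __init__(self, motion_dim=135, cond_dim=512, width=256, layers=4):
        super().__init__()
        self.in_proj = nn.Linear(motion_dim, width)
        self.cond_proj = nn.Linear(cond_dim, width)   # CLIP embedding -> token
        self.t_embed = nn.Embedding(1000, width)      # diffusion timestep token
        enc_layer = nn.TransformerEncoderLayer(width, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.out_proj = nn.Linear(width, motion_dim)

    def forward(self, x_t, t, cond):
        # x_t: (B, T, motion_dim) noisy motion; t: (B,) timestep; cond: (B, cond_dim)
        tokens = self.in_proj(x_t)
        prefix = torch.stack([self.cond_proj(cond), self.t_embed(t)], dim=1)
        h = self.encoder(torch.cat([prefix, tokens], dim=1))
        return self.out_proj(h[:, 2:])  # predicted noise; conditioning tokens dropped

@torch.no_grad()
def sample(model, cond, frames=60, motion_dim=135, steps=50):
    # Plain DDPM ancestral sampling with a linear beta schedule.
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(cond.shape[0], frames, motion_dim)
    for i in reversed(range(steps)):
        t = torch.full((cond.shape[0],), i, dtype=torch.long)
        eps = model(x, t, cond)
        mean = (x - betas[i] / (1 - alpha_bar[i]).sqrt() * eps) / alphas[i].sqrt()
        x = mean + betas[i].sqrt() * torch.randn_like(x) if i > 0 else mean
    return x

# In practice, `cond` would be a CLIP text (or image) embedding of the target
# sign word; a random vector stands in here so the sketch runs without weights.
model = CondMotionDenoiser()
cond = torch.randn(1, 512)
motion = sample(model, cond)   # (1, 60, 135) generated motion sequence
```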
Anthology ID: 2024.findings-emnlp.584
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 9993–9999
URL: https://aclanthology.org/2024.findings-emnlp.584/
DOI: 10.18653/v1/2024.findings-emnlp.584
Cite (ACL): Lu Dong, Xiao Wang, and Ifeoma Nwogu. 2024. Word-Conditioned 3D American Sign Language Motion Generation. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 9993–9999, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Word-Conditioned 3D American Sign Language Motion Generation (Dong et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.584.pdf