Improvement in Sign Language Translation Using Text CTC Alignment

Sihan Tan, Taro Miyazaki, Nabeela Khan, Kazuhiro Nakadai


Abstract
Current sign language translation (SLT) approaches often rely on gloss-based supervision with Connectionist Temporal Classification (CTC), limiting their ability to handle non-monotonic alignments between sign language video and spoken text. In this work, we propose a novel method combining joint CTC/Attention and transfer learning. The joint CTC/Attention introduces hierarchical encoding and integrates CTC with the attention mechanism during decoding, effectively managing both monotonic and non-monotonic alignments. Meanwhile, transfer learning helps bridge the modality gap between vision and language in SLT. Experimental results on two widely adopted benchmarks, RWTH-PHOENIX-Weather 2014T and CSL-Daily, show that our method achieves results comparable to the state of the art and outperforms the pure-attention baseline. Additionally, this work opens the door to future research on gloss-free SLT using text-based CTC alignment.
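For context, hybrid CTC/Attention training is usually expressed as a weighted sum of the two objectives, L = λ·L_CTC + (1−λ)·L_att, with the same weight reused to fuse CTC prefix scores and attention scores during beam-search decoding. The sketch below is a minimal PyTorch illustration of such a joint loss under that generic formulation; it is not the authors' implementation, and all class, parameter, and tensor names here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class JointCTCAttentionLoss(nn.Module):
    """Illustrative joint CTC/attention objective: a lambda-weighted sum of
    a frame-level CTC loss (monotonic alignment) and an attention-decoder
    cross-entropy (non-monotonic alignment). Names are hypothetical."""

    def __init__(self, blank_id: int, pad_id: int, ctc_weight: float = 0.3):
        super().__init__()
        self.ctc_weight = ctc_weight  # lambda in L = lambda*L_CTC + (1-lambda)*L_att
        self.ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)
        self.ce = nn.CrossEntropyLoss(ignore_index=pad_id)

    def forward(self, ctc_logits, dec_logits, targets,
                input_lengths, target_lengths):
        # ctc_logits: (T, B, V) per-frame logits from the video encoder
        # dec_logits: (B, L, V) logits from the attention decoder
        # targets:    (B, L) padded text token ids
        log_probs = ctc_logits.log_softmax(dim=-1)
        ctc_loss = self.ctc(log_probs, targets, input_lengths, target_lengths)
        att_loss = self.ce(dec_logits.reshape(-1, dec_logits.size(-1)),
                           targets.reshape(-1))
        return self.ctc_weight * ctc_loss + (1.0 - self.ctc_weight) * att_loss
```

Here the CTC branch is computed over text tokens rather than glosses, matching the text-based CTC alignment the abstract describes; the mixing weight would be tuned on a development set.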
Anthology ID:
2025.coling-main.219
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
3255–3266
URL:
https://aclanthology.org/2025.coling-main.219/
Cite (ACL):
Sihan Tan, Taro Miyazaki, Nabeela Khan, and Kazuhiro Nakadai. 2025. Improvement in Sign Language Translation Using Text CTC Alignment. In Proceedings of the 31st International Conference on Computational Linguistics, pages 3255–3266, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Improvement in Sign Language Translation Using Text CTC Alignment (Tan et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.219.pdf