Inter-language Transfer Learning for Visual Speech Recognition toward Under-resourced Environments

Fumiya Kondo, Satoshi Tamura


Abstract
In this study, we introduce a method of inter-language transfer learning for under-resourced visual speech recognition. Deploying speech-related technology to all languages is an important goal. However, state-of-the-art deep-learning techniques require large labeled corpora, which are hard to obtain for under-resourced languages. Our approach leverages a small amount of labeled video data in the target language and employs inter-language transfer learning from a pre-trained English lip-reading model. Applying the proposed scheme, we build a Japanese lip-reading model using the ROHAN corpus, which is about 1/450 the size of common English datasets. The front-end encoder of the pre-trained model is fine-tuned to better capture pronunciation and lip-movement patterns unique to Japanese, while the back-end encoder and the decoder are built from the Japanese dataset. Although English and Japanese have different language structures, evaluation experiments show that the Japanese lip-reading model can be built efficiently, and comparison with competitive schemes demonstrates the effectiveness of our method.
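As a rough illustration of the transfer-learning split described in the abstract (a pre-trained visual front-end that is fine-tuned, with the back-end encoder and decoder trained from scratch on the Japanese data), the following PyTorch-style sketch shows one way such a setup could be wired. The module structure, checkpoint key ("frontend"), vocabulary size, and learning rates are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of the front-end / back-end split described in the
# abstract: the visual front-end is initialized from an English lip-reading
# model and fine-tuned, while the back-end encoder and the decoder are
# trained from scratch on the Japanese (ROHAN) data.
# Module names, sizes, and learning rates are illustrative assumptions.
import torch
import torch.nn as nn


class VisualFrontend(nn.Module):
    """Toy 3D-conv front-end mapping lip-ROI video to frame-level features."""
    def __init__(self, out_dim: int = 512):
        super().__init__()
        self.conv3d = nn.Conv3d(1, 64, kernel_size=(5, 7, 7),
                                stride=(1, 2, 2), padding=(2, 3, 3))
        self.pool = nn.AdaptiveAvgPool3d((None, 1, 1))
        self.proj = nn.Linear(64, out_dim)

    def forward(self, x):                              # x: (B, 1, T, H, W)
        h = torch.relu(self.conv3d(x))
        h = self.pool(h).squeeze(-1).squeeze(-1)       # (B, 64, T)
        return self.proj(h.transpose(1, 2))            # (B, T, out_dim)


class LipReadingModel(nn.Module):
    def __init__(self, vocab_size: int, feat_dim: int = 512):
        super().__init__()
        self.frontend = VisualFrontend(feat_dim)       # transferred + fine-tuned
        enc_layer = nn.TransformerEncoderLayer(feat_dim, nhead=8,
                                               batch_first=True)
        self.backend = nn.TransformerEncoder(enc_layer, num_layers=6)  # from scratch
        self.decoder = nn.Linear(feat_dim, vocab_size)  # e.g. CTC output layer

    def forward(self, video):
        feats = self.frontend(video)
        enc = self.backend(feats)
        return self.decoder(enc)


# Vocabulary size for Japanese output units is an assumption.
model = LipReadingModel(vocab_size=90)

# Initialize the front-end from the pre-trained English model's weights
# (checkpoint path and key are hypothetical).
english_ckpt = torch.load("english_lipreading.pt", map_location="cpu")
model.frontend.load_state_dict(english_ckpt["frontend"], strict=False)

# Fine-tune the front-end with a small learning rate; train the back-end
# encoder and decoder from scratch with a larger one.
optimizer = torch.optim.AdamW([
    {"params": model.frontend.parameters(), "lr": 1e-5},
    {"params": model.backend.parameters(), "lr": 1e-4},
    {"params": model.decoder.parameters(), "lr": 1e-4},
])

The per-module learning rates simply reflect the idea that the transferred front-end is adapted gently while the Japanese-specific back-end and decoder are learned fully from the small target-language corpus.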
Anthology ID:
2024.sigul-1.19
Volume:
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Maite Melero, Sakriani Sakti, Claudia Soria
Venues:
SIGUL | WS
Publisher:
ELRA and ICCL
Pages:
149–154
URL:
https://aclanthology.org/2024.sigul-1.19
Cite (ACL):
Fumiya Kondo and Satoshi Tamura. 2024. Inter-language Transfer Learning for Visual Speech Recognition toward Under-resourced Environments. In Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024, pages 149–154, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Inter-language Transfer Learning for Visual Speech Recognition toward Under-resourced Environments (Kondo & Tamura, SIGUL-WS 2024)
PDF:
https://aclanthology.org/2024.sigul-1.19.pdf