Taiwanese-Accented Mandarin and English Multi-Speaker Talking-Face Synthesis System

Chia-Hsuan Lin, Jian-Peng Liao, Cho-Chun Hsieh, Kai-Chun Liao, Chun-Hsin Wu


Abstract
This paper proposes a multi-speaker talking-face synthesis system. The system combines voice cloning and lip-syncing technology to achieve text-to-talking-face generation: given audio and video clips of any speaker, it adapts to that speaker through zero-shot transfer learning. In addition, we trained several Taiwanese-accented models on open-source corpora and propose using Mandarin Phonetic Symbols (Bopomofo) as the character embedding of the synthesizer to improve the system's ability to synthesize Chinese-English code-switched sentences. Our system lets users build rich applications, and this line of work is novel in the field of audiovisual speech synthesis.
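As a rough illustration of the Bopomofo-based text front-end described in the abstract, the sketch below is not from the paper: it assumes the third-party pypinyin package (the paper does not name its tooling), and the helper name to_symbols is hypothetical. It converts a Chinese-English code-switched sentence into a mixed sequence of Bopomofo syllables and English characters that a synthesizer's character-embedding layer could consume.

# Minimal sketch (illustrative assumption, not the authors' implementation) of a
# Bopomofo text front-end for Chinese-English code-switched input. Chinese
# characters become Bopomofo syllables with tone marks; everything else
# (English letters, digits, punctuation) is passed through character by character.
import re
from pypinyin import pinyin, Style  # third-party: pip install pypinyin

def to_symbols(text):
    """Return a flat symbol sequence suitable for a character-embedding layer."""
    symbols = []
    # Split the sentence into runs of CJK characters and runs of everything else.
    for run in re.findall(r"[\u4e00-\u9fff]+|[^\u4e00-\u9fff]+", text):
        if re.match(r"[\u4e00-\u9fff]", run):
            # One Bopomofo syllable per Chinese character, e.g. '學' -> 'ㄒㄩㄝˊ'
            symbols.extend(s[0] for s in pinyin(run, style=Style.BOPOMOFO))
        else:
            symbols.extend(run)
    return symbols

print(to_symbols("我想學 machine learning"))
# ['ㄨㄛˇ', 'ㄒㄧㄤˇ', 'ㄒㄩㄝˊ', ' ', 'm', 'a', 'c', 'h', ...]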
Anthology ID: 2022.rocling-1.6
Volume: Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
Month: November
Year: 2022
Address: Taipei, Taiwan
Editors: Yung-Chun Chang, Yi-Chin Huang
Venue: ROCLING
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 40–48
Language: Chinese
URL: https://aclanthology.org/2022.rocling-1.6
Cite (ACL): Chia-Hsuan Lin, Jian-Peng Liao, Cho-Chun Hsieh, Kai-Chun Liao, and Chun-Hsin Wu. 2022. Taiwanese-Accented Mandarin and English Multi-Speaker Talking-Face Synthesis System. In Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022), pages 40–48, Taipei, Taiwan. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
Cite (Informal): Taiwanese-Accented Mandarin and English Multi-Speaker Talking-Face Synthesis System (Lin et al., ROCLING 2022)
PDF: https://aclanthology.org/2022.rocling-1.6.pdf