Taiwanese-Accented Mandarin and English Multi-Speaker Talking-Face Synthesis System
Chia-Hsuan Lin | Jian-Peng Liao | Cho-Chun Hsieh | Kai-Chun Liao | Chun-Hsin Wu
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
This paper proposes a multi-speaker talking-face synthesis system. The system combines voice cloning and lip-syncing to generate a talking face from text: it takes audio and video clips of any speaker and applies zero-shot transfer learning. In addition, we train several Taiwanese-accented models on open-source corpora and propose using Mandarin Phonetic Symbols (Bopomofo) as the character embedding of the synthesizer to improve the system’s ability to synthesize Chinese-English code-switched sentences. Users can build a variety of applications on top of the system, and research on this technology is new to the audiovisual speech synthesis field.
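To make the Bopomofo idea concrete, the following is a minimal Python sketch of how a code-switched sentence might be turned into a single symbol sequence that a synthesizer's embedding table could index. It is an illustration only: the abstract does not give the actual symbol inventory, lexicon, or preprocessing, so `SYMBOLS`, `TOY_LEXICON`, `to_symbols`, `encode`, and `decode` are all hypothetical names, and the five-character lexicon stands in for a real pronunciation dictionary.

```python
import string

# Assumed symbol inventory (not taken from the paper): a padding symbol, a space,
# the 37 Bopomofo letters (U+3105..U+3129), four tone marks, and lowercase Latin
# letters for the English part of a code-switched sentence.
PAD = "_"
BOPOMOFO = [chr(c) for c in range(0x3105, 0x312A)]
TONES = ["ˊ", "ˇ", "ˋ", "˙"]  # the first tone is conventionally left unmarked
LATIN = list(string.ascii_lowercase)
SYMBOLS = [PAD, " "] + BOPOMOFO + TONES + LATIN
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}

# Hypothetical toy lexicon: a real system would use a full pronunciation
# dictionary and handle characters with more than one reading.
TOY_LEXICON = {
    "我": "ㄨㄛˇ",
    "想": "ㄒㄧㄤˇ",
    "喝": "ㄏㄜ",
    "咖": "ㄎㄚ",
    "啡": "ㄈㄟ",
}


def to_symbols(text):
    """Map Chinese characters to Bopomofo symbols and keep English letters."""
    symbols = []
    for ch in text:
        if ch in TOY_LEXICON:
            # One Chinese character becomes several Bopomofo symbols.
            symbols.extend(TOY_LEXICON[ch])
        elif ch != PAD and ch.lower() in SYMBOL_TO_ID:
            # Latin letters are lowercased; spaces pass through unchanged.
            symbols.append(ch.lower())
        else:
            raise ValueError(f"no symbol for character {ch!r}")
    return symbols


def encode(text):
    """Return the integer IDs that would index the embedding table."""
    return [SYMBOL_TO_ID[s] for s in to_symbols(text)]


def decode(ids):
    """Inverse of encode at the symbol level, for inspection only."""
    return "".join(SYMBOLS[i] for i in ids)


if __name__ == "__main__":
    ids = encode("我想喝 coffee")
    print(ids)
    print(decode(ids))
```

Under these assumptions, Mandarin and English share one small, closed symbol set, so the synthesizer sees both languages through the same embedding table instead of needing an entry for each of thousands of distinct Chinese characters.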