Yi-Chin Huang


2021

Incorporating speaker embedding and post-filter network for improving speaker similarity of personalized speech synthesis system
Sheng-Yao Wang | Yi-Chin Huang
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

In recent years, speech synthesis systems have become able to generate high-quality speech. However, multi-speaker text-to-speech (TTS) systems still require a large amount of speech data for each target speaker. In this study, we construct a multi-speaker TTS system by incorporating two sub-modules into an artificial neural network-based speech synthesis system to alleviate this problem. The first module adds a speaker embedding to the encoding module, so that speech can be generated without a large amount of speech data from the target speaker. Two main speaker embedding methods, namely speaker verification embedding and voice conversion embedding, are compared to decide which is more suitable for our personalized TTS system. Second, we replace the conventional post-net module, which is used to enhance the output spectrum sequence, with a post-filter network to further improve the quality of the generated speech. Experimental results show that adding the speaker embedding to the encoding module is useful, and that the resulting speech is indeed perceived as the target speaker. In addition, the post-filter network not only improves speech quality but also enhances the speaker similarity of the generated utterances. The constructed TTS system can generate a speech utterance of the target speaker in fewer than 2 seconds. In future work, we would like to further investigate the controllability of the speaking rate or perceived emotional state of the generated speech.

整合語者嵌入向量與後置濾波器於提升個人化合成語音之語者相似度 (Incorporating Speaker Embedding and Post-Filter Network for Improving Speaker Similarity of Personalized Speech Synthesis System)
Sheng-Yao Wang | Yi-Chin Huang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 26, Number 2, December 2021

2019

Efficient text generation of user-defined topic using generative adversarial networks
Chenhan Yuan | Yi-Chin Huang | Cheng-Hung Tsai
Proceedings of the 4th Workshop on Computational Creativity in Language Generation

應用文脈分析於中英夾雜語音合成系統 (Linguistic Analysis for English/Mandarin Speech Synthesis System)
Yi-Hsiang Hung | Yi-Chin Huang | Guang-Feng Deng
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)

2013

合成單元與問題集之定義於隱藏式馬可夫模型中文歌聲合成系統之建立 (Synthesis Unit and Question Set Definition for Mandarin HMM-based Singing Voice Synthesis)
Ju-Yun Cheng | Yi-Chin Huang | Chung-Hsien Wu
Proceedings of the 25th Conference on Computational Linguistics and Speech Processing (ROCLING 2013)

HMM-based Mandarin Singing Voice Synthesis Using Tailored Synthesis Units and Question Sets
Ju-Yun Cheng | Yi-Chin Huang | Chung-Hsien Wu
International Journal of Computational Linguistics & Chinese Language Processing, Volume 18, Number 4, December 2013 - Special Issue on Selected Papers from ROCLING XXV