Yuki Saito


2020

DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus
Yuki Yamashita | Tomoki Koriyama | Yuki Saito | Shinnosuke Takamichi | Yusuke Ijima | Ryo Masumura | Hiroshi Saruwatari
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we investigate the effectiveness of using rich annotations in deep neural network (DNN)-based statistical speech synthesis. DNN-based frameworks typically use linguistic information as input features called context instead of directly using text. In such frameworks, we can synthesize not only reading-style speech but also speech with paralinguistic and nonlinguistic features by adding such information to the context. However, it is not clear what kind of information is crucial for reproducing paralinguistic and nonlinguistic features. Therefore, we investigate the effectiveness of rich tags in DNN-based speech synthesis using the Corpus of Spontaneous Japanese (CSJ), which has a large number of annotations on paralinguistic features such as prosody, disfluency, and morphological features. Experimental evaluation results show that the reproducibility of paralinguistic features of synthetic speech was enhanced by adding such information as context.

SMASH Corpus: A Spontaneous Speech Corpus Recording Third-person Audio Commentaries on Gameplay
Yuki Saito | Shinnosuke Takamichi | Hiroshi Saruwatari
Proceedings of the Twelfth Language Resources and Evaluation Conference

Developing a spontaneous speech corpus would be beneficial for spoken language processing and understanding. We present a speech corpus named the SMASH corpus, which includes spontaneous speech of two Japanese male commentators who made third-person audio commentaries during the gameplay of a fighting game. Each commentator ad-libbed while watching the gameplay, covering various topics that included not only explanations of each moment to convey information on the fight but also comments to entertain listeners. We made transcriptions and topic tags as annotations on the recorded commentaries with our two-step method. We first made automatic and manual transcriptions of the commentaries and then manually annotated the topic tags. This paper describes how we constructed the SMASH corpus and reports some results of the annotations.