SMASH Corpus: A Spontaneous Speech Corpus Recording Third-person Audio Commentaries on Gameplay

Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari


Abstract
Developing a spontaneous speech corpus would be beneficial for spoken language processing and understanding. We present a speech corpus named the SMASH corpus, which includes spontaneous speech of two Japanese male commentators who made third-person audio commentaries during the gameplay of a fighting game. Each commentator ad-libbed while watching the gameplay, covering various topics that included not only explanations of each moment to convey information on the fight but also comments to entertain listeners. We annotated the recorded commentaries with transcriptions and topic tags using a two-step method: we first made automatic and manual transcriptions of the commentaries and then manually annotated the topic tags. This paper describes how we constructed the SMASH corpus and reports some results of the annotations.
Anthology ID:
2020.lrec-1.809
Volume:
Proceedings of the 12th Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6571–6577
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.809
Cite (ACL):
Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari. 2020. SMASH Corpus: A Spontaneous Speech Corpus Recording Third-person Audio Commentaries on Gameplay. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 6571–6577, Marseille, France. European Language Resources Association.
Cite (Informal):
SMASH Corpus: A Spontaneous Speech Corpus Recording Third-person Audio Commentaries on Gameplay (Saito et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.809.pdf