A Large Scale Speech Sentiment Corpus

Eric Chen, Zhiyun Lu, Hao Xu, Liangliang Cao, Yu Zhang, James Fan


Abstract
We present a multimodal corpus for sentiment analysis based on the existing Switchboard-1 Telephone Speech Corpus released by the Linguistic Data Consortium. The corpus extends Switchboard-1 by adding sentiment labels from three different human annotators for every transcript segment. Each sentiment label is one of three options: positive, negative, or neutral. Annotators were recruited through Google Cloud’s data labeling service, and the labeling task was conducted over the internet. The corpus contains a total of 49,500 labeled speech segments covering 140 hours of audio. To the best of our knowledge, this is the largest multimodal corpus for sentiment analysis that includes both speech and text features.
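
The abstract does not say how the three annotator labels for a segment are combined into a single label. As a minimal sketch, assuming a simple majority vote (a common choice, not necessarily the authors' method), the three labels could be reduced as follows. The function name `majority_label` and the `LABELS` tuple are hypothetical and do not come from the corpus release.

```python
from collections import Counter

# The three label options named in the abstract.
LABELS = ("positive", "negative", "neutral")


def majority_label(annotations):
    """Return the label chosen by at least two of three annotators.

    Returns None when all three annotators disagree.
    Assumption: majority voting is illustrative only; the abstract
    does not state how annotator disagreement is resolved.
    """
    if len(annotations) != 3:
        raise ValueError("expected exactly three annotations per segment")
    for label in annotations:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
    label, count = Counter(annotations).most_common(1)[0]
    return label if count >= 2 else None
```

With three annotators and three classes, a segment either has a label shared by at least two annotators or a three-way split. Returning `None` for the split case leaves the decision about such segments (drop, relabel, or treat as neutral) to the caller.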
Anthology ID:
2020.lrec-1.806
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6549–6555
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.806
Cite (ACL):
Eric Chen, Zhiyun Lu, Hao Xu, Liangliang Cao, Yu Zhang, and James Fan. 2020. A Large Scale Speech Sentiment Corpus. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6549–6555, Marseille, France. European Language Resources Association.
Cite (Informal):
A Large Scale Speech Sentiment Corpus (Chen et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.806.pdf