Cross-Corpus Data Augmentation for Acoustic Addressee Detection

Oleg Akhtiamov, Ingo Siegert, Alexey Karpov, Wolfgang Minker


Abstract
Acoustic addressee detection (AD) is a modern paralinguistic and dialogue challenge that arises especially in voice assistants. In the present study, we distinguish addressees in two settings (a conversation between several people and a spoken dialogue system, and a conversation between several adults and a child) and introduce the first competitive baseline (unweighted average recall of 0.891) for the Voice Assistant Conversation Corpus, which models the first setting. We solve both classification problems jointly using three models: a linear support vector machine operating on acoustic functionals and two neural networks utilising raw waveforms alongside acoustic low-level descriptors. We investigate how different corpora influence each other by applying the mixup approach to data augmentation. We also study the influence of various acoustic context lengths on AD; two-second speech fragments turn out to be sufficient for reliable AD. Mixup is shown to be beneficial for merging acoustic data (extracted features but not raw waveforms) from different domains, which allows us to reach a higher classification performance on human-machine AD, and for training a multipurpose neural network that is capable of solving both human-machine and adult-child AD problems.
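The mixup approach mentioned in the abstract can be illustrated with a minimal sketch. It follows the general mixup formulation (a convex combination of two examples and of their labels, with the mixing weight drawn from a Beta(alpha, alpha) distribution) and is not taken from the paper itself. The function name mixup_pair, the default alpha, the toy feature values, and the idea of drawing one example from each corpus are all assumptions made for illustration.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, alpha=1.0, rng=None):
    """Blend two feature vectors and their one-hot labels (generic mixup).

    x_a, x_b: equal-length lists of acoustic features (hypothetical values).
    y_a, y_b: equal-length one-hot label lists.
    alpha:    Beta(alpha, alpha) shape parameter (assumed value, not from the paper).
    """
    if len(x_a) != len(x_b) or len(y_a) != len(y_b):
        raise ValueError("both examples must share feature and label dimensions")
    rng = rng or random.Random()
    # The mixing weight lam lies in [0, 1].
    lam = rng.betavariate(alpha, alpha)
    # Features and labels are interpolated with the same weight.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


# Toy usage: one example from each of two corpora (assumed pairing scheme).
corpus_a_example = ([0.2, 1.5, -0.3], [1.0, 0.0])  # e.g. machine-directed
corpus_b_example = ([0.9, -0.4, 0.1], [0.0, 1.0])  # e.g. human-directed
x_mix, y_mix, lam = mixup_pair(*corpus_a_example, *corpus_b_example,
                               alpha=1.0, rng=random.Random(0))
print(x_mix, y_mix, lam)
```

The mixed label is soft rather than one-hot, so a model trained on such pairs would need a loss that accepts label distributions, such as cross-entropy against soft targets.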
Anthology ID: W19-5933
Volume: Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue
Month: September
Year: 2019
Address: Stockholm, Sweden
Editors: Satoshi Nakamura, Milica Gasic, Ingrid Zukerman, Gabriel Skantze, Mikio Nakano, Alexandros Papangelis, Stefan Ultes, Koichiro Yoshino
Venue: SIGDIAL
SIG: SIGDIAL
Publisher: Association for Computational Linguistics
Pages: 274–283
URL: https://aclanthology.org/W19-5933/
DOI: 10.18653/v1/W19-5933
Cite (ACL): Oleg Akhtiamov, Ingo Siegert, Alexey Karpov, and Wolfgang Minker. 2019. Cross-Corpus Data Augmentation for Acoustic Addressee Detection. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, pages 274–283, Stockholm, Sweden. Association for Computational Linguistics.
Cite (Informal): Cross-Corpus Data Augmentation for Acoustic Addressee Detection (Akhtiamov et al., SIGDIAL 2019)
PDF: https://aclanthology.org/W19-5933.pdf