The DIRHA Portuguese Corpus: A Comparison of Home Automation Command Detection and Recognition in Simulated and Real Data.

Miguel Matos, Alberto Abad, António Serralheiro


Abstract
In this paper, we describe a new corpus -named DIRHA-L2F RealCorpus- composed of typical home automation speech interactions in European Portuguese that has been recorded by the INESC-ID’s Spoken Language Systems Laboratory (L2F) to support the activities of the Distant-speech Interaction for Robust Home Applications (DIRHA) EU-funded project. The corpus is a multi-microphone and multi-room database of real continuous audio sequences containing read phonetically rich sentences, read and spontaneous keyword activation sentences, and read and spontaneous home automation commands. The background noise conditions are controlled and randomly recreated with noises typically found in home environments. Experimental validation on this corpus is reported in comparison with the results obtained on a simulated corpus using a fully automated speech processing pipeline for two fundamental automatic speech recognition tasks of typical ‘always-listening’ home-automation scenarios: system activation and voice command recognition. Attending to results on both corpora, the presence of overlapping voice-like noise is shown as the main problem: simulated sequences contain concurrent speakers that result in general in a more challenging corpus, while real sequences performance drops drastically when TV or radio is on.
Anthology ID:
L16-1633
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
4012–4018
Language:
URL:
https://aclanthology.org/L16-1633
DOI:
Bibkey:
Cite (ACL):
Miguel Matos, Alberto Abad, and António Serralheiro. 2016. The DIRHA Portuguese Corpus: A Comparison of Home Automation Command Detection and Recognition in Simulated and Real Data.. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4012–4018, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
The DIRHA Portuguese Corpus: A Comparison of Home Automation Command Detection and Recognition in Simulated and Real Data. (Matos et al., LREC 2016)
Copy Citation:
PDF:
https://aclanthology.org/L16-1633.pdf