RUSAVIC Corpus: Russian Audio-Visual Speech in Cars

Denis Ivanko, Alexandr Axyonov, Dmitry Ryumin, Alexey Kashevnik, Alexey Karpov


Abstract
We present a new audio-visual speech corpus (RUSAVIC) recorded in a car environment and designed for noise-robust speech recognition. Our goal was to produce a speech corpus that is natural (recorded in real driving conditions), controlled (providing different SNR levels through open or closed windows, a moving or parked vehicle, etc.), and of adequate size (enough data to train state-of-the-art neural network approaches). We focus on the problem of audio-visual speech recognition, in which automated lip-reading is used to improve the performance of audio-based speech recognition in the presence of severe acoustic noise caused by road traffic. We also describe the equipment and procedures used to create the RUSAVIC corpus. Data are collected synchronously by several smartphones placed at different angles, each equipped with a FullHD video camera and a microphone. The corpus includes recordings of 20 drivers, with a minimum of 10 recording sessions each. Besides providing a detailed description of the dataset and its collection pipeline, we evaluate several popular audio and visual speech recognition methods and present a set of baseline recognition results. At present, RUSAVIC is a unique audio-visual corpus for the Russian language recorded in in-the-wild conditions, and we make it publicly available.
Anthology ID:
2022.lrec-1.166
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
1555–1559
URL:
https://aclanthology.org/2022.lrec-1.166
Cite (ACL):
Denis Ivanko, Alexandr Axyonov, Dmitry Ryumin, Alexey Kashevnik, and Alexey Karpov. 2022. RUSAVIC Corpus: Russian Audio-Visual Speech in Cars. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1555–1559, Marseille, France. European Language Resources Association.
Cite (Informal):
RUSAVIC Corpus: Russian Audio-Visual Speech in Cars (Ivanko et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.166.pdf