Development of the Siberian Ingrian Finnish Speech Corpus

Ivan Ubaleht, Taisto-Kalevi Raudalainen


Abstract
In this paper we present a speech corpus for the Siberian Ingrian Finnish language. The corpus comprises audio data, annotations, software tools for data processing, two databases, and a web application. We have published part of the audio data and annotations. A software tool that parses annotation files and populates a relational database has been developed and published under a free license. A web application has also been developed and is available. At present, about 300 words and 200 phrases can be displayed using this web application.
Anthology ID:
2022.computel-1.1
Volume:
Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Sarah Moeller, Antonios Anastasopoulos, Antti Arppe, Aditi Chaudhary, Atticus Harrigan, Josh Holden, Jordan Lachler, Alexis Palmer, Shruti Rijhwani, Lane Schwartz
Venue:
ComputEL
Publisher:
Association for Computational Linguistics
Pages:
1–4
URL:
https://aclanthology.org/2022.computel-1.1
DOI:
10.18653/v1/2022.computel-1.1
Cite (ACL):
Ivan Ubaleht and Taisto-Kalevi Raudalainen. 2022. Development of the Siberian Ingrian Finnish Speech Corpus. In Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 1–4, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Development of the Siberian Ingrian Finnish Speech Corpus (Ubaleht & Raudalainen, ComputEL 2022)
PDF:
https://aclanthology.org/2022.computel-1.1.pdf
Video:
https://aclanthology.org/2022.computel-1.1.mp4