ASR pipeline for low-resourced languages: A case study on Pomak

Chara Tsoukala, Kosmas Kritsis, Ioannis Douros, Athanasios Katsamanis, Nikolaos Kokkas, Vasileios Arampatzakis, Vasileios Sevetlidis, Stella Markantonatou, George Pavlidis


Abstract
Automatic Speech Recognition (ASR) models can aid field linguists by facilitating the creation of text corpora from oral material. Training ASR systems for low-resourced languages is challenging, not only because resources are scarce but also because preparing a training dataset requires considerable work. We present a pipeline for data processing and ASR model training for low-resourced languages, based on the language family. As a case study, we collected recordings of Pomak, an endangered South East Slavic language variety spoken in Greece. Using the proposed pipeline, we trained the first Pomak ASR model.
Anthology ID:
2023.fieldmatters-1.5
Volume:
Proceedings of the Second Workshop on NLP Applications to Field Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Oleg Serikov, Ekaterina Voloshina, Anna Postnikova, Elena Klyachko, Ekaterina Vylomova, Tatiana Shavrina, Eric Le Ferrand, Valentin Malykh, Francis Tyers, Timofey Arkhangelskiy, Vladislav Mikhailov
Venue:
FieldMatters
Publisher:
Association for Computational Linguistics
Pages:
40–45
URL:
https://aclanthology.org/2023.fieldmatters-1.5
DOI:
10.18653/v1/2023.fieldmatters-1.5
Cite (ACL):
Chara Tsoukala, Kosmas Kritsis, Ioannis Douros, Athanasios Katsamanis, Nikolaos Kokkas, Vasileios Arampatzakis, Vasileios Sevetlidis, Stella Markantonatou, and George Pavlidis. 2023. ASR pipeline for low-resourced languages: A case study on Pomak. In Proceedings of the Second Workshop on NLP Applications to Field Linguistics, pages 40–45, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
ASR pipeline for low-resourced languages: A case study on Pomak (Tsoukala et al., FieldMatters 2023)
PDF:
https://aclanthology.org/2023.fieldmatters-1.5.pdf
Video:
https://aclanthology.org/2023.fieldmatters-1.5.mp4