Paola Leibny Garcia


2024

Speech Data from Radio Broadcasts for Low Resource Languages
Bismarck Bamfo Odoom | Paola Leibny Garcia | Prangthip Hansanti | Loïc Barrault | Christophe Ropers | Matthew Wiesner | Kenton Murray | Alex Mourachko | Philipp Koehn
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

We created a collection of speech data for 48 low-resource languages. The corpus is extracted from radio broadcasts and processed with novel speech detection and language identification models based on a manually vetted subset of the audio for 10 languages. The data is made publicly available.