Paola Leibny Garcia
2024
Speech Data from Radio Broadcasts for Low Resource Languages
Bismarck Bamfo Odoom | Paola Leibny Garcia | Prangthip Hansanti | Loïc Barrault | Christophe Ropers | Matthew Wiesner | Kenton Murray | Alex Mourachko | Philipp Koehn
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)
We created a collection of speech data for 48 low resource languages. The corpus is extracted from radio broadcasts and processed with novel speech detection and language identification models based on a manually vetted subset of the audio for 10 languages. The data is made publicly available.