Leibny Paola Garcia Perera
Also published as: Leibny Paola Garcia Perera
2024
Where are you from? Geolocating Speech and Applications to Language Identification
Patrick Foley | Matthew Wiesner | Bismarck Odoom | Leibny Paola Garcia Perera | Kenton Murray | Philipp Koehn
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
We train models to answer the question, Where are you from? and show how such models can be repurposed for language identification (LID). To our knowledge, this paper is the first to introduce data sources, methods and models to tackle the task of geolocation of speech at a global scale, and the first to explore using geolocation as a proxy-task for LID. Specifically, we explore whether radio broadcasts with known origin can be used to train regression and classification-based models for geolocating speech. We build models on top of self-supervised pretrained models, using attention pooling to qualitatively verify that the model geolocates the speech itself, and not other channel artifacts. The best geolocation models localize speaker origin to around 650km. We confirm the value of speech geolocation as a proxy task by using speech geolocation models for zero-shot LID. Finally, we show that fine-tuning geolocation models for LID outperforms fine-tuning pretrained Wav2Vec2.0 models, and achieves state-of-the-art performance on the FLEURS benchmark.
Speech Data from Radio Broadcasts for Low Resource Languages
Bismarck Bamfo Odoom | Leibny Paola Garcia Perera | Prangthip Hansanti | Loic Barrault | Christophe Ropers | Matthew Wiesner | Kenton Murray | Alexandre Mourachko | Philipp Koehn
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)
We created a collection of speech data for 48 low resource languages. The corpus is extracted from radio broadcasts and processed with novel speech detection and language identification models based on a manually vetted subset of the audio for 10 languages. The data is made publicly available.
Co-authors
- Matthew Wiesner 2
- Kenton Murray 2
- Philipp Koehn 2
- Patrick Foley 1
- Bismarck Odoom 1