Chara Tsoukala
2026
Extending ASR Evaluation Resources for Modern Greek Dialects
Chara Tsoukala | Stavros Bompolas | Antigoni Margariti | Konstantina Panagiotou | Maria Elisavet Plaiti | Nefeli Tzanakaki | Petros Karatsareas | Angela Ralli | Antonios Anastasopoulos | Stella Markantonatou
Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects
Recent progress in Automatic Speech Recognition (ASR) has primarily benefited high-resource standard languages, while dialectal speech remains challenging and underexplored. We present an expanded benchmark for low-resource Modern Greek dialects, covering Aperathiot, Cretan, Lesbian, and Cappadocian, spanning southern, northern, and contact-influenced varieties with varying degrees of divergence from Standard Modern Greek. The benchmark provides dialectal transcriptions in the Greek alphabet, following SMG-based orthographic conventions, while preserving dialectal lexical and morphophonological forms. Using this benchmark, we evaluate state-of-the-art multilingual ASR models in a zero-shot setting and by further fine-tuning per dialect. Zero-shot results reveal a clear performance gradient with dialectal distance from Standard Modern Greek, with best WERs ranging from about 60-70% for southern dialects to over 80% for Lesbian and nearly 97% for Cappadocian. Fine-tuning substantially reduces error rates (up to 47% relative WER improvement), with Cappadocian remaining the most challenging variety (best WER 68.17%). Overall, our results highlight persistent limitations of current pretrained ASR models under dialectal variation and the need for dedicated benchmarks and adaptation strategies.
2023
ASR pipeline for low-resourced languages: A case study on Pomak
Chara Tsoukala | Kosmas Kritsis | Ioannis Douros | Athanasios Katsamanis | Nikolaos Kokkas | Vasileios Arampatzakis | Vasileios Sevetlidis | Stella Markantonatou | George Pavlidis
Proceedings of the Second Workshop on NLP Applications to Field Linguistics
Automatic Speech Recognition (ASR) models can aid field linguists by facilitating the creation of text corpora from oral material. Training ASR systems for low-resource languages can be a challenging task not only due to lack of resources but also due to the work required for the preparation of a training dataset. We present a pipeline for data processing and ASR model training for low-resourced languages, based on the language family. As a case study, we collected recordings of Pomak, an endangered South East Slavic language variety spoken in Greece. Using the proposed pipeline, we trained the first Pomak ASR model.
2019
Simulating Spanish-English Code-Switching: El Modelo Está Generating Code-Switches
Chara Tsoukala | Stefan L. Frank | Antal van den Bosch | Jorge Valdés Kroff | Mirjam Broersma
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Multilingual speakers are able to switch from one language to the other (“code-switch”) between or within sentences. Because the underlying cognitive mechanisms are not well understood, in this study we use computational cognitive modeling to shed light on the process of code-switching. We employed the Bilingual Dual-path model, a Recurrent Neural Network of bilingual sentence production (Tsoukala et al., 2017), and simulated sentence production in simultaneous Spanish-English bilinguals. Our first goal was to investigate whether the model would code-switch without being exposed to code-switched training input. The model indeed produced code-switches even without any exposure to such input and the patterns of code-switches are in line with earlier linguistic work (Poplack, 1980). The second goal of this study was to investigate an auxiliary phrase asymmetry that exists in Spanish-English code-switched production. Using this cognitive model, we examined a possible cause for this asymmetry. To our knowledge, this is the first computational cognitive model that aims to simulate code-switched sentence production.
2014
CASMACAT: A Computer-assisted Translation Workbench
Vicent Alabau | Christian Buck | Michael Carl | Francisco Casacuberta | Mercedes García-Martínez | Ulrich Germann | Jesús González-Rubio | Robin Hill | Philipp Koehn | Luis Leiva | Bartolomé Mesa-Lao | Daniel Ortiz-Martínez | Herve Saint-Amand | Germán Sanchis Trilles | Chara Tsoukala
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics
Co-authors
- Philipp Koehn 2
- Stella Markantonatou 2
- Herve Saint-Amand 2
- Vicent Alabau 1
- Antonios Anastasopoulos 1
- Vasileios Arampatzakis 1
- Stavros Bompolas 1
- Mirjam Broersma 1
- Christian Buck 1
- Michael Carl 1
- Francisco Casacuberta 1
- Ioannis Douros 1
- Stefan L. Frank 1
- Mercedes García-Martínez 1
- Ulrich Germann 1
- Jesús González-Rubio 1
- Robin L. Hill 1
- Petros Karatsareas 1
- Athanasios Katsamanis 1
- Nikolaos Kokkas 1
- Kosmas Kritsis 1
- Luis A. Leiva 1
- Antigoni Margariti 1
- Bartolomé Mesa-Lao 1
- Daniel Ortiz-Martínez 1
- Konstantina Panagiotou 1
- George Pavlidis 1
- Maria Elisavet Plaiti 1
- Angela Ralli 1
- Germán Sanchis-Trilles 1
- Vasileios Sevetlidis 1
- Nefeli Tzanakaki 1
- Jorge Valdés Kroff 1
- Antal van den Bosch 1