Athanasios Katsamanis
2023
ASR pipeline for low-resourced languages: A case study on Pomak
Chara Tsoukala | Kosmas Kritsis | Ioannis Douros | Athanasios Katsamanis | Nikolaos Kokkas | Vasileios Arampatzakis | Vasileios Sevetlidis | Stella Markantonatou | George Pavlidis
Proceedings of the Second Workshop on NLP Applications to Field Linguistics
Automatic Speech Recognition (ASR) models can aid field linguists by facilitating the creation of text corpora from oral material. Training ASR systems for low-resource languages can be a challenging task not only due to lack of resources but also due to the work required for the preparation of a training dataset. We present a pipeline for data processing and ASR model training for low-resourced languages, based on the language family. As a case study, we collected recordings of Pomak, an endangered South East Slavic language variety spoken in Greece. Using the proposed pipeline, we trained the first Pomak ASR model.
2012
The Twins Corpus of Museum Visitor Questions
Priti Aggarwal | Ron Artstein | Jillian Gerten | Athanasios Katsamanis | Shrikanth Narayanan | Angela Nazarian | David Traum
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The Twins corpus is a collection of utterances spoken in interactions with two virtual characters who serve as guides at the Museum of Science in Boston. The corpus contains about 200,000 spoken utterances from museum visitors (primarily children) as well as from trained handlers who work at the museum. In addition to speech recordings, the corpus contains the outputs of speech recognition performed at the time of utterance as well as the system interpretation of the utterances. Parts of the corpus have been manually transcribed and annotated for question interpretation. The corpus has been used for improving performance of the museum characters and for a variety of research projects, such as phonetic-based Natural Language Understanding, creation of conversational characters from text resources, dialogue policy learning, and research on patterns of user interaction. It has the potential to be used for research on children's speech and on language used when talking to a virtual human.