Andrej Žgank

Also published as: Andrej Zgank

2020

Towards Building an Automatic Transcription System for Language Documentation: Experiences from Muyu
Alexander Zahrer | Andrej Zgank | Barbara Schuppler
Proceedings of the Twelfth Language Resources and Evaluation Conference

Since at least half of the world’s 6,000-plus languages will vanish during the 21st century, language documentation has become a rapidly growing field in linguistics. A fundamental challenge for language documentation is the “transcription bottleneck”. Speech technology may deliver the decisive breakthrough for overcoming this bottleneck. This paper presents first experiments from the development of ASR4LD, a new tool for language documentation (LD) based on automatic speech recognition (ASR). The experiments are based on recordings from an ongoing documentation project for the endangered Muyu language in New Guinea. We compare phoneme recognition experiments with American English, Austrian German and Slovenian as source languages and Muyu as the target language. The Slovenian acoustic models achieve by far the best performance, with a phoneme error rate (PER) of 43.71%, compared with 57.14% PER for American English and 89.49% PER for Austrian German. While part of the errors can be explained by phonetic variation, the recording mismatch poses a major problem. In the long term, ASR4LD will not only be an integral part of the ongoing documentation project of Muyu, but will also be developed further to facilitate the language documentation process for other language groups.
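
The PER figures above follow the usual definition of phoneme error rate: the minimum number of substitutions, deletions and insertions needed to turn the reference phoneme sequence into the recognised one, divided by the number of reference phonemes. The abstract does not state which scoring tool was used, so the following is only a minimal sketch of that standard definition; the function names and the toy symbol sequences are hypothetical and are not Muyu data.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp (Levenshtein)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference symbol r
                curr[j - 1] + 1,         # insert hypothesis symbol h
                prev[j - 1] + (r != h),  # substitute (cost 0 if symbols match)
            )
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER in percent; can exceed 100% when the recogniser inserts many phonemes."""
    if not ref:
        raise ValueError("reference must contain at least one phoneme")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical toy sequences: one substitution (b -> x) and one deletion (d).
    print(phoneme_error_rate(["a", "b", "c", "d"], ["a", "x", "c"]))  # 50.0
```

A word error rate such as the one reported for SI TEDx-UM is computed the same way, only over word tokens instead of phonemes.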

2016


The SI TEDx-UM speech database: a new Slovenian Spoken Language Resource
Andrej Žgank | Mirjam Sepesy Maučec | Darinka Verdonik
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents a new Slovenian spoken language resource built from TEDx Talks. The speech database contains 242 talks with a total duration of 54 hours. The acquired spoken material was annotated and transcribed automatically, using acoustic segmentation and automatic speech recognition. The development and evaluation subset was also transcribed manually, following the guidelines specified for the Slovenian GOS corpus. The manual transcriptions were used to evaluate the quality of the unsupervised transcriptions. The average word error rate for the SI TEDx-UM evaluation subset was 50.7%, with an out-of-vocabulary (OOV) rate of 24% and a language model perplexity of 390. The unsupervised transcriptions contain 372k tokens, of which 32k are distinct.
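
The OOV rate, perplexity and token/type counts quoted above are standard corpus statistics. The abstract does not describe exactly how they were computed, so the sketch below assumes the common conventions: a token-level OOV rate (share of running tokens missing from the recogniser vocabulary), perplexity as the exponential of the average negative log-probability per token, and "distinct" tokens as unique word types. All function names and the toy Slovene tokens are hypothetical illustrations.

```python
import math
from typing import Iterable, Sequence, Set, Tuple


def oov_rate(eval_tokens: Sequence[str], vocabulary: Set[str]) -> float:
    """Token-level OOV rate in percent (assumed convention)."""
    if not eval_tokens:
        raise ValueError("evaluation text is empty")
    oov = sum(1 for t in eval_tokens if t not in vocabulary)
    return 100.0 * oov / len(eval_tokens)


def perplexity(token_probs: Sequence[float]) -> float:
    """exp of the mean negative log-probability that a language model assigns to each token."""
    if not token_probs:
        raise ValueError("no token probabilities given")
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


def token_type_counts(tokens: Iterable[str]) -> Tuple[int, int]:
    """Return (number of running tokens, number of distinct types)."""
    tokens = list(tokens)
    return len(tokens), len(set(tokens))


if __name__ == "__main__":
    vocab = {"to", "je", "dan"}
    print(oov_rate("to je lep dan".split(), vocab))        # 25.0
    print(token_type_counts("to je to je dan".split()))    # (5, 3)
```

Under these conventions, a model that assigned every token probability 1/390 would have a perplexity of 390.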

2014


The Slovene BNSI Broadcast News database and reference speech corpus GOS: Towards the uniform guidelines for future work
Andrej Žgank | Ana Zwitter Vitez | Darinka Verdonik
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The aim of the paper is to search for common guidelines for the future development of speech databases for less-resourced languages, in order to make them as useful as possible for their two main fields of use: linguistic research and speech technologies. We compare two standards for creating speech databases: one followed when developing BNSI Broadcast News, the Slovene speech database for automatic speech recognition, and the other followed when developing GOS, the Slovene reference speech corpus. We then outline possible common guidelines for future work. We also present an add-on for the GOS corpus which enables its use for automatic speech recognition.

2006


SINOD - Slovenian non-native speech database
Andrej Žgank | Darinka Verdonik | Aleksandra Zögling Markuš | Zdravko Kačič
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper presents the SINOD database, the first Slovenian non-native speech database. It will be used to improve the performance of a large vocabulary continuous speech recogniser for non-native speakers. The main quality improvements are expected for the acoustic models and the recogniser's vocabulary. The SINOD database is designed as a supplement to the Slovenian BNSI Broadcast News database, and the same BN recommendations were used for both databases. Two interviews with non-native Slovenian speakers were incorporated in the set. Both non-native speakers were female, whereas the journalist was a native Slovenian male speaker. The transcription approach applied in the production phase is presented, and various statistics and analyses of the database are given in the paper.
