Zhaolin Li


2024

Speech Recognition Corpus of the Khinalug Language for Documenting Endangered Languages
Zhaolin Li | Monika Rind-Pawlowski | Jan Niehues
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Automatic Speech Recognition (ASR) can be a valuable tool for documenting endangered languages. However, building ASR tools for these languages poses several difficult research challenges, notably data scarcity. In this paper, we present the whole process of creating a useful ASR tool for language documentation scenarios. We publish the first speech corpus for Khinalug, an endangered language spoken in Northern Azerbaijan. The corpus consists of 2.67 hours of labeled data from recordings of spontaneous speech on various topics. As Khinalug is an extremely low-resource language, we investigate the benefits of multilingual models for self-supervised and supervised learning, achieving a performance of 6.65 Character Error Rate (CER) points and 25.53 Word Error Rate (WER) points. The benefits of multilingual models are further validated through experiments with three additional under-resourced languages. Lastly, this work conducts quality assessments with linguists on new recordings to investigate the model’s usefulness for language documentation. We observe evident degradation on new recordings, indicating the importance of enhancing model robustness. In addition, we find that inaudible content is the main cause of incorrect ASR predictions, suggesting future work on incorporating contextual information.
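
As a concrete illustration of the evaluation metrics reported above, the following is a minimal sketch of how CER and WER are typically computed for ASR output, using the open-source jiwer library. The example transcripts are hypothetical placeholders, not data from the Khinalug corpus.

```python
# Minimal sketch of CER/WER scoring for ASR hypotheses with jiwer.
# The transcripts below are hypothetical placeholders, not examples
# from the Khinalug corpus described in the paper.
import jiwer

references = ["the recording describes a local wedding custom"]
hypotheses = ["the recording describes a local weding costum"]

# WER: word-level edit distance normalized by the number of reference words.
wer = jiwer.wer(references, hypotheses)
# CER: character-level edit distance normalized by the number of reference characters.
cer = jiwer.cer(references, hypotheses)

print(f"WER: {wer:.2%}  CER: {cer:.2%}")
```

CER is usually the more informative of the two for extremely low-resource settings like this one, since a single misrecognized morpheme inflates WER even when most characters are correct.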

2023

End-to-End Evaluation for Low-Latency Simultaneous Speech Translation
Christian Huber | Tu Anh Dinh | Carlos Mullov | Ngoc-Quan Pham | Thai Binh Nguyen | Fabian Retkowski | Stefan Constantin | Enes Ugan | Danni Liu | Zhaolin Li | Sai Koneru | Jan Niehues | Alexander Waibel
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

The challenge of low-latency speech translation has recently drawn significant interest in the research community, as shown by several publications and shared tasks. It is therefore essential to evaluate these different approaches in realistic scenarios. However, currently only specific aspects of the systems are evaluated, and it is often not possible to compare different approaches. In this work, we propose the first framework to perform and evaluate the various aspects of low-latency speech translation under realistic conditions. The evaluation is carried out in an end-to-end fashion, including the segmentation of the audio as well as the run-time of the different components. We then compare different approaches to low-latency speech translation using this framework, evaluating models with the option to revise the output as well as methods with fixed output. Furthermore, we directly compare state-of-the-art cascaded and end-to-end systems. Finally, the framework allows automatic evaluation of translation quality as well as latency, and provides a web interface to show the low-latency model outputs to the user.
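
To make the latency side of such an evaluation concrete, here is a small sketch of Average Lagging (AL), a standard token-level latency metric from the simultaneous-translation literature (Ma et al., 2019). The end-to-end framework described above additionally accounts for audio segmentation and component run-time, which this simplified version omits.

```python
# Sketch of Average Lagging (AL), a common latency metric for
# simultaneous translation (Ma et al., 2019). This is a simplified,
# token-level illustration, not the paper's end-to-end measurement.
def average_lagging(delays, src_len, tgt_len):
    """delays[t] = number of source tokens read before emitting target token t."""
    gamma = tgt_len / src_len              # target-to-source length ratio
    al, tau = 0.0, 0
    for t, g in enumerate(delays):         # t is the 0-indexed target position
        al += g - t / gamma                # lag of this token vs. an ideal policy
        tau += 1
        if g >= src_len:                   # stop at the first token emitted
            break                          # after the full source was read
    return al / tau

# Example: a wait-3 policy on a 6-token source and 6-token target
# yields an AL of 3 tokens, as expected for wait-k with equal lengths.
print(average_lagging([3, 4, 5, 6, 6, 6], src_len=6, tgt_len=6))
```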