Zero-Shot Transfer of Pretrained Speech Representations for Multilingual Emotion Recognition

Tomasz Kuczyński


Abstract
Speech emotion recognition remains a challenging task, particularly in low-resource language settings. In this work, we develop a system that identifies emotional states in Polish speech using training data exclusively from other languages. Our approach relies on a pretrained speech representation model and follows a strict zero-shot paradigm: knowledge is transferred across languages without any access to Polish data. The system was developed for the Polish Speech Emotion Recognition Challenge (PolEval 2025), which required participants to train models solely on multilingual resources that exclude Polish and to evaluate them on Polish speech in a zero-shot setup. We present a complete solution covering model selection, audio preprocessing, and the fine-tuning strategy, and discuss the potential of large-scale language models for cross-lingual emotion recognition.
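The zero-shot constraint described above amounts to a language-disjoint data split: no Polish utterance may reach training, and Polish speech is used only for evaluation. The Python sketch below illustrates that split under assumptions that do not come from the paper. The `Utterance` record, the ISO 639-1 language codes, and the helpers `zero_shot_split` and `unseen_labels` are hypothetical names for illustration, not the author's code or the challenge's official tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    # Hypothetical record; field names are illustrative assumptions.
    path: str       # location of the audio file
    language: str   # assumed ISO 639-1 code, e.g. "pl", "en", "de"
    emotion: str    # emotion label, e.g. "anger", "joy"


def zero_shot_split(utterances, target_language="pl"):
    """Keep every target-language utterance out of training (zero-shot setup)."""
    train = [u for u in utterances if u.language != target_language]
    test = [u for u in utterances if u.language == target_language]
    return train, test


def unseen_labels(train, test):
    """Return emotion labels that occur in the test set but never in training."""
    return sorted({u.emotion for u in test} - {u.emotion for u in train})


if __name__ == "__main__":
    corpus = [
        Utterance("a.wav", "en", "anger"),
        Utterance("b.wav", "pl", "joy"),
        Utterance("c.wav", "de", "joy"),
        Utterance("d.wav", "pl", "neutral"),
    ]
    train, test = zero_shot_split(corpus)
    print(f"train={len(train)} test={len(test)}")  # train=2 test=2
    print(unseen_labels(train, test))              # ['neutral']
```

Checking for unseen labels matters in this setting because a classifier trained only on other languages can predict only the emotion classes present in those languages' corpora.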
Anthology ID: 2025.poleval-main.13
Volume: Proceedings of the PolEval 2025 Workshop
Month: November
Year: 2025
Address: Warsaw
Editors: Łukasz Kobyliński, Alina Wróblewska, Maciej Ogrodniczuk
Venues: PolEval | WS
Publisher: Institute of Computer Science PAS and Association for Computational Linguistics
Pages: 91–96
URL: https://aclanthology.org/2025.poleval-main.13/
Cite (ACL): Tomasz Kuczyński. 2025. Zero-Shot Transfer of Pretrained Speech Representations for Multilingual Emotion Recognition. In Proceedings of the PolEval 2025 Workshop, pages 91–96, Warsaw. Institute of Computer Science PAS and Association for Computational Linguistics.
Cite (Informal): Zero-Shot Transfer of Pretrained Speech Representations for Multilingual Emotion Recognition (Kuczyński, PolEval 2025)
PDF: https://aclanthology.org/2025.poleval-main.13.pdf