Study on Automatic Punctuation Restoration in Bilingual Broadcast Stream

Martin Polacek


Abstract
In this study, we employ several ELECTRA-Small models, pre-trained and fine-tuned on specific sets of languages, for automatic punctuation restoration (APR) in automatically transcribed TV and radio shows that contain conversations in two closely related languages. Our evaluation data consists of bilingual interviews in Czech and Slovak and of data containing speeches in Swedish and Norwegian. We train and evaluate three types of models: a multilingual model (mELECTRA), which is pre-trained for 13 European languages; two bilingual models, each pre-trained for one language pair; and four monolingual models, each pre-trained for a single language. Our experimental results show that a) fine-tuning, which must be performed on data from both target languages, is the key step in developing a bilingual APR system and b) the mELECTRA model yields competitive results, making it a viable option for bilingual APR and other multilingual applications. We therefore publicly release our pre-trained bilingual and, in particular, multilingual ELECTRA-Small models on HuggingFace to foster further research on multilingual tasks.
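As a rough illustration of the task, APR is commonly framed as labeling each word with the punctuation mark that should follow it. The sketch below shows only that data framing, not the paper's models or pipeline. The label set (comma, period, question mark) and the helper names `to_training_pairs` and `restore` are assumptions made for this example; the abstract does not state which marks are restored.

```python
import re

# Assumed label set for illustration; the abstract does not list the marks.
PUNCT_TO_LABEL = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
LABEL_TO_PUNCT = {label: mark for mark, label in PUNCT_TO_LABEL.items()}


def to_training_pairs(text):
    """Turn punctuated text into (word, label) pairs.

    Each word gets the label of the mark that directly follows it,
    or "O" when no mark from the assumed set follows.
    """
    pairs = []
    for token in text.split():
        # Split the token into the word and at most one trailing mark.
        match = re.match(r"^(.*?)([,.?]?)$", token)
        word, mark = match.group(1), match.group(2)
        if not word:
            continue  # skip stray punctuation with no word attached
        pairs.append((word, PUNCT_TO_LABEL.get(mark, "O")))
    return pairs


def restore(words, labels):
    """Rebuild punctuated text from words and per-word labels."""
    return " ".join(
        word + LABEL_TO_PUNCT.get(label, "")
        for word, label in zip(words, labels)
    )


if __name__ == "__main__":
    # A mixed Czech/Slovak exchange, in the spirit of the bilingual setting.
    sample = "Dobrý den, jak se máte? Ďakujem, dobre."
    pairs = to_training_pairs(sample)
    words = [word for word, _ in pairs]
    labels = [label for _, label in pairs]
    print(pairs)
    print(restore(words, labels))
```

In such a setup, a token classifier would be trained to predict the labels from the unpunctuated word sequence, and `restore` would be applied to its predictions at inference time.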
Anthology ID:
2025.ranlp-stud.5
Volume:
Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Boris Velichkov, Ivelina Nikolova-Koleva, Milena Slavcheva
Venues:
RANLP | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
37–43
URL:
https://aclanthology.org/2025.ranlp-stud.5/
Cite (ACL):
Martin Polacek. 2025. Study on Automatic Punctuation Restoration in Bilingual Broadcast Stream. In Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing, pages 37–43, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Study on Automatic Punctuation Restoration in Bilingual Broadcast Stream (Polacek, RANLP 2025)
PDF:
https://aclanthology.org/2025.ranlp-stud.5.pdf