Producing Standard German Subtitles for Swiss German TV Content

Johanna Gerlach, Jonathan Mutal, Bouillon Pierrette


Abstract
In this study, we compare two approaches, neural machine translation (NMT) and an edit-based approach, and examine the use of synthetic data for translating normalised Swiss German automatic speech recognition (ASR) output into correct written Standard German for subtitles, with a special focus on syntactic differences. The results suggest that NMT is better suited to this task, and that relatively simple rule-based generation of training data could be valuable when little training data is available and the required transformations are simple.
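The abstract reports that simple rule-based generation of training data can help when parallel data is scarce. The sketch below is a rough illustration of what such a generator might look like, not the rules used in the paper. It assumes one hypothetical rule: a sentence-final modal verb is moved in front of the preceding infinitive, mimicking a word order often heard in Swiss German subordinate clauses. Each correct Standard German sentence serves as the target, and the rule-distorted copy serves as the synthetic source. The names `MODALS`, `swap_final_modal` and `make_pairs`, the modal list, and the crude infinitive test are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical, illustrative set of finite modal verb forms.
MODALS = {"kann", "konnte", "muss", "musste", "will", "wollte",
          "soll", "sollte", "darf", "durfte", "mag", "möchte"}


def swap_final_modal(sentence: str) -> Optional[str]:
    """Move a sentence-final modal verb in front of the infinitive before it.

    'dass sie schwimmen kann.' -> 'dass sie kann schwimmen.'
    Returns None when the (hypothetical) rule does not apply.
    """
    body = sentence.rstrip(".!?")
    punct = sentence[len(body):]  # trailing punctuation, if any
    tokens = body.split()
    if len(tokens) < 2:
        return None
    infinitive, modal = tokens[-2], tokens[-1]
    # Crude infinitive test (assumption): a lowercase token ending in "en".
    if modal in MODALS and infinitive.islower() and infinitive.endswith("en"):
        tokens[-2], tokens[-1] = modal, infinitive
        return " ".join(tokens) + punct
    return None


def make_pairs(sentences: List[str]) -> List[Tuple[str, str]]:
    """Build (synthetic source, correct target) pairs for sentences the rule changes."""
    pairs = []
    for target in sentences:
        source = swap_final_modal(target)
        if source is not None:
            pairs.append((source, target))
    return pairs


if __name__ == "__main__":
    corpus = [
        "Er sagt, dass er morgen arbeiten will.",
        "Wir gehen nach Hause.",
        "Ich weiss, dass sie schwimmen kann.",
    ]
    for source, target in make_pairs(corpus):
        print(source, "->", target)
```

Sentences the rule does not touch are simply skipped, so the generated pairs contain only the targeted transformation. Such pairs could then be added to the training data of an NMT or edit-based model; a realistic generator would rely on part-of-speech tags rather than the suffix heuristic used here.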
Anthology ID: 2022.slpat-1.5
Volume: Ninth Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022)
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Sarah Ebling, Emily Prud’hommeaux, Preethi Vaidyanathan
Venue: SLPAT
Publisher: Association for Computational Linguistics
Pages: 37–43
URL: https://aclanthology.org/2022.slpat-1.5
DOI: 10.18653/v1/2022.slpat-1.5
Cite (ACL): Johanna Gerlach, Jonathan Mutal, and Bouillon Pierrette. 2022. Producing Standard German Subtitles for Swiss German TV Content. In Ninth Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022), pages 37–43, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Producing Standard German Subtitles for Swiss German TV Content (Gerlach et al., SLPAT 2022)
PDF: https://aclanthology.org/2022.slpat-1.5.pdf
Video: https://aclanthology.org/2022.slpat-1.5.mp4