Janine Siewert


2022

Low Saxon dialect distances at the orthographic and syntactic level
Janine Siewert | Yves Scherrer | Martijn Wieling
Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change

We compare five Low Saxon dialects from the 19th and 21st centuries, from Germany and the Netherlands, with each other as well as with modern Standard Dutch and Standard German. Our comparison is based on character n-grams on the one hand and PoS n-grams on the other, and we show that these two lead to different distances. Particularly in the PoS-based distances, all of the 21st-century Low Saxon dialects can be observed shifting towards the modern majority languages.

2021

Towards a balanced annotated Low Saxon dataset for diachronic investigation of dialectal variation
Janine Siewert | Yves Scherrer | Jörg Tiedemann
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)

2020

LSDC - A comprehensive dataset for Low Saxon Dialect Classification
Janine Siewert | Yves Scherrer | Martijn Wieling | Jörg Tiedemann
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects

We present a new comprehensive dataset for the unstandardised West Germanic language Low Saxon, covering the last two centuries, the majority of modern dialects and various genres, which will be made openly available in connection with the final version of this paper. Since no comparably comprehensive dataset of contemporary Low Saxon exists so far, this dataset is a substantial contribution to NLP research on this language. We also test the use of this dataset for dialect classification by training several baseline models, comparing statistical and neural approaches. The performance of these models shows that, despite an imbalance in the amount of data per dialect, enough features can be learned to achieve relatively high classification accuracy.