Investigating Diatopic Variation in a Historical Corpus

Stefanie Dipper, Sandra Waldenberger


Abstract
This paper investigates diatopic variation in a historical corpus of German. Based on equivalent word forms from different language areas, replacement rules and mappings are derived which describe the relations between these word forms. These rules and mappings are then interpreted as reflections of morphological, phonological or graphemic variation. Based on sample rules and mappings, we show that our approach can replicate results from historical linguistics. While previous studies were restricted to predefined word lists, or confined to single authors or texts, our approach uses a much wider range of data available in historical corpora.
Anthology ID:
W17-1204
Volume:
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Preslav Nakov, Marcos Zampieri, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
36–45
URL:
https://aclanthology.org/W17-1204
DOI:
10.18653/v1/W17-1204
Cite (ACL):
Stefanie Dipper and Sandra Waldenberger. 2017. Investigating Diatopic Variation in a Historical Corpus. In Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), pages 36–45, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Investigating Diatopic Variation in a Historical Corpus (Dipper & Waldenberger, VarDial 2017)
PDF:
https://aclanthology.org/W17-1204.pdf