German’s Next Language Model

Branden Chan, Stefan Schweter, Timo Möller


Abstract
In this work we present the experiments which led to the creation of our BERT- and ELECTRA-based German language models, GBERT and GELECTRA. By varying the input training data, model size, and the presence of Whole Word Masking (WWM), we were able to attain SoTA performance across a set of document classification and named entity recognition (NER) tasks for both models of base and large size. We adopt an evaluation-driven approach in training these models, and our results indicate that both adding more data and utilizing WWM improve model performance. By benchmarking against existing German models, we show that these models are the best German models to date. All trained models will be made publicly available to the research community.
Anthology ID:
2020.coling-main.598
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
6788–6796
URL:
https://aclanthology.org/2020.coling-main.598
DOI:
10.18653/v1/2020.coling-main.598
Cite (ACL):
Branden Chan, Stefan Schweter, and Timo Möller. 2020. German’s Next Language Model. In Proceedings of the 28th International Conference on Computational Linguistics, pages 6788–6796, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
German’s Next Language Model (Chan et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.598.pdf
Code:
dbmdz/berts