@inproceedings{harju-van-der-goot-2025-age,
title = "How to age {BERT} Well: Continuous Training for Historical Language Adaptation",
author = "Harju, Anika and
van der Goot, Rob",
editor = "Hettiarachchi, Hansi and
Ranasinghe, Tharindu and
Rayson, Paul and
Mitkov, Ruslan and
Gaber, Mohamed and
Premasiri, Damith and
Tan, Fiona Anting and
Uyangodage, Lasitha",
booktitle = "Proceedings of the First Workshop on Language Models for Low-Resource Languages",
month = jan,
year = "2025",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.loreslm-1.21/",
pages = "258--267",
abstract = "As the application of computational tools increases to digitalize historical archives, automatic annotation challenges persist due to distinct linguistic and morphological features of historical languages like Old English (OE). Existing tools struggle with the historical language varieties due to insufficient training. Previous research has focused on adapting pre-trained language models to new languages or domains but has rarely explored the modeling of language variety across time. Hence, we investigate the effectiveness of continuous language model training for adapting language models to OE on domain-specific data. We compare the continuous training of an English model (EN) and a multilingual model (ML), and use POS tagging for downstream evaluation. Results show that continuous pre-training substantially improves performance. We retrain a modern English (EN) model and a Multi-lingual (ML) BERT model for OE. We confirmed the effectiveness of continuous pre-training for language adaptation and downstream evaluation utilizing part-of-speech (POS) tagging, advancing the potential to understand the unique grammatical structures of historical OE archives. More concretely, EN BERT initially outperformed ML BERT with an accuracy of 83{\%} during the language modeling phase. However, on the POS tagging task, ML BERT surpassed EN BERT, achieving an accuracy of 94{\%}, which suggests effective performance to the historical language varieties."
}
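
The abstract describes continued masked-language-model pre-training of an English and a multilingual BERT on Old English data. Below is a minimal sketch of what that step could look like, assuming HuggingFace Transformers and Datasets; the checkpoint names, the corpus file `oe_corpus.txt`, and all hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch: continued MLM pre-training of BERT on an Old English corpus.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# "bert-base-multilingual-cased" stands in for the paper's ML BERT;
# "bert-base-cased" would be the analogous EN variant. Both are assumptions.
checkpoint = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# oe_corpus.txt is a hypothetical plain-text file of Old English sentences.
dataset = load_dataset("text", data_files={"train": "oe_corpus.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# Standard 15% token masking for the masked-language-model objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-oe",             # hypothetical output path
    num_train_epochs=3,               # assumed; the paper's schedule is not given here
    per_device_train_batch_size=16,
    learning_rate=5e-5,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
trainer.save_model("bert-oe")         # save the adapted checkpoint for reuse
tokenizer.save_pretrained("bert-oe")
```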
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="harju-van-der-goot-2025-age">
<titleInfo>
<title>How to age BERT Well: Continuous Training for Historical Language Adaptation</title>
</titleInfo>
<name type="personal">
<namePart type="given">Anika</namePart>
<namePart type="family">Harju</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rob</namePart>
<namePart type="family">van der Goot</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-01</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the First Workshop on Language Models for Low-Resource Languages</title>
</titleInfo>
<name type="personal">
<namePart type="given">Hansi</namePart>
<namePart type="family">Hettiarachchi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tharindu</namePart>
<namePart type="family">Ranasinghe</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Paul</namePart>
<namePart type="family">Rayson</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ruslan</namePart>
<namePart type="family">Mitkov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohamed</namePart>
<namePart type="family">Gaber</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Damith</namePart>
<namePart type="family">Premasiri</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Fiona</namePart>
<namePart type="given">Anting</namePart>
<namePart type="family">Tan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lasitha</namePart>
<namePart type="family">Uyangodage</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Abu Dhabi, United Arab Emirates</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>As the application of computational tools increases to digitalize historical archives, automatic annotation challenges persist due to distinct linguistic and morphological features of historical languages like Old English (OE). Existing tools struggle with the historical language varieties due to insufficient training. Previous research has focused on adapting pre-trained language models to new languages or domains but has rarely explored the modeling of language variety across time. Hence, we investigate the effectiveness of continuous language model training for adapting language models to OE on domain-specific data. We compare the continuous training of an English model (EN) and a multilingual model (ML), and use POS tagging for downstream evaluation. Results show that continuous pre-training substantially improves performance. We retrain a modern English (EN) model and a Multi-lingual (ML) BERT model for OE. We confirmed the effectiveness of continuous pre-training for language adaptation and downstream evaluation utilizing part-of-speech (POS) tagging, advancing the potential to understand the unique grammatical structures of historical OE archives. More concretely, EN BERT initially outperformed ML BERT with an accuracy of 83% during the language modeling phase. However, on the POS tagging task, ML BERT surpassed EN BERT, achieving an accuracy of 94%, which suggests effective performance to the historical language varieties.</abstract>
<identifier type="citekey">harju-van-der-goot-2025-age</identifier>
<location>
<url>https://aclanthology.org/2025.loreslm-1.21/</url>
</location>
<part>
<date>2025-01</date>
<extent unit="page">
<start>258</start>
<end>267</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T How to age BERT Well: Continuous Training for Historical Language Adaptation
%A Harju, Anika
%A van der Goot, Rob
%Y Hettiarachchi, Hansi
%Y Ranasinghe, Tharindu
%Y Rayson, Paul
%Y Mitkov, Ruslan
%Y Gaber, Mohamed
%Y Premasiri, Damith
%Y Tan, Fiona Anting
%Y Uyangodage, Lasitha
%S Proceedings of the First Workshop on Language Models for Low-Resource Languages
%D 2025
%8 January
%I Association for Computational Linguistics
%C Abu Dhabi, United Arab Emirates
%F harju-van-der-goot-2025-age
%X As computational tools are increasingly applied to digitize historical archives, automatic annotation remains challenging due to the distinct linguistic and morphological features of historical languages such as Old English (OE). Existing tools struggle with historical language varieties because they lack sufficient training data for them. Previous research has focused on adapting pre-trained language models to new languages or domains, but has rarely explored modeling language variation across time. Hence, we investigate the effectiveness of continuous language model training on domain-specific data for adapting language models to OE: we continue pre-training a modern English (EN) BERT model and a multilingual (ML) BERT model on OE data, and use part-of-speech (POS) tagging for downstream evaluation. Results show that continuous pre-training substantially improves performance, advancing the potential to understand the unique grammatical structures of historical OE archives. More concretely, EN BERT initially outperformed ML BERT with an accuracy of 83% during the language modeling phase. However, on the POS tagging task, ML BERT surpassed EN BERT, achieving an accuracy of 94%, which suggests more effective adaptation to historical language varieties.
%U https://aclanthology.org/2025.loreslm-1.21/
%P 258-267
Markdown (Informal)
[How to age BERT Well: Continuous Training for Historical Language Adaptation](https://aclanthology.org/2025.loreslm-1.21/) (Harju & van der Goot, LoResLM 2025)
ACL
Anika Harju and Rob van der Goot. 2025. How to age BERT Well: Continuous Training for Historical Language Adaptation. In Proceedings of the First Workshop on Language Models for Low-Resource Languages, pages 258–267, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.