@inproceedings{fittschen-etal-2026-pretraining,
title = "Pretraining Language Models for Diachronic Linguistic Change Discovery",
author = "Fittschen, Elisabeth and
Li, Sabrina Xin and
Lippincott, Tom and
Choshen, Leshem and
Messner, Craig",
editor = "Demberg, Vera and
Inui, Kentaro and
Marquez, Llu{\'i}s",
booktitle = "Findings of the {A}ssociation for {C}omputational {L}inguistics: {EACL} 2026",
month = mar,
year = "2026",
address = "Rabat, Morocco",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2026.findings-eacl.241/",
pages = "4627--4642",
ISBN = "979-8-89176-386-9",
abstract = "Large language models (LLMs) are increasingly used as knowledge discovery tools. Humanistic disciplines like historical linguistics and literary studies have shown interest in this capability. These fields often construct arguments on the basis of distinctions between phenomena like time-period or genre. Such methodological investments complicate reliance on LLMs pretrained over large sets of broadly-collected data. We show that efficient pretraining techniques produce useful models of semantic change over modest historical corpora without allowing potential contamination from anachronistic data. We verify that these trained-from-scratch models better respect historical divisions and are more computationally efficient compared to the standard approach of fine-tuning an existing LLM. We compare the trade-offs in general linguistic fluency versus detecting and characterizing various forms of linguistic change, and provide a pipeline implementation of our approach that can be readily adapted and applied to a wide range of diachronic phenomena."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="fittschen-etal-2026-pretraining">
<titleInfo>
<title>Pretraining Language Models for Diachronic Linguistic Change Discovery</title>
</titleInfo>
<name type="personal">
<namePart type="given">Elisabeth</namePart>
<namePart type="family">Fittschen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sabrina</namePart>
<namePart type="given">Xin</namePart>
<namePart type="family">Li</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tom</namePart>
<namePart type="family">Lippincott</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Leshem</namePart>
<namePart type="family">Choshen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Craig</namePart>
<namePart type="family">Messner</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2026-03</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: EACL 2026</title>
</titleInfo>
<name type="personal">
<namePart type="given">Vera</namePart>
<namePart type="family">Demberg</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kentaro</namePart>
<namePart type="family">Inui</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lluís</namePart>
<namePart type="family">Marquez</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Rabat, Morocco</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-386-9</identifier>
</relatedItem>
<abstract>Large language models (LLMs) are increasingly used as knowledge discovery tools. Humanistic disciplines like historical linguistics and literary studies have shown interest in this capability. These fields often construct arguments on the basis of distinctions between phenomena like time-period or genre. Such methodological investments complicate reliance on LLMs pretrained over large sets of broadly-collected data. We show that efficient pretraining techniques produce useful models of semantic change over modest historical corpora without allowing potential contamination from anachronistic data. We verify that these trained-from-scratch models better respect historical divisions and are more computationally efficient compared to the standard approach of fine-tuning an existing LLM. We compare the trade-offs in general linguistic fluency versus detecting and characterizing various forms of linguistic change, and provide a pipeline implementation of our approach that can be readily adapted and applied to a wide range of diachronic phenomena.</abstract>
<identifier type="citekey">fittschen-etal-2026-pretraining</identifier>
<location>
<url>https://aclanthology.org/2026.findings-eacl.241/</url>
</location>
<part>
<date>2026-03</date>
<extent unit="page">
<start>4627</start>
<end>4642</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Pretraining Language Models for Diachronic Linguistic Change Discovery
%A Fittschen, Elisabeth
%A Li, Sabrina Xin
%A Lippincott, Tom
%A Choshen, Leshem
%A Messner, Craig
%Y Demberg, Vera
%Y Inui, Kentaro
%Y Marquez, Lluís
%S Findings of the Association for Computational Linguistics: EACL 2026
%D 2026
%8 March
%I Association for Computational Linguistics
%C Rabat, Morocco
%@ 979-8-89176-386-9
%F fittschen-etal-2026-pretraining
%X Large language models (LLMs) are increasingly used as knowledge discovery tools. Humanistic disciplines like historical linguistics and literary studies have shown interest in this capability. These fields often construct arguments on the basis of distinctions between phenomena like time-period or genre. Such methodological investments complicate reliance on LLMs pretrained over large sets of broadly-collected data. We show that efficient pretraining techniques produce useful models of semantic change over modest historical corpora without allowing potential contamination from anachronistic data. We verify that these trained-from-scratch models better respect historical divisions and are more computationally efficient compared to the standard approach of fine-tuning an existing LLM. We compare the trade-offs in general linguistic fluency versus detecting and characterizing various forms of linguistic change, and provide a pipeline implementation of our approach that can be readily adapted and applied to a wide range of diachronic phenomena.
%U https://aclanthology.org/2026.findings-eacl.241/
%P 4627-4642
Markdown (Informal)
[Pretraining Language Models for Diachronic Linguistic Change Discovery](https://aclanthology.org/2026.findings-eacl.241/) (Fittschen et al., Findings 2026)
ACL
Elisabeth Fittschen, Sabrina Xin Li, Tom Lippincott, Leshem Choshen, and Craig Messner. 2026. Pretraining Language Models for Diachronic Linguistic Change Discovery. In Findings of the Association for Computational Linguistics: EACL 2026, pages 4627–4642, Rabat, Morocco. Association for Computational Linguistics.