Detecting Syntactic Change with Pre-trained Transformer Models

Liwen Hou; David A. Smith

doi:10.18653/v1/2023.findings-emnlp.230

Abstract

We investigate the ability of Transformer-based language models to find syntactic differences between the English of the early 1800s and that of the late 1900s. First, we show that a fine-tuned BERT model can distinguish between text from these two periods using syntactic information only; to show this, we employ a strategy to hide semantic information from the text. Second, we make further use of fine-tuned BERT models to identify specific instances of syntactic change and specific words for which a new part of speech was introduced. To do this, we employ an automatic part-of-speech (POS) tagger and use it to train corpora-specific taggers based only on BERT representations pretrained on different corpora. Notably, our methods of identifying specific candidates for syntactic change avoid using any automatic POS tagger on old text, where its performance may be unreliable; instead, our methods only use untagged old text together with tagged modern text. We examine samples and distributional properties of the model output to validate automatically identified cases of syntactic change. Finally, we use our techniques to confirm the historical rise of the progressive construction, a known example of syntactic change.

Anthology ID: 2023.findings-emnlp.230
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3564–3574
URL: https://aclanthology.org/2023.findings-emnlp.230/
DOI: 10.18653/v1/2023.findings-emnlp.230
Cite (ACL): Liwen Hou and David Smith. 2023. Detecting Syntactic Change with Pre-trained Transformer Models. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 3564–3574, Singapore. Association for Computational Linguistics.
Cite (Informal): Detecting Syntactic Change with Pre-trained Transformer Models (Hou & Smith, Findings 2023)
PDF: https://aclanthology.org/2023.findings-emnlp.230.pdf
