Liwen Hou
2023
Detecting Syntactic Change with Pre-trained Transformer Models
Liwen Hou | David Smith
Findings of the Association for Computational Linguistics: EMNLP 2023
We investigate the ability of Transformer-based language models to find syntactic differences between the English of the early 1800s and that of the late 1900s. First, we show that a fine-tuned BERT model can distinguish between text from these two periods using syntactic information only; to show this, we employ a strategy to hide semantic information from the text. Second, we make further use of fine-tuned BERT models to identify specific instances of syntactic change and specific words for which a new part of speech was introduced. To do this, we employ an automatic part-of-speech (POS) tagger and use it to train corpora-specific taggers based only on BERT representations pretrained on different corpora. Notably, our methods of identifying specific candidates for syntactic change avoid using any automatic POS tagger on old text, where its performance may be unreliable; instead, our methods only use untagged old text together with tagged modern text. We examine samples and distributional properties of the model output to validate automatically identified cases of syntactic change. Finally, we use our techniques to confirm the historical rise of the progressive construction, a known example of syntactic change.
2021
Drivers of English Syntactic Change in the Canadian Parliament
Liwen Hou | David A. Smith
Proceedings of the Society for Computation in Linguistics 2021
Emerging English Transitives over the Last Two Centuries
Liwen Hou | David A. Smith
Proceedings of the Society for Computation in Linguistics 2021
2018
Modeling the Decline in English Passivization
Liwen Hou | David Smith
Proceedings of the Society for Computation in Linguistics (SCiL) 2018