Detecting Syntactic Change Using a Neural Part-of-Speech Tagger

William Merrill, Gigi Stark, Robert Frank


Abstract
We train a diachronic long short-term memory (LSTM) part-of-speech tagger on a large corpus of American English from the 19th, 20th, and 21st centuries. We analyze the tagger’s ability to implicitly learn temporal structure between years, and the extent to which this knowledge can be transferred to date new sentences. The learned year embeddings show a strong linear correlation between their first principal component and time. We show that temporal information encoded in the model can be used to predict novel sentences’ years of composition relatively well. Comparisons to a feedforward baseline suggest that the temporal change learned by the LSTM is syntactic rather than purely lexical. Thus, our results suggest that our tagger is implicitly learning to model syntactic change in American English over the course of the 19th, 20th, and early 21st centuries.
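The reported relationship between the year embeddings' first principal component and time can be illustrated with a minimal sketch. This is not the authors' code or data: the embedding matrix below is synthetic, the year range and embedding size are placeholders, and the helper names `first_pc_scores` and `pearson_r` are hypothetical. It only shows how one might project per-year vectors onto their first principal component and measure the linear correlation with the year.

```python
import numpy as np

def first_pc_scores(embeddings):
    """Project each row (one per year) onto the first principal component."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic stand-in for learned year embeddings (assumption, not the paper's data):
# each year's vector drifts along one fixed direction, plus small noise.
rng = np.random.default_rng(0)
years = np.arange(1800, 2020)      # placeholder year range
dim = 16                           # placeholder embedding size
direction = rng.normal(size=dim)
drift = np.outer((years - years.mean()) / years.std(), direction)
embeddings = drift + 0.1 * rng.normal(size=(len(years), dim))

r = pearson_r(first_pc_scores(embeddings), years)
# The sign of a principal component is arbitrary, so report the magnitude.
print(f"|r| between PC1 and year: {abs(r):.3f}")
```

With real embeddings, one would replace the synthetic matrix with the tagger's learned year vectors, one row per year, in chronological order.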
Anthology ID:
W19-4721
Volume:
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu
Venue:
LChange
Publisher:
Association for Computational Linguistics
Pages:
167–174
URL:
https://aclanthology.org/W19-4721
DOI:
10.18653/v1/W19-4721
Cite (ACL):
William Merrill, Gigi Stark, and Robert Frank. 2019. Detecting Syntactic Change Using a Neural Part-of-Speech Tagger. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, pages 167–174, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Detecting Syntactic Change Using a Neural Part-of-Speech Tagger (Merrill et al., LChange 2019)
PDF:
https://aclanthology.org/W19-4721.pdf
Code:
viking-sudo-rm/DiachronicPOSTagger