Jackson Ehrenworth
2023
Literary Intertextual Semantic Change Detection: Application and Motivation for Evaluating Models on Small Corpora
Jackson Ehrenworth
|
Katherine Keith
Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change
Lexical semantic change detection is the study of how words change meaning between corpora. While Schlechtweg et al. (2020) standardized both datasets and evaluation metrics for this shared task, for those interested in applying semantic change detection models to small corpora—e.g., in the digital humanities—there is a need for evaluation involving much smaller datasets. We present a method and open-source code pipeline for downsampling the SemEval-2020 Task 1 corpora while preserving gold standard measures of semantic change. We then evaluate several state-of-the-art models trained on these downsampled corpora and find both dramatically decreased performance (average 67% decrease) and high variance. We also propose a novel application to the digital humanities and provide a case study demonstrating that semantic change detection can be used in an exploratory manner to produce insightful avenues of investigation for literary scholars.
Search