Three-part diachronic semantic change dataset for Russian

Andrey Kutuzov, Lidia Pivovarova


Abstract
We present RuShiftEval, a manually annotated lexical semantic change dataset for Russian. Its novelty lies in a single set of target words annotated for diachronic semantic shifts across three time periods, whereas previous work used either only two time periods or different sets of target words. The paper describes the composition of the dataset and its annotation procedure. In addition, we show how the ternary nature of RuShiftEval makes it possible to trace specific diachronic trajectories: ‘changed at a particular time period and stable afterwards’ or ‘was changing throughout all time periods’. Based on an analysis of the submissions to the recent shared task on semantic change detection for Russian, we argue that correctly identifying such trajectories can be an interesting sub-task in itself.
Anthology ID:
2021.lchange-1.2
Volume:
Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021
Month:
August
Year:
2021
Address:
Online
Editors:
Nina Tahmasebi, Adam Jatowt, Yang Xu, Simon Hengchen, Syrielle Montariol, Haim Dubossarsky
Venue:
LChange
Publisher:
Association for Computational Linguistics
Pages:
7–13
URL:
https://aclanthology.org/2021.lchange-1.2
DOI:
10.18653/v1/2021.lchange-1.2
Cite (ACL):
Andrey Kutuzov and Lidia Pivovarova. 2021. Three-part diachronic semantic change dataset for Russian. In Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021, pages 7–13, Online. Association for Computational Linguistics.
Cite (Informal):
Three-part diachronic semantic change dataset for Russian (Kutuzov & Pivovarova, LChange 2021)
PDF:
https://aclanthology.org/2021.lchange-1.2.pdf
Data
RuShiftEval