NorDiaChange: Diachronic Semantic Change Dataset for Norwegian

Andrey Kutuzov, Samia Touileb, Petter Mæhlum, Tita Enstad, Alexandra Wittemann


Abstract
We describe NorDiaChange: the first diachronic semantic change dataset for Norwegian. NorDiaChange comprises two novel subsets, covering about 80 Norwegian nouns manually annotated with graded semantic change over time. Both datasets follow the same annotation procedure and can be used interchangeably as train and test splits for each other. NorDiaChange covers the time periods related to pre- and post-war events, oil and gas discovery in Norway, and technological developments. The annotation was done using the DURel framework and two large historical Norwegian corpora. NorDiaChange is published in full under a permissive licence, complete with raw annotation data and inferred diachronic word usage graphs (DWUGs).
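The abstract mentions graded semantic change annotated with the DURel framework. As a rough illustration only, the sketch below computes two measures commonly used with DURel-style data. It assumes that annotators rate usage pairs on the DURel relatedness scale (1 = unrelated to 4 = identical), and that 0 marks a "cannot decide" judgment, which is excluded. The first measure is a COMPARE-style score: the mean relatedness of pairs whose two usages come from different periods. The second is a graded change score: the Jensen-Shannon distance between a word's sense-cluster frequency distributions in the two periods, as in the DWUG-based SemEval-2020 Task 1 setup. The function names and toy inputs are hypothetical. This is not presented as the exact NorDiaChange procedure.

```python
from collections import Counter
from math import log2, sqrt
from statistics import mean


def compare_score(cross_period_judgments):
    """Mean DURel relatedness over cross-period usage pairs.

    Assumed scale: 1 (unrelated) .. 4 (identical); 0 ("cannot decide")
    is dropped. Lower values suggest stronger semantic change.
    """
    valid = [j for j in cross_period_judgments if j in (1, 2, 3, 4)]
    if not valid:
        raise ValueError("no usable judgments")
    return mean(valid)


def jensen_shannon_distance(clusters_t1, clusters_t2):
    """Jensen-Shannon distance (base 2, range [0, 1]) between the
    sense-cluster frequency distributions of one word in two periods."""
    c1, c2 = Counter(clusters_t1), Counter(clusters_t2)
    n1, n2 = sum(c1.values()), sum(c2.values())
    labels = set(c1) | set(c2)
    p = {k: c1[k] / n1 for k in labels}
    q = {k: c2[k] / n2 for k in labels}
    m = {k: (p[k] + q[k]) / 2 for k in labels}

    def kl(a, b):
        # Zero-probability terms contribute nothing; b[k] > 0 whenever a[k] > 0.
        return sum(a[k] * log2(a[k] / b[k]) for k in labels if a[k] > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return sqrt(max(divergence, 0.0))


# Hypothetical toy data for one target noun.
print(compare_score([4, 3, 2, 1, 2]))
print(jensen_shannon_distance(["s1", "s1", "s2"], ["s2", "s2", "s3"]))
```

Both scores are computed per target word. The first needs only the raw pairwise judgments. The second needs usages already grouped into senses, for example by clustering a word usage graph.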
Anthology ID:
2022.lrec-1.274
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2563–2572
URL:
https://aclanthology.org/2022.lrec-1.274
Cite (ACL):
Andrey Kutuzov, Samia Touileb, Petter Mæhlum, Tita Enstad, and Alexandra Wittemann. 2022. NorDiaChange: Diachronic Semantic Change Dataset for Norwegian. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 2563–2572, Marseille, France. European Language Resources Association.
Cite (Informal):
NorDiaChange: Diachronic Semantic Change Dataset for Norwegian (Kutuzov et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.274.pdf
Code:
ltgoslo/nor_dia_change