Martin Ruskov


2023

pdf bib
Explicit References to Social Values in Fairy Tales: A Comparison between Three European Cultures
Alba Morollon Diaz-Faes | Carla Murteira | Martin Ruskov
Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages

The study of social values in fairy tales opens the possibility to learn about the communication of values across space and time. We propose to study the communication of values in fairy tales from Portugal, Italy and Germany using a technique called word embedding with a compass to quantify vocabulary differences and commonalities. We study how these three national traditions of fairy tales differ in their explicit references to values. To do this, we specify a list of value-charged tokens, consider their word stems and analyse the distance between these in a bespoke pre-trained Word2Vec model. We triangulate and critically discuss the validity of the resulting hypotheses emerging from this quantitative model. Our claim is that this is a reusable and reproducible method for the study of the values explicitly referenced in historical corpora. Finally, our preliminary findings hint at a shared cultural understanding and the expression of values such as Benevolence, Conformity, and Universalism across European societies, suggesting the existence of a pan-European cultural memory.

2022

pdf bib
What is Done is Done: an Incremental Approach to Semantic Shift Detection
Francesco Periti | Alfio Ferrara | Stefano Montanelli | Martin Ruskov
Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change

Contextual word embedding techniques for semantic shift detection are receiving more and more attention. In this paper, we present What is Done is Done (WiDiD), an incremental approach to semantic shift detection based on incremental clustering techniques and contextual embedding methods to capture the changes over the meanings of a target word along a diachronic corpus. In WiDiD, the word contexts observed in the past are consolidated as a set of clusters that constitute the “memory” of the word meanings observed so far. Such a memory is exploited as a basis for subsequent word observations, so that the meanings observed in the present are stratified over the past ones.