Short-term Semantic Shifts and their Relation to Frequency Change
Anna Marakasova | Julia Neidhardt
Proceedings of the Probability and Meaning Conference (PaM 2020)
We present ongoing research on the relationship between short-term semantic shifts and frequency change patterns by examining the case of the refugee crisis in Austria from 2015 to 2016. Our experiments are carried out on a diachronic corpus of Austrian German, namely a corpus of newspaper articles. We trace the evolution of the usage of words that represent concepts in the context of the refugee crisis by analyzing cosine similarities of word vectors over time as well as similarities based on the words’ nearest-neighbour sets. In order to investigate how exactly the contextual meanings have changed, we measure cosine similarity between the following pairs of words: words describing the refugee crisis, on the one hand, and words indicating the process of mediatization and politicization of the refugee crisis in Austria proposed by a domain expert, on the other hand. We evaluate our approach against the expert knowledge. The paper presents the current findings and outlines directions for future work.
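The two similarity measures named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional vectors, the word choices, the function names, and the use of Jaccard overlap for comparing neighbour sets are all assumptions made for illustration, and the sketch further assumes that the vectors from both time slices already live in a shared (aligned) space so that comparing them directly is meaningful.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbours(word, vectors, k):
    """Return the set of the k words closest to `word` by cosine similarity."""
    scored = [
        (cosine(vectors[word], vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return {other for _, other in scored[:k]}

def jaccard(a, b):
    """Jaccard overlap of two sets (an assumed choice of set similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical toy embeddings for two time slices (invented values).
space_2015 = {
    "refugee": [1.0, 0.0],
    "help": [0.9, 0.1],
    "border": [0.1, 0.9],
    "crisis": [0.5, 0.5],
}
space_2016 = {
    "refugee": [0.0, 1.0],
    "help": [0.9, 0.1],
    "border": [0.1, 0.9],
    "crisis": [0.5, 0.5],
}

# Measure 1: similarity of the word's own vector across time slices.
self_similarity = cosine(space_2015["refugee"], space_2016["refugee"])

# Measure 2: overlap of the word's nearest-neighbour sets across time slices.
neighbours_2015 = nearest_neighbours("refugee", space_2015, k=2)
neighbours_2016 = nearest_neighbours("refugee", space_2016, k=2)
neighbour_overlap = jaccard(neighbours_2015, neighbours_2016)

print(self_similarity, neighbours_2015, neighbours_2016, neighbour_overlap)
```

The neighbour-set measure does not require the two spaces to be aligned, since it compares only word identities; the direct cosine comparison does.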
Comparing Lexical Usage in Political Discourse across Diachronic Corpora
Klaus Hofmann | Anna Marakasova | Andreas Baumann | Julia Neidhardt | Tanja Wissik
Proceedings of the Second ParlaCLARIN Workshop
Most diachronic studies on both lexico-semantic change and political language usage are based on individual or comparable corpora. In this paper, we explore ways of studying the stability (and changeability) of lexical usage in political discourse across two corpora which are substantially different in structure and size. We present a case study focusing on lexical items associated with political parties in two diachronic corpora of Austrian German, namely a diachronic media corpus (AMC) and a corpus of parliamentary records (ParlAT), and measure the cross-temporal stability of lexical usage over a period of 20 years. We conduct three sets of comparative analyses investigating a) the stability of sets of lexical items associated with the three major political parties over time, b) lexical similarity between parties, and c) the similarity between the lexical choices in parliamentary speeches by members of the parties vis-à-vis the media’s reporting on the parties. We employ time series modeling using generalized additive models (GAMs) to compare the lexical similarities and differences between parties within and across corpora. The results show that changes observed in these measures can be meaningfully related to political events during that time.
Exploration of register-dependent lexical semantics using word embeddings
Andrey Kutuzov | Elizaveta Kuzmenko | Anna Marakasova
Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)
We present an approach to detect differences in lexical semantics across English language registers, using word embedding models from the distributional semantics paradigm. Models trained on register-specific subcorpora of the BNC corpus are employed to compare lists of nearest associates for particular words and draw conclusions about their semantic shifts depending on the register in which they are used. The models are evaluated on the task of register classification with the help of the deep inverse regression approach. Additionally, we present a demo web service that features most of the described models and lets users explore word meanings in different English registers and detect register affiliation for arbitrary texts. The code for the service can be easily adapted to any set of underlying models.