Contextualized Diachronic Word Representations

Ganesh Jawahar, Djamé Seddah


Abstract
Diachronic word embeddings play a key role in capturing patterns of how language evolves over time. Most existing work studies corpora spanning several decades, a time span that is understandably not yet available for social media user-generated content. In this work, we address the problem of studying semantic change in a large Twitter corpus collected over five years, a much shorter period than is usual in diachronic studies. We devise a novel attentional model based on Bernoulli word embeddings that are conditioned on contextual extra-linguistic (social) features associated with Twitter users, such as network, spatial and socio-economic variables, as well as on topic-based features. We posit that these social features provide an inductive bias that helps our model overcome the problem of a narrow time span. Our extensive experiments show that the proposed model captures subtle semantic shifts without being biased towards frequency cues, and that it still works well when some contextual features are absent. Our model fits the data better than current state-of-the-art dynamic word embedding models and is therefore a promising tool for studying diachronic semantic change over short time periods.
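The abstract describes the model only at a high level. As a rough illustration, the following is a minimal sketch of a common Bernoulli embedding objective: the observed word's embedding is scored against the sum of its context vectors through a sigmoid, and negatively sampled words are pushed towards "did not occur". It is paired with a hypothetical way of conditioning a word's embedding on extra-linguistic features through softmax attention. The function names (`conditioned_embedding`, `bernoulli_log_likelihood`), the additive attention-weighted offsets and the `query` vector are all assumptions made for illustration. They are not the authors' formulation; the released code listed on this page is the place to check the actual model.

```python
import math


def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def softmax(scores):
    # Shift by the max score to avoid overflow in exp.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def conditioned_embedding(base, feature_offsets, query):
    """HYPOTHETICAL: shift a word's base embedding by an attention-weighted
    mix of feature-specific offset vectors (e.g. one per region, network
    community or topic). Requires at least one offset; absent features are
    simply left out of the list."""
    weights = softmax([dot(query, off) for off in feature_offsets])
    mixed = [sum(w * off[d] for w, off in zip(weights, feature_offsets))
             for d in range(len(base))]
    return [b + m for b, m in zip(base, mixed)]


def bernoulli_log_likelihood(rho, context_vectors, negative_rhos):
    """Bernoulli-embedding log-likelihood for one position: the observed
    word (embedding rho) should fire given the summed context vectors,
    while each negatively sampled word should not."""
    ctx = [sum(col) for col in zip(*context_vectors)]
    ll = log_sigmoid(dot(rho, ctx))
    for rho_neg in negative_rhos:
        # log(1 - sigmoid(z)) == log(sigmoid(-z))
        ll += log_sigmoid(-dot(rho_neg, ctx))
    return ll


# Toy usage: two hypothetical feature offsets, two context words, one negative.
rho = conditioned_embedding([0.1, -0.2], [[0.3, 0.0], [0.0, 0.3]], [1.0, 0.0])
ll = bernoulli_log_likelihood(rho, [[0.5, 0.5], [0.2, -0.1]], [[-0.3, 0.4]])
print(rho, ll)
```

In this sketch, a missing contextual feature is handled by leaving its offset out of `feature_offsets`, so the softmax renormalizes over the features that remain. Whether the paper treats absent features this way is not stated in the abstract.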
Anthology ID:
W19-4705
Volume:
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu
Venue:
LChange
Publisher:
Association for Computational Linguistics
Pages:
35–47
URL:
https://aclanthology.org/W19-4705
DOI:
10.18653/v1/W19-4705
Cite (ACL):
Ganesh Jawahar and Djamé Seddah. 2019. Contextualized Diachronic Word Representations. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, pages 35–47, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Contextualized Diachronic Word Representations (Jawahar & Seddah, LChange 2019)
PDF:
https://aclanthology.org/W19-4705.pdf
Code:
ganeshjawahar/social_word_emb