Shenghui Wang


2022

Visualisation Methods for Diachronic Semantic Shift
Raef Kazi | Alessandra Amato | Shenghui Wang | Doina Bucur
Proceedings of the Third Workshop on Scholarly Document Processing

The meaning and usage of a concept or a word change over time. These diachronic semantic shifts reflect changes in societal and cultural consensus as well as the evolution of science. The availability of large-scale corpora and the recent success of language models have enabled researchers to analyse semantic shifts in great detail. However, current research lacks intuitive ways of presenting diachronic semantic shifts and making them comprehensible. In this paper, we study the PubMed dataset and compute semantic shifts across six decades. We develop three visualisation methods that, given a root word, can show the temporal change in its linguistic context, word re-occurrence, degree of similarity, time continuity, and separate trends per publisher location. We also propose a taxonomy that classifies visualisation methods for diachronic semantic shifts with respect to different purposes.

2019

Fast and Discriminative Semantic Embedding
Rob Koopman | Shenghui Wang | Gwenn Englebienne
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

The embedding of words and documents in compact, semantically meaningful vector spaces is a crucial part of modern information systems. Deep learning models are powerful, but their hyperparameter selection is often complex and they are expensive to train. While pre-trained models are available, embeddings trained on general corpora are not necessarily well suited to domain-specific tasks. We propose a novel embedding method that extends random projection by weighting raw term embeddings and projecting them orthogonally to an average language vector, thus improving the discriminating power of the resulting term embeddings. We then build more meaningful document embeddings by assigning appropriate weights to individual terms. We describe how updating the term embeddings online as we process the training data results in an extremely efficient method, in terms of both computational and memory requirements. Our experiments show results that are highly competitive with various state-of-the-art embedding methods on different tasks, including the standard STS benchmark and a subject prediction task, at a fraction of the computational cost.
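
The two geometric steps named in this abstract, projecting term embeddings orthogonally to an average vector and combining weighted term embeddings into a document embedding, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that the average language vector can be approximated by the unweighted mean of the term vectors, and the function names `remove_average_direction` and `document_embedding`, the toy data, and the per-term weights are all hypothetical.

```python
import numpy as np


def remove_average_direction(term_vectors):
    """Project each row onto the subspace orthogonal to the mean row.

    Assumption: the "average language vector" is approximated here by
    the plain mean of all term vectors.
    """
    mean = term_vectors.mean(axis=0)
    norm = np.linalg.norm(mean)
    if norm == 0.0:
        return term_vectors.copy()
    unit = mean / norm
    # Subtract from every row its component along the average direction.
    return term_vectors - np.outer(term_vectors @ unit, unit)


def document_embedding(term_ids, term_vectors, weights):
    """Weighted sum of the selected term vectors, scaled to unit length."""
    vec = (weights[term_ids, None] * term_vectors[term_ids]).sum(axis=0)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


# Toy data: random vectors with a shared offset, so they have a strong
# common direction that carries little discriminating information.
rng = np.random.default_rng(0)
raw = rng.standard_normal((5, 16)) + 3.0
projected = remove_average_direction(raw)

# Hypothetical per-term weights (for example, something IDF-like).
term_weights = np.array([0.1, 1.0, 2.0, 0.5, 1.5])
doc = document_embedding(np.array([1, 2, 4]), projected, term_weights)
```

After the projection, no term vector has a component along the shared direction, so similarities between terms depend only on what distinguishes them. The weights then let informative terms dominate the document vector.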