Erik Tjong Kim Sang

Also published as: Erik F. Tjong Kim Sang


2020

Public sentiment (the opinion, attitude or feeling that the public expresses) is a factor of interest for government, as it directly influences the implementation of policies. Given the unprecedented nature of the COVID-19 crisis, having an up-to-date representation of public sentiment on governmental measures and announcements is crucial. In this paper, we analyse Dutch public sentiment on governmental COVID-19 measures from text data collected across three online media sources (Twitter, Reddit and Nu.nl) from February to September 2020. We apply sentiment analysis methods to analyse polarity over time, as well as to identify stance towards two specific pandemic policies regarding social distancing and wearing face masks. The presented preliminary results provide valuable insights into the narratives shown in vast social media text data, which help understand the influence of COVID-19 measures on the general public.

2016

The Nederlab project aims to bring together all digitized texts relevant to the Dutch national heritage, the history of the Dutch language and culture (circa 800 – present) in one user-friendly and tool-enriched open-access web interface. This paper describes Nederlab halfway through the project period and discusses the collections incorporated, back-office processes, system back-end as well as the Nederlab Research Portal end-user web application.

We examine two different methods for finding rising words (among which neologisms) and falling words (among which archaisms) in decades of magazine texts (millions of words) and in years of tweets (billions of words): one based on correlation coefficients of relative frequencies and time, and one based on comparing initial and final word frequencies of time intervals. We find that smoothing frequency scores improves the precision scores of both methods and that the correlation coefficients perform better on magazine text but worse on tweets. Since the two ranking methods find different words, they can be used side by side to study the behavior of words over time.
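
The two ranking ideas named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the moving-average smoothing, the window size, the ratio-based comparison of final and initial frequencies, the `eps` constant, and all function names and toy frequencies are assumptions made for illustration only.

```python
from math import sqrt


def smooth(values, window=3):
    """Centered moving average (assumed smoothing); edges use available neighbours."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_score(rel_freqs):
    """Method 1: correlate relative frequency with the time index."""
    return pearson(list(range(len(rel_freqs))), rel_freqs)


def delta_score(rel_freqs, eps=1e-9):
    """Method 2 (assumed form): ratio of final to initial relative frequency."""
    return (rel_freqs[-1] + eps) / (rel_freqs[0] + eps)


def rank_rising(freqs_by_word, score, window=3):
    """Rank words from most rising to most falling under the given score."""
    scored = {w: score(smooth(f, window)) for w, f in freqs_by_word.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical relative frequencies per time interval (invented toy data).
toy = {
    "selfie": [0.0, 0.1, 0.2, 0.4, 0.8],
    "fax": [0.9, 0.7, 0.4, 0.2, 0.1],
    "the": [5.0, 5.1, 4.9, 5.0, 5.0],
}

by_correlation = rank_rising(toy, correlation_score)
by_delta = rank_rising(toy, delta_score)
```

Reading a ranking from the top gives candidate rising words, and reading it from the bottom gives candidate falling words. Running both scores on the same data makes it easy to compare the two lists side by side.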

2012

2010

In this paper we describe GikiCLEF, the first evaluation contest that, to our knowledge, was specifically designed to expose and investigate cultural and linguistic issues involved in structured multimedia collections and searching, and which was organized under the scope of CLEF 2009. GikiCLEF evaluated systems that answered questions that are hard for both humans and machines, in ten different Wikipedia collections, namely Bulgarian, Dutch, English, German, Italian, Norwegian (Bokmål and Nynorsk), Portuguese, Romanian, and Spanish. After a short historical introduction, we present the task, together with its motivation, and discuss how the topics were chosen. Then we provide another description from the point of view of the participants. Before disclosing their results, we introduce the SIGA management system, explaining the several tasks which were carried out behind the scenes. We quantify in turn the GIRA resource, offered to the community for training and further evaluating systems with the help of the 50 topics gathered and the solutions identified. We end the paper with a critical discussion of what was learned, advancing possible ways to reuse the data.

2009

2007

2006

2005

2004

2003

2002

2001

2000

1999

1998