Huanyu Liu


2020

TWEETSUM: Event oriented Social Summarization Dataset
Ruifang He | Liangliang Zhao | Huanyu Liu
Proceedings of the 28th International Conference on Computational Linguistics

With social media becoming popular, a vast number of short, noisy messages are produced by millions of users when a hot event happens. Developing social summarization systems is becoming increasingly critical for helping people quickly grasp the core information. However, publicly available, high-quality, large-scale social summarization datasets are rare. Constructing such a corpus is difficult and expensive because short texts have very complex social characteristics. In this paper, we construct TWEETSUM, a new event-oriented dataset for social summarization. The original data is collected from Twitter and contains 12 real-world hot events with a total of 44,034 tweets and 11,240 users. Each event has four expert summaries, and we also evaluate the annotation quality. In addition, we collect additional social signals (i.e., user relations, hashtags, and user profiles) and further establish a user relation network for each event. Besides the detailed dataset description, we show the performance of several typical extractive summarization methods on TWEETSUM to establish baselines. To support further research, we will release this dataset to the public.