Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media

Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media. Lun-Wei Ku, Cheng-Te Li. April 2017

Valencia, Spain

Association for Computational Linguistics http://www.aclweb.org/anthology/W17-11 book SocialNLP2017:2017

A Survey on Hate Speech Detection using Natural Language Processing. Anna Schmidt, Michael Wiegand. Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, April 2017

Valencia, Spain

Association for Computational Linguistics, pages 1–10, http://www.aclweb.org/anthology/W17-1101

This paper presents a survey on hate speech detection. Given the steadily growing body of social media content, the amount of online hate speech is also increasing. Due to the massive scale of the web, methods that automatically detect hate speech are required. Our survey describes key areas that have been explored to automatically recognize these types of utterances using natural language processing. We also discuss limits of those approaches.

inproceedings schmidt-wiegand:2017:SocialNLP2017

Facebook sentiment: Reactions and Emojis. Ye Tian, Thiago Galery, Giulio Dulcinati, Emilia Molimpakis, Chao Sun. Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, April 2017

Valencia, Spain

Association for Computational Linguistics, pages 11–16, http://www.aclweb.org/anthology/W17-1102

Emojis are used frequently in social media. A widely assumed view is that emojis express the emotional state of the user, which has led to research focusing on the expressiveness of emojis independent of the linguistic context. We argue that emojis and the linguistic text can modify each other's meaning. The overall communicated meaning is not a simple sum of the two channels. In order to study this interplay of meaning, we need data indicating the overall sentiment of the entire message as well as the stand-alone sentiment of the emojis. We propose that Facebook Reactions are a good data source for such a purpose. FB reactions (e.g. “Love” and “Angry”) indicate the readers' overall sentiment, against which we can investigate the types of emojis used in the comments under different reaction profiles. We present a data set of 21,000 FB posts (57 million reactions and 8 million comments) from public media pages across four countries.

inproceedings tian-EtAl:2017:SocialNLP2017

Potential and Limitations of Cross-Domain Sentiment Classification. Jan Milan Deriu, Martin Weilenmann, Dirk Von Gruenigen, Mark Cieliebak. Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, April 2017

Valencia, Spain

Association for Computational Linguistics, pages 17–24, http://www.aclweb.org/anthology/W17-1103

In this paper we investigate the cross-domain performance of a current state-of-the-art sentiment analysis system. For this purpose we train a convolutional neural network (CNN) on data from different domains and evaluate its performance on other domains. Furthermore, we evaluate the usefulness of combining a large number of smaller annotated corpora into one large corpus. Our results show that more sophisticated approaches are required to train a system that works equally well on various domains.

inproceedings deriu-EtAl:2017:SocialNLP2017

Aligning Entity Names with Online Aliases on Twitter. Kevin McKelvey, Peter Goutzounis, Stephen da Cruz, Nathanael Chambers. Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, April 2017

Valencia, Spain

Association for Computational Linguistics, pages 25–35, http://www.aclweb.org/anthology/W17-1104

This paper presents new models that automatically align online aliases with their real entity names. Many research applications rely on identifying entity names in text, but people often refer to entities with unexpected nicknames and aliases. For example, The King and King James are aliases for LeBron James, a professional basketball player. Recent work on entity linking attempts to resolve mentions to knowledge base entries, like a Wikipedia page, but linking is unfortunately limited to well-known entities with pre-built pages. This paper asks a more basic question: can aliases be aligned without background knowledge of the entity? Further, can the semantics surrounding alias mentions be used to inform alignments? We describe statistical models that make decisions based on the lexicographic properties of the aliases together with their semantic context in a large corpus of tweets. We experiment on a database of Twitter users and their usernames, and present the first human evaluation for this task. Alignment accuracy approaches human performance at 81%, and we show that while lexicographic features are most important, the semantic context of an alias further improves classification accuracy.

inproceedings mckelvey-EtAl:2017:SocialNLP2017

Character-based Neural Embeddings for Tweet Clustering. Svitlana Vakulenko, Lyndon Nixon, Mihai Lupu. Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, April 2017

Valencia, Spain

Association for Computational Linguistics, pages 36–44, http://www.aclweb.org/anthology/W17-1105

In this paper we show how the performance of tweet clustering can be improved by leveraging character-based neural networks. The proposed approach overcomes the limitations related to the vocabulary explosion in word-based models and allows for seamless processing of multilingual content. Our evaluation results and code are available on-line: https://github.com/vendi12/tweet2vec_clustering.

inproceedings vakulenko-nixon-lupu:2017:SocialNLP2017

A Twitter Corpus and Benchmark Resources for German Sentiment Analysis. Mark Cieliebak, Jan Milan Deriu, Dominic Egger, Fatih Uzdilli. Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, April 2017

Valencia, Spain

Association for Computational Linguistics, pages 45–51, http://www.aclweb.org/anthology/W17-1106

In this paper we present SB10k, a new corpus for sentiment analysis with approx. 10,000 German tweets. We use this new corpus and two existing corpora to provide state-of-the-art benchmarks for sentiment analysis in German: we implemented a CNN (based on the winning system of SemEval-2016) and a feature-based SVM and compare their performance on all three corpora. For the CNN, we also created German word embeddings trained on 300M tweets. These word embeddings were then optimized for sentiment analysis using distant-supervised learning. The new corpus, the German word embeddings (plain and optimized), and source code to re-run the benchmarks are publicly available.

inproceedings cieliebak-EtAl:2017:SocialNLP2017