Proceedings of the Second Workshop on NLP and Computational Social Science

Dirk Hovy, Svitlana Volkova, David Bamman, David Jurgens, Brendan O'Connor, Oren Tsur, A. Seza Doğruöz (editors)

August 2017

Vancouver, Canada

Association for Computational Linguistics. http://www.aclweb.org/anthology/W17-29

book NLPandCSS:2017

Language-independent Gender Prediction on Twitter

Nikola Ljubešić, Darja Fišer, Tomaž Erjavec

Proceedings of the Second Workshop on NLP and Computational Social Science, August 2017

Vancouver, Canada

Association for Computational Linguistics, pages 1–6. http://www.aclweb.org/anthology/W17-2901

In this paper we present a set of experiments and analyses on predicting the gender of Twitter users based on language-independent features extracted either from the text or the metadata of users' tweets. We perform our experiments on the TwiSty dataset containing manual gender annotations for users speaking six different languages. Our classification results show that, while the prediction model based on language-independent features performs worse than the bag-of-words model when training and testing on the same language, it regularly outperforms the bag-of-words model when applied to different languages, showing very stable results across various languages. Finally, we perform a comparative analysis of feature effect sizes across the six languages and show that differences in our features correspond to cultural distances.

inproceedings ljubevsic-fivser-erjavec:2017:NLPandCSS

When does a compliment become sexist? Analysis and classification of ambivalent sexism using twitter data

Akshita Jha, Radhika Mamidi

Proceedings of the Second Workshop on NLP and Computational Social Science, August 2017

Vancouver, Canada

Association for Computational Linguistics, pages 7–16. http://www.aclweb.org/anthology/W17-2902

Sexism is prevalent in today’s society, both offline and online, and poses a credible threat to social equality with respect to gender. According to ambivalent sexism theory (Glick and Fiske, 1996), it comes in two forms: Hostile and Benevolent. While hostile sexism is characterized by an explicitly negative attitude, benevolent sexism is more subtle. Previous work on computationally detecting online sexism is restricted to identifying the hostile form. Our objective is to investigate the less pronounced form of sexism demonstrated online. We achieve this by creating and analyzing a dataset of tweets that exhibit benevolent sexism. Using Support Vector Machines (SVM), sequence-to-sequence models, and a FastText classifier, we classify tweets into the ‘Hostile’, ‘Benevolent’, or ‘Others’ class depending on the kind of sexism they exhibit. We achieve an F1-score of 87.22% using the FastText classifier. Our work helps analyze and understand the ambivalent sexism that is prevalent in social media.

inproceedings jha-mamidi:2017:NLPandCSS

Personality Driven Differences in Paraphrase Preference

Daniel Preoţiuc-Pietro, Jordan Carpenter, Lyle Ungar

Proceedings of the Second Workshop on NLP and Computational Social Science, August 2017

Vancouver, Canada

Association for Computational Linguistics, pages 17–26. http://www.aclweb.org/anthology/W17-2903 W17-2903.Presentation.pdf

Personality plays a decisive role in how people behave in different scenarios, including online social media. Researchers have used such data to study how personality can be predicted from language use. In this paper, we study phrase choice as a particular stylistic linguistic difference, as opposed to the mostly topical differences identified previously. Building on previous work on demographic preferences, we quantify differences in paraphrase choice from a massive Facebook data set with posts from over 115,000 users. We quantify the predictive power of phrase choice in user profiling and use phrase choice to study psycholinguistic hypotheses. This work is relevant to future applications that aim to personalize text generation to specific personality types.

inproceedings preoiucpietro-carpenter-ungar:2017:NLPandCSS

community2vec: Vector representations of online communities encode semantic relationships

Trevor Martin

Proceedings of the Second Workshop on NLP and Computational Social Science, August 2017

Vancouver, Canada

Association for Computational Linguistics, pages 27–31. http://www.aclweb.org/anthology/W17-2904

Vector embeddings of words have been shown to encode meaningful semantic relationships that enable solving of complex analogies. This vector embedding concept has been extended successfully to many different domains and in this paper we both create and visualize vector representations of an unstructured collection of online communities based on user participation. Further, we quantitatively and qualitatively show that these representations allow solving of semantically meaningful community analogies and also other more general types of relationships. These results could help improve community recommendation engines and also serve as a tool for sociological studies of community relatedness.

inproceedings martin:2017:NLPandCSS

Telling Apart Tweets Associated with Controversial versus Non-Controversial Topics

Aseel Addawood, Rezvaneh Rezapour, Omid Abdar, Jana Diesner

Proceedings of the Second Workshop on NLP and Computational Social Science, August 2017

Vancouver, Canada

Association for Computational Linguistics, pages 32–41. http://www.aclweb.org/anthology/W17-2905

In this paper, we evaluate the predictability of tweets associated with controversial versus non-controversial topics. As a first step, we crowd-sourced the scoring of a predefined set of topics on a Likert scale from non-controversial to controversial. Our feature set entails and goes beyond sentiment features, e.g., by leveraging empathic language and other features that have been previously used but are new for this particular study. We find focusing on the structural characteristics of tweets to be beneficial for this task. Using a combination of emphatic, language-specific, and Twitter-specific features for supervised learning resulted in 87% accuracy (F1) for cross-validation of the training set and 63.4% accuracy when using the test set. Our analysis shows that features specific to Twitter or social media, in general, are more prevalent in tweets on controversial topics than in non-controversial ones. To test the premise of the paper, we conducted two additional sets of experiments, which led to mixed results. This finding will inform our future investigations into the relationship between language use on social media and the perceived controversiality of topics.

inproceedings addawood-EtAl:2017:NLPandCSS

Cross-Lingual Classification of Topics in Political Texts

Goran Glavaš, Federico Nanni, Simone Paolo Ponzetto

Proceedings of the Second Workshop on NLP and Computational Social Science, August 2017

Vancouver, Canada

Association for Computational Linguistics, pages 42–46. http://www.aclweb.org/anthology/W17-2906

In this paper, we propose an approach for cross-lingual topical coding of sentences from electoral manifestos of political parties in different languages. To this end, we exploit continuous semantic text representations and induce a joint multilingual semantic vector space to enable supervised learning using manually-coded sentences across different languages. Our experimental results show that classifiers trained on multilingual data yield performance boosts over monolingual topic classification.

inproceedings glavavs-nanni-ponzetto:2017:NLPandCSS

Mining Social Science Publications for Survey Variables

Andrea Zielinski, Peter Mutschke

Proceedings of the Second Workshop on NLP and Computational Social Science, August 2017

Vancouver, Canada

Association for Computational Linguistics, pages 47–52. http://www.aclweb.org/anthology/W17-2907

Research in Social Science is usually based on survey data where individual research questions relate to observable concepts (variables). However, due to a lack of standards for data citations, reliable identification of the variables used is often difficult. In this paper, we present a work-in-progress study that seeks to provide a solution to the variable detection task based on supervised machine learning algorithms, using a linguistic analysis pipeline to extract a rich feature set, including terminological concepts and similarity metric scores. Further, we present preliminary results on a small dataset that has been specifically designed for this task, yielding a significant increase in performance over the random baseline.

inproceedings zielinski-mutschke:2017:NLPandCSS

Linguistic Markers of Influence in Informal Interactions

Shrimai Prabhumoye, Samridhi Choudhary, Evangelia Spiliopoulou, Christopher Bogart, Carolyn Rose, Alan W Black

Proceedings of the Second Workshop on NLP and Computational Social Science, August 2017

Vancouver, Canada

Association for Computational Linguistics, pages 53–62. http://www.aclweb.org/anthology/W17-2908

There has been a long-standing interest in understanding 'Social Influence' both in Social Sciences and in Computational Linguistics. In this paper, we present a novel approach to study and measure interpersonal influence in daily interactions. Motivated by the basic principles of influence, we attempt to identify indicative linguistic features of the posts in an online knitting community. We present the scheme used to operationalize and label the posts as influential or non-influential. Experiments with the identified features show an improvement in the classification accuracy of influence by 3.15%. Our results illustrate the important correlation between the structure of the language and its potential to influence others.

inproceedings prabhumoye-EtAl:2017:NLPandCSS

Non-lexical Features Encode Political Affiliation on Twitter

Rachael Tatman, Leo Stewart, Amandalynne Paullada, Emma Spiro

Proceedings of the Second Workshop on NLP and Computational Social Science, August 2017

Vancouver, Canada

Association for Computational Linguistics, pages 63–67. http://www.aclweb.org/anthology/W17-2909

Previous work on classifying Twitter users' political alignment has mainly focused on lexical and social network features. This study provides evidence that political affiliation is also reflected in features which have been previously overlooked: users' discourse patterns (proportion of Tweets that are retweets or replies) and their rate of use of capitalization and punctuation. We find robust differences between politically left- and right-leaning communities with respect to these discourse and sub-lexical features, although they are not enough to train a high-accuracy classifier.

inproceedings tatman-EtAl:2017:NLPandCSS

Modelling Participation in Small Group Social Sequences with Markov Rewards Analysis

Gabriel Murray

Proceedings of the Second Workshop on NLP and Computational Social Science, August 2017

Vancouver, Canada

Association for Computational Linguistics, pages 68–72. http://www.aclweb.org/anthology/W17-2910

We explore a novel computational approach for analyzing member participation in small group social sequences. Using a complex state representation combining information about dialogue act types, sentiment expression, and participant roles, we explore which sequence states are associated with high levels of member participation. Using a Markov Rewards framework, we associate particular states with immediate positive and negative rewards, and employ a Value Iteration algorithm to calculate the expected value of all states. In our findings, we focus on discourse states belonging to team leaders and project managers which are either very likely or very unlikely to lead to participation from the rest of the group members.

inproceedings murray:2017:NLPandCSS

Code-Switching as a Social Act: The Case of Arabic Wikipedia Talk Pages

Michael Yoder, Shruti Rijhwani, Carolyn Rosé, Lori Levin

Proceedings of the Second Workshop on NLP and Computational Social Science, August 2017

Vancouver, Canada

Association for Computational Linguistics, pages 73–82. http://www.aclweb.org/anthology/W17-2911

Code-switching has been found to have social motivations in addition to syntactic constraints. In this work, we explore the social effect of code-switching in an online community. We present a task from the Arabic Wikipedia to capture language choice, in this case code-switching between Arabic and other languages, as a predictor of social influence in collaborative editing. We find that code-switching is positively associated with Wikipedia editor success, particularly borrowing technical language on pages with topics less directly related to Arabic-speaking regions.

inproceedings yoder-EtAl:2017:NLPandCSS

How Does Twitter User Behavior Vary Across Demographic Groups?

Zach Wood-Doughty, Michael Smith, David Broniatowski, Mark Dredze

Proceedings of the Second Workshop on NLP and Computational Social Science, August 2017

Vancouver, Canada

Association for Computational Linguistics, pages 83–89. http://www.aclweb.org/anthology/W17-2912

Demographically-tagged social media messages are a common source of data for computational social science. While these messages can indicate differences in beliefs and behaviors between demographic groups, we do not have a clear understanding of how different demographic groups use platforms such as Twitter. This paper presents a preliminary analysis of how groups' differing behaviors may confound analyses of the groups themselves. We analyzed one million Twitter users by first inferring demographic attributes, and then measuring several indicators of Twitter behavior. We find differences in these indicators across demographic groups, suggesting that there may be underlying differences in how different demographic groups use Twitter.

inproceedings wooddoughty-EtAl:2017:NLPandCSS

Ideological Phrase Indicators for Classification of Political Discourse Framing on Twitter

Kristen Johnson, I-Ta Lee, Dan Goldwasser

Proceedings of the Second Workshop on NLP and Computational Social Science, August 2017

Vancouver, Canada

Association for Computational Linguistics, pages 90–99. http://www.aclweb.org/anthology/W17-2913

Politicians carefully word their statements in order to influence how others view an issue, a political strategy called framing. Simultaneously, these frames may also reveal the politician's beliefs or positions on an issue. Simple language features such as unigrams, bigrams, and trigrams are important indicators for identifying the general frame of a text, for both longer congressional speeches and shorter tweets of politicians. However, tweets may contain multiple unigrams across different frames, which limits the effectiveness of this approach. In this paper, we present a joint model which uses both linguistic features of tweets and ideological phrase indicators extracted from a state-of-the-art embedding-based model to predict the general frame of political tweets.

inproceedings johnson-lee-goldwasser:2017:NLPandCSS