<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="2900">
    <title>Proceedings of the Second Workshop on NLP and Computational Social Science</title>
    <editor>Dirk Hovy</editor>
    <editor>Svitlana Volkova</editor>
    <editor>David Bamman</editor>
    <editor>David Jurgens</editor>
    <editor>Brendan O'Connor</editor>
    <editor>Oren Tsur</editor>
    <editor>A. Seza Doğruöz</editor>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/W17-29</url>
    <bibtype>book</bibtype>
    <bibkey>NLPandCSS:2017</bibkey>
  </paper>

  <paper id="2901">
    <title>Language-independent Gender Prediction on Twitter</title>
    <author><first>Nikola</first><last>Ljube&#x161;i&#x107;</last></author>
    <author><first>Darja</first><last>Fi&#x161;er</last></author>
    <author><first>Toma&#x17E;</first><last>Erjavec</last></author>
    <booktitle>Proceedings of the Second Workshop on NLP and Computational Social Science</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;6</pages>
    <url>http://www.aclweb.org/anthology/W17-2901</url>
    <abstract>In this paper we present a set of experiments and analyses on predicting the
	gender of Twitter users based on language-independent features extracted either
	from the text or the metadata of users' tweets. We perform our experiments on
	the TwiSty dataset containing manual gender annotations for users speaking six
	different languages. Our classification results show that, while the prediction
	model based on language-independent features performs worse than the
	bag-of-words model when training and testing on the same language, it regularly
	outperforms the bag-of-words model when applied to different languages, showing
	very stable results across various languages. Finally, we perform a comparative
	analysis of feature effect sizes across the six languages and show that
	differences in our features correspond to cultural distances.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ljubevsic-fivser-erjavec:2017:NLPandCSS</bibkey>
  </paper>

  <paper id="2902">
    <title>When does a compliment become sexist? Analysis and classification of ambivalent sexism using twitter data</title>
    <author><first>Akshita</first><last>Jha</last></author>
    <author><first>Radhika</first><last>Mamidi</last></author>
    <booktitle>Proceedings of the Second Workshop on NLP and Computational Social Science</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>7&#8211;16</pages>
    <url>http://www.aclweb.org/anthology/W17-2902</url>
    <abstract>Sexism is prevalent in today’s society, both offline and online, and poses a
	credible threat to social equality with respect to gender. According to
	ambivalent sexism theory (Glick and Fiske, 1996), it comes in two forms:
	Hostile and Benevolent. While hostile sexism is characterized by an explicitly
	negative attitude, benevolent sexism is more subtle. Previous works on
	computationally detecting sexism present online are restricted to identifying
	the hostile form. Our objective is to investigate the less pronounced form of
	sexism demonstrated online. We achieve this by creating and analyzing a dataset
	of tweets that exhibit benevolent sexism. By using Support Vector Machines
	(SVM), sequence-to-sequence models, and a FastText classifier, we classify tweets
	into ‘Hostile’, ‘Benevolent’, or ‘Others’ classes depending on the
	kind of sexism they exhibit. We achieve an F1-score of 87.22%
	using the FastText classifier. Our work helps analyze and understand the highly
	prevalent ambivalent sexism in social media.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jha-mamidi:2017:NLPandCSS</bibkey>
  </paper>

  <paper id="2903">
    <title>Personality Driven Differences in Paraphrase Preference</title>
    <author><first>Daniel</first><last>Preoţiuc-Pietro</last></author>
    <author><first>Jordan</first><last>Carpenter</last></author>
    <author><first>Lyle</first><last>Ungar</last></author>
    <booktitle>Proceedings of the Second Workshop on NLP and Computational Social Science</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>17&#8211;26</pages>
    <url>http://www.aclweb.org/anthology/W17-2903</url>
    <attachment type="presentation">W17-2903.Presentation.pdf</attachment>
    <abstract>Personality plays a decisive role in how people behave in different scenarios,
	including online social media. Researchers have used such data to study how
	personality can be predicted from language use. In this paper, we study phrase
	choice as a particular stylistic linguistic difference, as opposed to the
	mostly topical differences identified previously. Building on previous work on
	demographic preferences, we quantify differences in paraphrase choice from a
	massive Facebook data set with posts from over 115,000 users. We quantify the
	predictive power of phrase choice in user profiling and use phrase choice to
	study psycholinguistic hypotheses. This work is relevant to future applications
	that aim to personalize text generation to specific personality types.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>preoiucpietro-carpenter-ungar:2017:NLPandCSS</bibkey>
  </paper>

  <paper id="2904">
    <title>community2vec: Vector representations of online communities encode semantic relationships</title>
    <author><first>Trevor</first><last>Martin</last></author>
    <booktitle>Proceedings of the Second Workshop on NLP and Computational Social Science</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>27&#8211;31</pages>
    <url>http://www.aclweb.org/anthology/W17-2904</url>
    <abstract>Vector embeddings of words have been shown to encode meaningful semantic
	relationships that enable solving of complex analogies. This vector embedding
	concept has been extended successfully to many different domains and in this
	paper we both create and visualize vector representations of an unstructured
	collection of online communities based on user participation. Further, we
	quantitatively and qualitatively show that these representations allow solving
	of semantically meaningful community analogies and also other more general
	types of relationships. These results could help improve community
	recommendation engines and also serve as a tool for sociological studies of
	community relatedness.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>martin:2017:NLPandCSS</bibkey>
  </paper>

  <paper id="2905">
    <title>Telling Apart Tweets Associated with Controversial versus Non-Controversial Topics</title>
    <author><first>Aseel</first><last>Addawood</last></author>
    <author><first>Rezvaneh</first><last>Rezapour</last></author>
    <author><first>Omid</first><last>Abdar</last></author>
    <author><first>Jana</first><last>Diesner</last></author>
    <booktitle>Proceedings of the Second Workshop on NLP and Computational Social Science</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>32&#8211;41</pages>
    <url>http://www.aclweb.org/anthology/W17-2905</url>
    <abstract>In this paper, we evaluate the predictability of tweets associated with
	controversial versus non-controversial topics. As a first step, we
	crowd-sourced the scoring of a predefined set of topics on a Likert scale from
	non-controversial to controversial. Our feature set entails and goes beyond
	sentiment features, e.g., by leveraging empathic language and other features
	that have been previously used but are new for this particular study. We find
	focusing on the structural characteristics of tweets to be beneficial for this
	task. Using a combination of emphatic, language-specific, and Twitter-specific
	features for supervised learning resulted in 87% accuracy (F1) for
	cross-validation of the training set and 63.4% accuracy when using the test
	set. Our analysis shows that features specific to Twitter or social media, in
	general, are more prevalent in tweets on controversial topics than in
	non-controversial ones. To test the premise of the paper, we conducted two
	additional sets of experiments, which led to mixed results. This finding will
	inform our future investigations into the relationship between language use on
	social media and the perceived controversiality of topics.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>addawood-EtAl:2017:NLPandCSS</bibkey>
  </paper>

  <paper id="2906">
    <title>Cross-Lingual Classification of Topics in Political Texts</title>
    <author><first>Goran</first><last>Glava&#x161;</last></author>
    <author><first>Federico</first><last>Nanni</last></author>
    <author><first>Simone Paolo</first><last>Ponzetto</last></author>
    <booktitle>Proceedings of the Second Workshop on NLP and Computational Social Science</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>42&#8211;46</pages>
    <url>http://www.aclweb.org/anthology/W17-2906</url>
    <abstract>In this paper, we propose an approach for cross-lingual topical coding of
	sentences from electoral manifestos of political parties in different
	languages. To this end, we exploit continuous semantic text representations and
	induce a joint multilingual semantic vector space to enable supervised
	learning using manually-coded sentences across different languages. Our
	experimental results show that classifiers trained on multilingual data yield
	performance boosts over monolingual topic classification.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>glavavs-nanni-ponzetto:2017:NLPandCSS</bibkey>
  </paper>

  <paper id="2907">
    <title>Mining Social Science Publications for Survey Variables</title>
    <author><first>Andrea</first><last>Zielinski</last></author>
    <author><first>Peter</first><last>Mutschke</last></author>
    <booktitle>Proceedings of the Second Workshop on NLP and Computational Social Science</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>47&#8211;52</pages>
    <url>http://www.aclweb.org/anthology/W17-2907</url>
    <abstract>Research in Social Science is usually based on survey data where individual
	research questions relate to observable concepts (variables). However, due to a
	lack of standards for data citations, a reliable identification of the variables
	used is often difficult. In this paper, we present a work-in-progress study
	that seeks to provide a solution to the variable detection task based on
	supervised machine learning algorithms, using a linguistic analysis pipeline to
	extract a rich feature set, including terminological concepts and similarity
	metric scores. Further, we present preliminary results on a small dataset that
	has been specifically designed for this task, yielding a significant increase
	in performance over the random baseline.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zielinski-mutschke:2017:NLPandCSS</bibkey>
  </paper>

  <paper id="2908">
    <title>Linguistic Markers of Influence in Informal Interactions</title>
    <author><first>Shrimai</first><last>Prabhumoye</last></author>
    <author><first>Samridhi</first><last>Choudhary</last></author>
    <author><first>Evangelia</first><last>Spiliopoulou</last></author>
    <author><first>Christopher</first><last>Bogart</last></author>
    <author><first>Carolyn</first><last>Ros&#233;</last></author>
    <author><first>Alan W</first><last>Black</last></author>
    <booktitle>Proceedings of the Second Workshop on NLP and Computational Social Science</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>53&#8211;62</pages>
    <url>http://www.aclweb.org/anthology/W17-2908</url>
    <abstract>There has been a long-standing interest in understanding 'Social Influence'
	both in Social Sciences and in Computational Linguistics. In this paper, we
	present a novel approach to study and measure interpersonal influence in daily
	interactions. Motivated by the basic principles of influence, we attempt to
	identify indicative linguistic features of the posts in an online knitting
	community. We present the scheme used to operationalize and label the posts as
	influential or non-influential. Experiments with the identified features show
	an improvement in the classification accuracy of influence by 3.15%. Our
	results illustrate the important correlation between the structure of the
	language and its potential to influence others.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>prabhumoye-EtAl:2017:NLPandCSS</bibkey>
  </paper>

  <paper id="2909">
    <title>Non-lexical Features Encode Political Affiliation on Twitter</title>
    <author><first>Rachael</first><last>Tatman</last></author>
    <author><first>Leo</first><last>Stewart</last></author>
    <author><first>Amandalynne</first><last>Paullada</last></author>
    <author><first>Emma</first><last>Spiro</last></author>
    <booktitle>Proceedings of the Second Workshop on NLP and Computational Social Science</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>63&#8211;67</pages>
    <url>http://www.aclweb.org/anthology/W17-2909</url>
    <abstract>Previous work on classifying Twitter users' political alignment has mainly
	focused on lexical and social network features. This study provides evidence
	that political affiliation is also reflected in features which have been
	previously overlooked: users' discourse patterns (proportion of Tweets that are
	retweets or replies) and their rate of use of capitalization and punctuation.
	We find robust differences between politically left- and right-leaning
	communities with respect to these discourse and sub-lexical features, although
	they are not enough to train a high-accuracy classifier.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tatman-EtAl:2017:NLPandCSS</bibkey>
  </paper>

  <paper id="2910">
    <title>Modelling Participation in Small Group Social Sequences with Markov Rewards Analysis</title>
    <author><first>Gabriel</first><last>Murray</last></author>
    <booktitle>Proceedings of the Second Workshop on NLP and Computational Social Science</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>68&#8211;72</pages>
    <url>http://www.aclweb.org/anthology/W17-2910</url>
    <abstract>We explore a novel computational approach for analyzing member participation in
	small group social sequences. Using a complex state representation combining
	information about dialogue act types, sentiment expression, and participant
	roles, we explore which sequence states are associated with high levels of
	member participation. Using a Markov Rewards framework, we associate particular
	states with immediate positive and negative rewards, and employ a Value
	Iteration algorithm to calculate the expected value of all states. In our
	findings, we focus on discourse states belonging to team leaders and project
	managers which are either very likely or very unlikely to lead to participation
	from the rest of the group members.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>murray:2017:NLPandCSS</bibkey>
  </paper>

  <paper id="2911">
    <title>Code-Switching as a Social Act: The Case of Arabic Wikipedia Talk Pages</title>
    <author><first>Michael</first><last>Yoder</last></author>
    <author><first>Shruti</first><last>Rijhwani</last></author>
    <author><first>Carolyn</first><last>Ros&#233;</last></author>
    <author><first>Lori</first><last>Levin</last></author>
    <booktitle>Proceedings of the Second Workshop on NLP and Computational Social Science</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>73&#8211;82</pages>
    <url>http://www.aclweb.org/anthology/W17-2911</url>
    <abstract>Code-switching has been found to have social motivations in addition to
	syntactic constraints.
	In this work, we explore the social effect of code-switching in an online
	community.
	We present a task from the Arabic Wikipedia to capture language choice, in this
	case code-switching between Arabic and other languages, as a predictor of
	social influence in collaborative editing.
	We find that code-switching is positively associated with Wikipedia editor
	success, particularly borrowing technical language on pages with topics less
	directly related to Arabic-speaking regions.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yoder-EtAl:2017:NLPandCSS</bibkey>
  </paper>

  <paper id="2912">
    <title>How Does Twitter User Behavior Vary Across Demographic Groups?</title>
    <author><first>Zach</first><last>Wood-Doughty</last></author>
    <author><first>Michael</first><last>Smith</last></author>
    <author><first>David</first><last>Broniatowski</last></author>
    <author><first>Mark</first><last>Dredze</last></author>
    <booktitle>Proceedings of the Second Workshop on NLP and Computational Social Science</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>83&#8211;89</pages>
    <url>http://www.aclweb.org/anthology/W17-2912</url>
    <abstract>Demographically-tagged social media messages are a common source of data for
	computational social science. While these messages can indicate differences in
	beliefs and behaviors between demographic groups, we do not have a clear
	understanding of how different demographic groups use platforms such as
	Twitter. This paper presents a preliminary analysis of how groups' differing
	behaviors may confound analyses of the groups themselves. We analyzed one
	million Twitter users by first inferring demographic attributes, and then
	measuring several indicators of Twitter behavior. We find differences in these
	indicators across demographic groups, suggesting that there may be underlying
	differences in how different demographic groups use Twitter.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wooddoughty-EtAl:2017:NLPandCSS</bibkey>
  </paper>

  <paper id="2913">
    <title>Ideological Phrase Indicators for Classification of Political Discourse Framing on Twitter</title>
    <author><first>Kristen</first><last>Johnson</last></author>
    <author><first>I-Ta</first><last>Lee</last></author>
    <author><first>Dan</first><last>Goldwasser</last></author>
    <booktitle>Proceedings of the Second Workshop on NLP and Computational Social Science</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>90&#8211;99</pages>
    <url>http://www.aclweb.org/anthology/W17-2913</url>
    <abstract>Politicians carefully word their statements in order to influence how others
	view an issue, a political strategy called framing. Simultaneously, these
	frames may also reveal the politician's beliefs or positions on an issue.
	Simple language features such as unigrams, bigrams, and trigrams are important
	indicators for identifying the general frame of a text, for both longer
	congressional speeches and shorter tweets of politicians. However, tweets may
	contain multiple unigrams across different frames, which limits the
	effectiveness of this approach. In this paper, we present a joint model which
	uses both linguistic features of tweets and ideological phrase indicators
	extracted from a state-of-the-art embedding-based model to predict the general
	frame of political tweets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>johnson-lee-goldwasser:2017:NLPandCSS</bibkey>
  </paper>

</volume>

