<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="4900">
    <title>Proceedings of the Workshop on Stylistic Variation</title>
    <editor>Julian Brooke</editor>
    <editor>Thamar Solorio</editor>
    <editor>Moshe Koppel</editor>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/W17-49</url>
    <bibtype>book</bibtype>
    <bibkey>StyVa:2017</bibkey>
  </paper>

  <paper id="4901">
    <title>From Shakespeare to Twitter: What are Language Styles all about?</title>
    <author><first>Wei</first><last>Xu</last></author>
    <booktitle>Proceedings of the Workshop on Stylistic Variation</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;9</pages>
    <url>http://www.aclweb.org/anthology/W17-4901</url>
    <abstract>As natural language processing research grows, largely driven by the
	availability of data, the field has expanded from news and small-scale dialog
	corpora to web and social media. User-generated data and crowdsourcing have
	opened the door to investigating human language of various styles with greater
	statistical power and real-world applications. In this position/survey paper, I
	review and discuss seven language styles that I believe are important
	and interesting to study: influential work in the past, challenges at
	present, and potential impact for the future.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>xu:2017:StyVa</bibkey>
  </paper>

  <paper id="4902">
    <title>Shakespearizing Modern Language Using Copy-Enriched Sequence to Sequence Models</title>
    <author><first>Harsh</first><last>Jhamtani</last></author>
    <author><first>Varun</first><last>Gangal</last></author>
    <author><first>Eduard</first><last>Hovy</last></author>
    <author><first>Eric</first><last>Nyberg</last></author>
    <booktitle>Proceedings of the Workshop on Stylistic Variation</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>10&#8211;19</pages>
    <url>http://www.aclweb.org/anthology/W17-4902</url>
    <attachment type="attachment">W17-4902.Attachment.zip</attachment>
    <abstract>Variations in writing style are commonly used to adapt content to a
	specific context, audience, or purpose. However, applying stylistic variations
	is still by and large a manual process, and there has been little effort
	toward automating it. In this paper we explore automated methods to transform
	text from modern English to Shakespearean English using an end-to-end trainable
	neural model with pointers to enable copy actions. To tackle the limited amount
	of parallel data, we pre-train word embeddings by leveraging external
	dictionaries mapping Shakespearean words to modern English words, as well as
	additional text. Our methods achieve a BLEU score of 31+, an
	improvement of ≈ 6 points over the strongest baseline. We publicly release
	our code to foster further research in this area.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jhamtani-EtAl:2017:StyVa</bibkey>
  </paper>

  <paper id="4903">
    <title>Discovering Stylistic Variations in Distributional Vector Space Models via Lexical Paraphrases</title>
    <author><first>Xing</first><last>Niu</last></author>
    <author><first>Marine</first><last>Carpuat</last></author>
    <booktitle>Proceedings of the Workshop on Stylistic Variation</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>20&#8211;27</pages>
    <url>http://www.aclweb.org/anthology/W17-4903</url>
    <abstract>Detecting and analyzing stylistic variation in language is relevant to diverse
	Natural Language Processing applications. In this work, we investigate whether
	salient dimensions of style variations are embedded in standard distributional
	vector spaces of word meaning. We hypothesize that distances between
	embeddings of lexical paraphrases can help isolate style from meaning
	variations and help identify latent style dimensions. We conduct a qualitative
	analysis of latent style dimensions, and show the effectiveness of identified
	style subspaces on a lexical formality prediction task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>niu-carpuat:2017:StyVa</bibkey>
  </paper>

  <paper id="4904">
    <title>Harvesting Creative Templates for Generating Stylistically Varied Restaurant Reviews</title>
    <author><first>Shereen</first><last>Oraby</last></author>
    <author><first>Sheideh</first><last>Homayon</last></author>
    <author><first>Marilyn</first><last>Walker</last></author>
    <booktitle>Proceedings of the Workshop on Stylistic Variation</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>28&#8211;36</pages>
    <url>http://www.aclweb.org/anthology/W17-4904</url>
    <abstract>Many of the creative and figurative elements that make language
	exciting are lost in translation in current natural language
	generation engines. In this paper, we explore a method to harvest
	templates from positive and negative reviews in the restaurant domain,
	with the goal of vastly expanding the types of stylistic variation
	available to the natural language generator. We learn hyperbolic
	adjective patterns that are representative of the strongly-valenced
	expressive language commonly used in either positive or negative
	reviews.  We then identify and delexicalize entities, and use
	heuristics to extract generation templates from review sentences. We
	evaluate the learned templates against more traditional review
	templates, using subjective measures of convincingness, 
	interestingness, and naturalness. Our results show that the
	learned templates score highly on these measures.  Finally, we analyze
	the linguistic categories that characterize the learned positive and
	negative templates. We plan to use the learned templates to improve the
	conversational style of dialogue systems in the
	restaurant domain.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>oraby-homayon-walker:2017:StyVa</bibkey>
  </paper>

  <paper id="4905">
    <title>Is writing style predictive of scientific fraud?</title>
    <author><first>Chlo&#233;</first><last>Braud</last></author>
    <author><first>Anders</first><last>S&#248;gaard</last></author>
    <booktitle>Proceedings of the Workshop on Stylistic Variation</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>37&#8211;42</pages>
    <url>http://www.aclweb.org/anthology/W17-4905</url>
    <abstract>The problem of detecting scientific fraud using machine learning was recently
	introduced, with initial, positive results from a model taking into account
	various general indicators.
	The results seem to suggest that writing style is predictive of scientific
	fraud. 
	We revisit these initial experiments, and show that the leave-one-out testing
	procedure they used likely leads to a slight over-estimate of the
	predictability, 
	but also that simple models can outperform their proposed model by some margin.
	We go on to explore more abstract linguistic features, such as linguistic
	complexity and discourse structure, only to obtain negative results. 
	Upon analyzing our models, we do see some interesting patterns, though:
	Scientific fraud, for example, contains less comparison, as well as different
	types of hedging and ways of presenting logical reasoning.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>braud-sogaard:2017:StyVa</bibkey>
  </paper>

  <paper id="4906">
    <title>"Deep" Learning : Detecting Metaphoricity in Adjective-Noun Pairs</title>
    <author><first>Yuri</first><last>Bizzoni</last></author>
    <author><first>Stergios</first><last>Chatzikyriakidis</last></author>
    <author><first>Mehdi</first><last>Ghanimifard</last></author>
    <booktitle>Proceedings of the Workshop on Stylistic Variation</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>43&#8211;52</pages>
    <url>http://www.aclweb.org/anthology/W17-4906</url>
    <abstract>Metaphor is one of the most studied and widespread figures of speech and an
	essential element of individual style. In this paper we look at metaphor
	identification in adjective-noun pairs. We show that using a single neural
	network combined with pre-trained vector embeddings can outperform the state
	of the art in terms of accuracy. Specifically, the approach presented in this
	paper is based on two ideas: a) transfer learning via pre-trained
	vectors representing adjective-noun pairs, and b) a neural network as a model
	of composition that predicts a metaphoricity score as output. We present
	several different architectures for our system and evaluate their
	performance. Variations in dataset size and in the kinds of embeddings are also
	investigated. We show considerable improvement over previous approaches
	both in terms of accuracy and w.r.t. the size of annotated training data.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bizzoni-chatzikyriakidis-ghanimifard:2017:StyVa</bibkey>
  </paper>

  <paper id="4907">
    <title>Authorship Attribution with Convolutional Neural Networks and POS-Eliding</title>
    <author><first>Julian</first><last>Hitschler</last></author>
    <author><first>Esther</first><last>van den Berg</last></author>
    <author><first>Ines</first><last>Rehbein</last></author>
    <booktitle>Proceedings of the Workshop on Stylistic Variation</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>53&#8211;58</pages>
    <url>http://www.aclweb.org/anthology/W17-4907</url>
    <abstract>We use a convolutional neural network to perform authorship identification on a
	very homogeneous dataset of scientific publications. In order to investigate
	the effect of domain biases, we obscure words below a certain frequency
	threshold, retaining only their POS-tags. This procedure improves test
	performance due to better generalization on unseen data. Using our method, we
	are able to predict the authors of scientific publications in the same
	discipline at levels well above chance.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hitschler-vandenberg-rehbein:2017:StyVa</bibkey>
  </paper>

  <paper id="4908">
    <title>Topic and audience effects on distinctively Scottish vocabulary usage in Twitter data</title>
    <author><first>Philippa</first><last>Shoemark</last></author>
    <author><first>James</first><last>Kirby</last></author>
    <author><first>Sharon</first><last>Goldwater</last></author>
    <booktitle>Proceedings of the Workshop on Stylistic Variation</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>59&#8211;68</pages>
    <url>http://www.aclweb.org/anthology/W17-4908</url>
    <abstract>Sociolinguistic research suggests that speakers modulate their language style
	in response to their audience. Similar effects have recently been claimed to
	occur in the informal written context of Twitter, with users choosing less
	region-specific and non-standard vocabulary when addressing larger audiences.
	However, these studies have not carefully controlled for the possible confound
	of topic: that is, tweets addressed to a broad audience might also tend towards
	topics that engender a more formal style. In addition, it is not clear to what
	extent previous results generalize to different samples of users. Using
	mixed-effects models, we show that audience and topic have independent effects
	on the rate of distinctively Scottish usage in two demographically distinct
	Twitter user samples. However, not all effects are consistent between the two
	groups, underscoring the importance of replicating studies on distinct user
	samples before drawing strong conclusions from social media data.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>shoemark-kirby-goldwater:2017:StyVa</bibkey>
  </paper>

  <paper id="4909">
    <title>Differences in type-token ratio and part-of-speech frequencies in male and female Russian written texts</title>
    <author><first>Tatiana</first><last>Litvinova</last></author>
    <author><first>Pavel</first><last>Seredin</last></author>
    <author><first>Olga</first><last>Litvinova</last></author>
    <author><first>Olga</first><last>Zagorovskaya</last></author>
    <booktitle>Proceedings of the Workshop on Stylistic Variation</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>69&#8211;73</pages>
    <url>http://www.aclweb.org/anthology/W17-4909</url>
    <abstract>The differences in the frequencies of some parts of speech (POS), particularly
	function words, and lexical diversity in male and female speech have been
	pointed out in a number of papers. The classifiers using exclusively
	context-independent parameters have proved to be highly effective. However,
	there are still issues that have to be addressed, as many studies are
	performed on English and the genre and topic of texts are sometimes neglected.
	The aim of this paper is to investigate the association between
	context-independent parameters of Russian written texts and the gender of their
	authors and to design predictive regression models. A number of correlations
	were found. The obtained data are in good agreement with the results obtained
	for other languages. A model based on the 5 parameters with the highest
	correlation coefficients was designed.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>litvinova-EtAl:2017:StyVa</bibkey>
  </paper>

  <paper id="4910">
    <title>Modeling Communicative Purpose with Functional Style: Corpus and Features for German Genre and Register Analysis</title>
    <author><first>Thomas</first><last>Haider</last></author>
    <author><first>Alexis</first><last>Palmer</last></author>
    <booktitle>Proceedings of the Workshop on Stylistic Variation</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>74&#8211;84</pages>
    <url>http://www.aclweb.org/anthology/W17-4910</url>
    <abstract>While there is wide acknowledgement in NLP of the utility of document
	characterization by genre, it is quite difficult to determine a definitive set
	of features or even a comprehensive list of
	genres. This paper addresses both issues. First, with prototype semantics, we
	develop a hierarchical taxonomy of discourse functions. We implement the
	taxonomy by developing a new text genre corpus of contemporary German to
	perform a text based comparative register analysis.
	Second, we extract a host of style features, both deep and shallow, aiming
	beyond linguistically motivated features at situational correlates in texts. 
	The feature sets are used for supervised text genre classification, on which
	our models achieve high accuracy. 
	The combination of the corpus typology and feature sets allows us to
	characterize types of communicative purpose in a comparative setup, by
	qualitative interpretation of style feature loadings of a regularized
	discriminant analysis.
	Finally, to determine the dependence of genre on topics (which are arguably the
	distinguishing factor of sub-genre), we compare and combine our style models
	with Latent Dirichlet Allocation features across different corpus settings with
	unstable topics.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>haider-palmer:2017:StyVa</bibkey>
  </paper>

  <paper id="4911">
    <title>Stylistic Variation in Television Dialogue for Natural Language Generation</title>
    <author><first>Grace</first><last>Lin</last></author>
    <author><first>Marilyn</first><last>Walker</last></author>
    <booktitle>Proceedings of the Workshop on Stylistic Variation</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>85&#8211;93</pages>
    <url>http://www.aclweb.org/anthology/W17-4911</url>
    <abstract>Conversation is a critical component of storytelling, where key information is
	often revealed by
	what/how a character says it. We focus on the issue of character voice and
	build stylistic models with linguistic features related to natural language
	generation decisions. Using a dialogue corpus of the television series The Big
	Bang Theory, we apply content analysis to extract relevant linguistic features
	to build character-based stylistic models, and we test the model fit through a
	user perceptual experiment with Amazon's Mechanical Turk. The results are
	encouraging in that human subjects tend to perceive the generated utterances as
	being more similar to the character they are modeled on than to another random
	character.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lin-walker:2017:StyVa</bibkey>
  </paper>

  <paper id="4912">
    <title>Controlling Linguistic Style Aspects in Neural Language Generation</title>
    <author><first>Jessica</first><last>Ficler</last></author>
    <author><first>Yoav</first><last>Goldberg</last></author>
    <booktitle>Proceedings of the Workshop on Stylistic Variation</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>94&#8211;104</pages>
    <url>http://www.aclweb.org/anthology/W17-4912</url>
    <abstract>Most work on neural natural language generation (NNLG) focuses on controlling
	the content of the generated text. We experiment with controlling several stylistic
	aspects of the generated text, in addition to its content. The method is based
	on a conditioned RNN language model, where the desired content as well as the
	stylistic parameters serve as conditioning contexts.
	We demonstrate the approach on the movie reviews domain and show that it is
	successful in generating coherent sentences corresponding to the required
	linguistic style and content.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ficler-goldberg:2017:StyVa</bibkey>
  </paper>

  <paper id="4913">
    <title>Approximating Style by N-gram-based Annotation</title>
    <author><first>Melanie</first><last>Andresen</last></author>
    <author><first>Heike</first><last>Zinsmeister</last></author>
    <booktitle>Proceedings of the Workshop on Stylistic Variation</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>105&#8211;115</pages>
    <url>http://www.aclweb.org/anthology/W17-4913</url>
    <abstract>The concept of style is much debated in theoretical as well as empirical terms.
	From an empirical perspective, the key question is how to operationalize style
	and thus make it accessible for annotation and quantification. In authorship
	attribution, many different approaches have successfully resolved this issue at
	the cost of linguistic interpretability: The resulting algorithms may be able
	to distinguish one language variety from the other, but do not give us much
	information on their distinctive linguistic properties. We approach the issue
	of interpreting stylistic features by extracting linear and syntactic n-grams
	that are distinctive for a language variety. We present a study that
	exemplifies this process by a comparison of the German academic languages of
	linguistics and literary studies. Overall, our findings show that distinctive
	n-grams can be related to linguistic categories. The results suggest that the
	style of German literary studies is characterized by nominal structures and the
	style of linguistics by verbal ones.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>andresen-zinsmeister:2017:StyVa</bibkey>
  </paper>

  <paper id="4914">
    <title>Assessing the Stylistic Properties of Neurally Generated Text in Authorship Attribution</title>
    <author><first>Enrique</first><last>Manjavacas</last></author>
    <author><first>Jeroen</first><last>De Gussem</last></author>
    <author><first>Walter</first><last>Daelemans</last></author>
    <author><first>Mike</first><last>Kestemont</last></author>
    <booktitle>Proceedings of the Workshop on Stylistic Variation</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>116&#8211;125</pages>
    <url>http://www.aclweb.org/anthology/W17-4914</url>
    <abstract>Recent applications of neural language models have led to an increased interest
	in the automatic generation of natural language. Impressive as these
	applications are, the evaluation of neurally generated text has so far remained rather informal and
	anecdotal. Here, we present an attempt at the systematic assessment of one
	aspect of the quality of neurally generated text. We focus on a specific aspect
	of neural language generation: its ability to reproduce authorial writing
	styles. Using established models for authorship attribution, we empirically
	assess the stylistic qualities of neurally generated text. In comparison to
	conventional language models, neural models generate fuzzier text that is
	relatively harder to attribute correctly. Nevertheless, our results also
	suggest that neurally generated text offers more valuable perspectives for the
	augmentation of training data.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>manjavacas-EtAl:2017:StyVa</bibkey>
  </paper>

</volume>

