<?xml version="1.0" encoding="UTF-8" ?>
<volume id="E17">
  <paper id="4000">
    <title>Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics</title>
    <editor>Florian Kunneman</editor>
    <editor>Uxoa I&#241;urrieta</editor>
    <editor>John J. Camilleri</editor>
    <editor>Mariona Coll Ardanuy</editor>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/E17-4</url>
    <bibtype>book</bibtype>
    <bibkey>EACLSRW17:2017</bibkey>
  </paper>

  <paper id="4001">
    <title>Pragmatic descriptions of perceptual stimuli</title>
    <author><first>Emiel</first><last>van Miltenburg</last></author>
    <booktitle>Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;10</pages>
    <url>http://www.aclweb.org/anthology/E17-4001</url>
    <abstract>This research proposal discusses pragmatic factors in image description,
	arguing that current automatic image description systems do not take these
	factors into account. I present a general model of the human image description
	process, and propose to study this process using corpus analysis, experiments,
	and computational modeling. This will lead to a better characterization of
	human image description behavior, providing a road map for future research in
	automatic image description, and the automatic description of perceptual
	stimuli in general.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>vanmiltenburg:2017:EACLSRW17</bibkey>
  </paper>

  <paper id="4002">
    <title>Detecting spelling variants in non-standard texts</title>
    <author><first>Fabian</first><last>Barteld</last></author>
    <booktitle>Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>11&#8211;22</pages>
    <url>http://www.aclweb.org/anthology/E17-4002</url>
    <abstract>Spelling variation in non-standard language, e.g. computer-mediated
	communication and historical texts, is usually treated as a deviation from a
	standard spelling, e.g. 2mr as a non-standard spelling for tomorrow.
	Consequently, in normalization &#8211; the standard approach of dealing with
	spelling variation &#8211; so-called non-standard words are mapped to their
	corresponding standard words. However, there is not always a corresponding
	standard word. This can be the case for single types (like emoticons in
	computer-mediated communication) or a complete language, e.g. texts from
	historical languages that did not develop a standard variety. The approach
	presented in this thesis proposal deals with spelling variation in the absence
	of reference to a
	standard. The task is to detect pairs of types that are variants of the same
	morphological word. An approach for spelling-variant detection is presented,
	where pairs of potential spelling variants are generated with Levenshtein
	distance and subsequently filtered by supervised machine learning. The
	approach is evaluated on historical Low German texts. Finally, further
	perspectives are discussed.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>barteld:2017:EACLSRW17</bibkey>
  </paper>

  <paper id="4003">
    <title>Replication issues in syntax-based aspect extraction for opinion mining</title>
    <author><first>Edison</first><last>Marrese-Taylor</last></author>
    <author><first>Yutaka</first><last>Matsuo</last></author>
    <booktitle>Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>23&#8211;32</pages>
    <url>http://www.aclweb.org/anthology/E17-4003</url>
    <abstract>Reproducing experiments is an important instrument to validate previous work
	and build upon existing approaches. It has been tackled numerous times in
	different areas of science. In this paper, we introduce an empirical
	replicability study of three well-known algorithms for syntactic centric
	aspect-based opinion mining. We show that reproducing results continues to be a
	difficult endeavor, mainly due to the lack of details regarding preprocessing
	and parameter setting, as well as due to the absence of available
	implementations that clarify these details. We consider these to be important
	threats to the validity of research in the field, specifically when compared to
	other problems in NLP where public datasets and code availability are critical
	validity components. We conclude by encouraging code-based research, which we
	think has a key role in helping researchers to understand the meaning of the
	state-of-the-art better and to generate continuous advances.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>marresetaylor-matsuo:2017:EACLSRW17</bibkey>
  </paper>

  <paper id="4004">
    <title>Discourse Relations and Conjoined VPs: Automated Sense Recognition</title>
    <author><first>Valentina</first><last>Pyatkin</last></author>
    <author><first>Bonnie</first><last>Webber</last></author>
    <booktitle>Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>33&#8211;42</pages>
    <url>http://www.aclweb.org/anthology/E17-4004</url>
    <abstract>Sense classification of discourse relations is a sub-task of shallow discourse
	parsing. Discourse relations can occur both across sentences
	(inter-sentential) and within sentences (intra-sentential),
	and more than one discourse relation can hold between the same units. Using a
	newly available corpus of discourse-annotated intra-sentential conjoined verb
	phrases, we demonstrate a sequential classification pipeline for their
	multi-label sense classification. We assess the importance of each feature used
	in the classification, the feature scope, and what is lost in moving from gold
	standard manual parses to the output of an off-the-shelf parser.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>pyatkin-webber:2017:EACLSRW17</bibkey>
  </paper>

  <paper id="4005">
    <title>Deception detection in Russian texts</title>
    <author><first>Olga</first><last>Litvinova</last></author>
    <author><first>Pavel</first><last>Seredin</last></author>
    <author><first>Tatiana</first><last>Litvinova</last></author>
    <author><first>John</first><last>Lyell</last></author>
    <booktitle>Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>43&#8211;52</pages>
    <url>http://www.aclweb.org/anthology/E17-4005</url>
    <abstract>Humans are known to detect deception in speech at chance level, and it is
	therefore important to develop tools to enable them to detect deception. The
	problem of deception detection has been studied for a significant amount of
	time; however, the last 10-15 years have seen methods of computational
	linguistics being
	employed. Texts are processed using different NLP tools and then classified as
	deceptive/truthful using machine learning methods. While most research has been
	performed for English, Slavic languages have never been a focus of deception
	detection studies. The paper deals with deception detection in Russian
	narratives. It employs a specially designed corpus of truthful and deceptive
	texts on the same topic from each respondent, N = 113. The texts were processed
	using Linguistic Inquiry and Word Count software that is used in most studies
	of text-based deception detection. The list of parameters computed using the
	software was expanded by means of specially designed user dictionaries. A variety of
	text classification methods was employed. The accuracy of the model was found
	to depend on the author's gender and text type (deceptive/truthful).</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>litvinova-EtAl:2017:EACLSRW17</bibkey>
  </paper>

  <paper id="4006">
    <title>A Computational Model of Human Preferences for Pronoun Resolution</title>
    <author><first>Olga</first><last>Seminck</last></author>
    <author><first>Pascal</first><last>Amsili</last></author>
    <booktitle>Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>53&#8211;63</pages>
    <url>http://www.aclweb.org/anthology/E17-4006</url>
    <abstract>We present a cognitive computational model of pronoun resolution that
	reproduces the human interpretation preferences of the Subject Assignment
	Strategy and the Parallel Function Strategy. Our model relies on a
	probabilistic pronoun resolution system trained on corpus data. Factors
	influencing pronoun resolution are represented as features weighted by their
	relative importance. The importance the model gives to the preferences is in
	line with psycholinguistic studies. We demonstrate the cognitive plausibility
	of the model by running it on experimental items and simulating antecedent
	choice and reading times of human participants. Our model can be used as a new
	means to study pronoun resolution, because it captures the interaction of
	preferences.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>seminck-amsili:2017:EACLSRW17</bibkey>
  </paper>

  <paper id="4007">
    <title>Automatic Extraction of News Values from Headline Text</title>
    <author><first>Alicja</first><last>Piotrkowicz</last></author>
    <author><first>Vania</first><last>Dimitrova</last></author>
    <author><first>Katja</first><last>Markert</last></author>
    <booktitle>Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>64&#8211;74</pages>
    <url>http://www.aclweb.org/anthology/E17-4007</url>
    <abstract>Headlines play a crucial role in attracting audiences' attention to online
	artefacts (e.g. news articles, videos, blogs). The ability to carry out an
	automatic, large-scale analysis of headlines is critical to facilitate the
	selection and prioritisation of a large volume of digital content. In
	journalism studies news content has been extensively studied using manually
	annotated news values - factors used implicitly and explicitly when making
	decisions on the selection and prioritisation of news items. This paper
	presents the first attempt at a fully automatic extraction of news values from
	headline text. The news values extraction methods are applied to a large
	headlines corpus collected from The Guardian, and evaluated by comparison
	with a manually annotated gold standard. A crowdsourcing survey indicates that
	news values affect people's decisions to click on a headline, supporting the
	need for automatic news values detection.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>piotrkowicz-dimitrova-markert:2017:EACLSRW17</bibkey>
  </paper>

  <paper id="4008">
    <title>Assessing Convincingness of Arguments in Online Debates with Limited Number of Features</title>
    <author><first>Lisa Andreevna</first><last>Chalaguine</last></author>
    <author><first>Claudia</first><last>Schulz</last></author>
    <booktitle>Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>75&#8211;83</pages>
    <url>http://www.aclweb.org/anthology/E17-4008</url>
    <abstract>We propose a new method in the field of argument analysis in social media for
	determining the convincingness of arguments in online debates, following previous
	research by Habernal and Gurevych (2016). Rather than using argument-specific
	feature values, we measure feature values relative to the average value in the
	debate, allowing us to determine argument convincingness with fewer features
	(between 5 and 35) than normally used for natural language processing tasks. We
	use a simple forward-feeding neural network for this task and achieve an
	accuracy of 0.77 which is comparable to the accuracy obtained using 64k
	features and a support vector machine by Habernal and Gurevych.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>chalaguine-schulz:2017:EACLSRW17</bibkey>
  </paper>

  <paper id="4009">
    <title>Zipf's and Benford's laws in Twitter hashtags</title>
    <author><first>Jos&#233; Alberto</first><last>P&#233;rez-Meli&#225;n</last></author>
    <author><first>J. Alberto</first><last>Conejero</last></author>
    <author><first>Cesar</first><last>Ferri Ram&#237;rez</last></author>
    <booktitle>Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>84&#8211;93</pages>
    <url>http://www.aclweb.org/anthology/E17-4009</url>
    <abstract>Social networks have transformed communication dramatically in recent years
	through the rise of new platforms and the development of a new language of
	communication. This landscape requires new forms to describe and predict the
	behaviour of users in networks.
	This paper presents an analysis of the frequency distribution of hashtag
	popularity in Twitter conversations. Our objective is to determine whether this
	frequency distribution follows some well-known frequency distribution that many
	real-life sets of numerical data satisfy.
	In particular, we study the similarity of the frequency distribution of hashtag
	popularity to Zipf's law, an empirical law referring to the
	phenomenon that many types of data in social sciences can be approximated with
	a Zipfian distribution.
	Additionally, we analyse Benford's law, a special case of Zipf's
	law and a common pattern in the frequency distribution of leading digits. In
	order to compute the frequency distribution of hashtag popularity correctly, we
	need to correct many spelling errors that Twitter users introduce. For this
	purpose we introduce a new filter to correct hashtag mistakes based on string
	distances. Experiments employing datasets of Twitter streams
	generated under controlled conditions show that Benford's law and Zipf's
	law can be used to model the hashtag frequency distribution.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>perezmelian-conejero-ferriramirez:2017:EACLSRW17</bibkey>
  </paper>

  <paper id="4010">
    <title>A Multi-aspect Analysis of Automatic Essay Scoring for Brazilian Portuguese</title>
    <author><first>Evelin</first><last>Amorim</last></author>
    <author><first>Adriano</first><last>Veloso</last></author>
    <booktitle>Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>94&#8211;102</pages>
    <url>http://www.aclweb.org/anthology/E17-4010</url>
    <abstract>Several methods for automatic essay scoring (AES) for the English language
	have been proposed. However, multi-aspect AES systems for other languages are
	unusual. Therefore, we propose a multi-aspect AES system and apply it to a
	dataset of Brazilian Portuguese essays, which human experts evaluated according
	to five aspects defined by the Brazilian Government for the National Exam for
	High School Students (ENEM). These aspects are skills that students must master,
	and each skill is assessed separately. Besides predicting each aspect, we also
	performed a feature analysis for each aspect. The proposed AES system employs
	several features already used by AES systems for the English language. Our
	results show that predictions for some aspects performed well with the features
	we employed, while predictions for other aspects performed poorly. The detailed
	feature analysis we performed also reveals differences between the five
	aspects. Besides these contributions, the eight million enrollments for ENEM
	every year raise some challenging issues for future directions in our
	research.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>amorim-veloso:2017:EACLSRW17</bibkey>
  </paper>

  <paper id="4011">
    <title>Literal or idiomatic? Identifying the reading of single occurrences of German multiword expressions using word embeddings</title>
    <author><first>Rafael</first><last>Ehren</last></author>
    <booktitle>Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>103&#8211;112</pages>
    <url>http://www.aclweb.org/anthology/E17-4011</url>
    <abstract>Non-compositional multiword expressions (MWEs) still pose serious issues for a
	variety of natural language processing tasks and their ubiquity makes it
	impossible to get around methods which automatically identify these kinds of
	MWEs. The method presented in this paper was inspired by Sporleder and Li
	and is able to discriminate between the literal and non-literal use of an MWE
	in an unsupervised way. It is based on the assumption that words in a text form
	cohesive units. If the cohesion of these units is weakened by an expression, it
	is classified as literal, and otherwise as idiomatic. While Sporleder and Li
	used Normalized Google Distance to model semantic similarity, the present work
	examines the use of a variety of different word embeddings.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ehren:2017:EACLSRW17</bibkey>
  </paper>

  <paper id="4012">
    <title>Evaluating the Reliability and Interaction of Recursively Used Feature Classes for Terminology Extraction</title>
    <author><first>Anna</first><last>H&#228;tty</last></author>
    <author><first>Michael</first><last>Dorna</last></author>
    <author><first>Sabine</first><last>Schulte im Walde</last></author>
    <booktitle>Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>113&#8211;121</pages>
    <url>http://www.aclweb.org/anthology/E17-4012</url>
    <abstract>Feature design and selection is a crucial aspect when treating terminology
	extraction as a machine learning classification problem. We designed feature
	classes which characterize different properties of terms based on
	distributions, and propose a new feature class for components of term
	candidates. By using random forests, we infer optimal features which are later
	used to build decision tree classifiers. We evaluate our method using the ACL
	RD-TEC dataset. We demonstrate the importance of the novel feature class for
	downgrading termhood which exploits properties of term components. Furthermore,
	our classification suggests that the identification of reliable term
	candidates should be performed successively, rather than just once.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hatty-dorna-schulteimwalde:2017:EACLSRW17</bibkey>
  </paper>

</volume>

