<?xml version="1.0" encoding="UTF-8" ?>
<volume id="E17">
  <paper id="1000">
    <title>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</title>
    <editor>Mirella Lapata</editor>
    <editor>Phil Blunsom</editor>
    <editor>Alexander Koller</editor>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/E17-1</url>
    <bibtype>book</bibtype>
    <bibkey>EACLlong:2017</bibkey>
  </paper>

  <paper id="1001">
    <title>Gated End-to-End Memory Networks</title>
    <author><first>Fei</first><last>Liu</last></author>
    <author><first>Julien</first><last>Perez</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;10</pages>
    <url>http://www.aclweb.org/anthology/E17-1001</url>
    <abstract>Machine reading using differentiable reasoning models has recently shown
	remarkable progress. In this context, End-to-End trainable Memory Networks
	(MemN2N) have demonstrated promising performance on simple natural language
	based reasoning tasks such as factual reasoning and basic deduction. However,
	other tasks, namely multi-fact question-answering, positional reasoning or
	dialog related tasks, remain challenging particularly due to the necessity of
	more complex interactions between the memory and controller modules composing
	this family of models. In this paper, we introduce a novel end-to-end memory
	access regulation mechanism inspired by the current progress on the connection
	short-cutting principle in the field of computer vision. Concretely, we develop
	a Gated End-to-End trainable Memory Network architecture (GMemN2N). From the
	machine learning perspective, this new capability is learned in an end-to-end
	fashion without the use of any additional supervision signal which is, to the
	best of our knowledge, the first of its kind. Our experiments show significant
	improvements on the most challenging tasks in the 20 bAbI dataset, without the
	use of any domain knowledge. Then, we show improvements on the Dialog bAbI
	tasks including the real human-bot conversation-based Dialog State Tracking
	Challenge (DSTC-2) dataset. On these two datasets, our model sets the new state
	of the art.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>liu-perez:2017:EACLlong</bibkey>
  </paper>

  <paper id="1002">
    <title>Neural Tree Indexers for Text Understanding</title>
    <author><first>Tsendsuren</first><last>Munkhdalai</last></author>
    <author><first>Hong</first><last>Yu</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>11&#8211;21</pages>
    <url>http://www.aclweb.org/anthology/E17-1002</url>
    <abstract>Recurrent neural networks (RNNs) process
	input text sequentially and model the
	conditional transition between word tokens.
	In contrast, the advantages of recursive
	networks include that they explicitly
	model the compositionality and the recursive
	structure of natural language. However,
	the current recursive architecture is
	limited by its dependence on syntactic
	trees. In this paper, we introduce a robust
	syntactic parsing-independent tree structured
	model, Neural Tree Indexers (NTI)
	that provides a middle ground between the
	sequential RNNs and the syntactic tree-based
	recursive models. NTI constructs a
	full n-ary tree by processing the input text
	with its node function in a bottom-up fashion.
	An attention mechanism can then be applied
	to both structure and node function.
	We implemented and evaluated a binary tree
	model of NTI, showing the model
	achieved the state-of-the-art performance
	on three different NLP tasks: natural language
	inference, answer sentence selection,
	and sentence classification, outperforming
	state-of-the-art recurrent and recursive
	neural networks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>munkhdalai-yu:2017:EACLlong1</bibkey>
  </paper>

  <paper id="1003">
    <title>Exploring Different Dimensions of Attention for Uncertainty Detection</title>
    <author><first>Heike</first><last>Adel</last></author>
    <author><first>Hinrich</first><last>Sch&#252;tze</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>22&#8211;34</pages>
    <url>http://www.aclweb.org/anthology/E17-1003</url>
    <abstract>Neural networks with attention have proven effective for many natural language
	processing tasks. In this paper, we develop attention mechanisms for
	uncertainty detection. In particular, we generalize standardly used attention
	mechanisms by introducing external attention and sequence-preserving attention.
	These novel architectures differ from standard approaches in that they use
	external resources to compute attention weights and preserve sequence
	information. We compare them to other configurations along different dimensions
	of attention. Our novel architectures set the new state of the art on a
	Wikipedia benchmark dataset and perform similar to the state-of-the-art model
	on a biomedical benchmark which uses a large set of linguistic features.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>adel-schutze:2017:EACLlong</bibkey>
  </paper>

  <paper id="1004">
    <title>Classifying Illegal Activities on Tor Network Based on Web Textual Contents</title>
    <author><first>Mhd Wesam</first><last>Al Nabki</last></author>
    <author><first>Eduardo</first><last>Fidalgo</last></author>
    <author><first>Enrique</first><last>Alegre</last></author>
    <author><first>Ivan</first><last>de Paz</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>35&#8211;43</pages>
    <url>http://www.aclweb.org/anthology/E17-1004</url>
    <abstract>The freedom of the Deep Web offers a safe place where people can express
	themselves anonymously but they also can conduct illegal activities. In this
	paper, we present and make publicly available a new dataset for Darknet active
	domains, which we call &#x201c;Darknet Usage Text Addresses&#x201d; (DUTA). We built
	DUTA by sampling the Tor network over two months and manually labeling each
	address into 26 classes. Using DUTA, we conducted a comparison between two
	well-known text representation techniques crossed by three different supervised
	classifiers to categorize the Tor hidden services. We also fixed the pipeline
	elements and identified the aspects that have a critical influence on the
	classification results. We found that the combination of TF-IDF word
	representation with a Logistic Regression classifier achieves 96.6% accuracy
	in 10-fold cross-validation and a macro F1 score of 93.7% when classifying a
	subset of illegal activities from DUTA. The good performance of the classifier
	might support potential tools to help the authorities in the detection of these
	activities.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>alnabki-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1005">
    <title>When is multitask learning effective? Semantic sequence prediction under varying data conditions</title>
    <author><first>H&#233;ctor</first><last>Mart&#237;nez Alonso</last></author>
    <author><first>Barbara</first><last>Plank</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>44&#8211;53</pages>
    <url>http://www.aclweb.org/anthology/E17-1005</url>
    <abstract>Multitask learning has been applied successfully to a range of tasks, mostly
	morphosyntactic. However, little is known about when MTL works and
	whether there are data characteristics that help to determine the success of
	MTL. In this paper we evaluate a range of semantic sequence labeling tasks in a
	MTL setup. We examine different auxiliary task configurations, including a
	novel setup, and correlate their impact with data-dependent conditions. Our
	results show that MTL is not always effective: significant improvements
	are obtained only for 1 out of 5 tasks. When successful,
	auxiliary tasks with compact and more uniform label distributions are
	preferable.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>martinezalonso-plank:2017:EACLlong</bibkey>
  </paper>

  <paper id="1006">
    <title>Learning Compositionality Functions on Word Embeddings for Modelling Attribute Meaning in Adjective-Noun Phrases</title>
    <author><first>Matthias</first><last>Hartung</last></author>
    <author><first>Fabian</first><last>Kaupmann</last></author>
    <author><first>Soufian</first><last>Jebbara</last></author>
    <author><first>Philipp</first><last>Cimiano</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>54&#8211;64</pages>
    <url>http://www.aclweb.org/anthology/E17-1006</url>
    <abstract>Word embeddings have been shown to be highly effective in a variety of lexical
	semantic tasks. They tend to capture meaningful relational similarities between
	individual words, at the expense of lacking the capability of making the
	underlying semantic relation explicit. In this paper, we investigate the
	attribute relation that often holds between the constituents of adjective-noun
	phrases. We use CBOW word embeddings to represent word meaning and learn a
	compositionality function that combines the individual constituents into a
	phrase representation, thus capturing the compositional attribute meaning. The
	resulting embedding model, while being fully interpretable, outperforms
	count-based distributional vector space models that are tailored to attribute
	meaning in the two tasks of attribute selection and phrase similarity
	prediction. Moreover, as the model captures a generalized layer of attribute
	meaning, it bears the potential to be used for predictions over various
	attribute inventories without re-training.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hartung-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1007">
    <title>Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection</title>
    <author><first>Vered</first><last>Shwartz</last></author>
    <author><first>Enrico</first><last>Santus</last></author>
    <author><first>Dominik</first><last>Schlechtweg</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>65&#8211;75</pages>
    <url>http://www.aclweb.org/anthology/E17-1007</url>
    <abstract>The fundamental role of hypernymy in NLP has motivated the development of many
	methods for the automatic identification of this relation, most of which rely
	on word distribution. 
	We investigate an extensive number of such unsupervised measures, using several
	distributional semantic models that differ by context type and feature
	weighting. We analyze the performance of the different methods based on their
	linguistic motivation.
	Comparison to the state-of-the-art supervised methods shows that while
	supervised methods generally outperform the unsupervised ones, the former are
	sensitive to the distribution of training instances, hurting their reliability.
	Being based on general linguistic hypotheses and independent from training
	data, unsupervised measures are more robust, and therefore are still useful
	artillery for hypernymy detection.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>shwartz-santus-schlechtweg:2017:EACLlong</bibkey>
  </paper>

  <paper id="1008">
    <title>Distinguishing Antonyms and Synonyms in a Pattern-based Neural Network</title>
    <author><first>Kim Anh</first><last>Nguyen</last></author>
    <author><first>Sabine</first><last>Schulte im Walde</last></author>
    <author><first>Ngoc Thang</first><last>Vu</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>76&#8211;85</pages>
    <url>http://www.aclweb.org/anthology/E17-1008</url>
    <abstract>Distinguishing between antonyms and synonyms is a key task to achieve high
	performance in NLP systems. While they are notoriously difficult to distinguish
	by distributional co-occurrence models, pattern-based methods have proven
	effective to differentiate between the relations. In this paper, we present a
	novel neural network model AntSynNET that exploits lexico-syntactic patterns
	from syntactic parse trees. In addition to the lexical and syntactic
	information, we successfully integrate the distance between the related words
	along the syntactic path as a new pattern feature. The results from
	classification experiments show that AntSynNET improves the performance over
	prior pattern-based methods.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>nguyen-schulteimwalde-vu:2017:EACLlong</bibkey>
  </paper>

  <paper id="1009">
    <title>Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation</title>
    <author><first>Alexander</first><last>Panchenko</last></author>
    <author><first>Eugen</first><last>Ruppert</last></author>
    <author><first>Stefano</first><last>Faralli</last></author>
    <author><first>Simone Paolo</first><last>Ponzetto</last></author>
    <author><first>Chris</first><last>Biemann</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>86&#8211;98</pages>
    <url>http://www.aclweb.org/anthology/E17-1009</url>
    <abstract>The current trend in NLP is the use of highly opaque models, e.g. neural
	networks and word embeddings. While these models yield state-of-the-art results
	on a range of tasks, their drawback is poor interpretability. On the example of
	word sense induction and disambiguation (WSID), we show that it is possible to
	develop an interpretable model that matches the state-of-the-art models in
	accuracy. Namely, we present an unsupervised, knowledge-free WSID approach,
	which is interpretable at three levels: word sense inventory, sense feature
	representations, and disambiguation procedure. Experiments show that our model
	performs on par with state-of-the-art word sense embeddings and other
	unsupervised systems while offering the possibility to justify its decisions in
	human-readable form.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>panchenko-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1010">
    <title>Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison</title>
    <author><first>Alessandro</first><last>Raganato</last></author>
    <author><first>Jose</first><last>Camacho-Collados</last></author>
    <author><first>Roberto</first><last>Navigli</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>99&#8211;110</pages>
    <url>http://www.aclweb.org/anthology/E17-1010</url>
    <abstract>Word Sense Disambiguation is a long-standing task in Natural Language
	Processing, lying at the core of human language understanding. However, the
	evaluation of automatic systems has been problematic, mainly due to the lack of
	a reliable evaluation framework. In this paper we develop a unified evaluation
	framework and analyze the performance of various Word Sense Disambiguation
	systems in a fair setup. The results show that supervised systems clearly
	outperform knowledge-based models. Among the supervised systems, a linear
	classifier trained on conventional local features still proves to be a hard
	baseline to beat. Nonetheless, recent approaches exploiting neural networks on
	unlabeled corpora achieve promising results, surpassing this hard baseline in
	most test sets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>raganato-camachocollados-navigli:2017:EACLlong</bibkey>
  </paper>

  <paper id="1011">
    <title>Which is the Effective Way for Gaokao: Information Retrieval or Neural Networks?</title>
    <author><first>Shangmin</first><last>Guo</last></author>
    <author><first>Xiangrong</first><last>Zeng</last></author>
    <author><first>Shizhu</first><last>He</last></author>
    <author><first>Kang</first><last>Liu</last></author>
    <author><first>Jun</first><last>Zhao</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>111&#8211;120</pages>
    <url>http://www.aclweb.org/anthology/E17-1011</url>
    <abstract>As one of the most important tests in China, Gaokao is designed to be difficult
	enough to distinguish excellent high school students. In this work, we
	detail the Gaokao History Multiple Choice Questions (GKHMC) and propose two
	different approaches to address them using various resources. One approach is
	based on an entity search technique (IR approach), the other on text
	entailment, where we specifically employ deep neural networks (NN
	approach). Experiments on our collected real Gaokao questions
	show that the two approaches are good at different categories of questions: the IR
	approach performs much better on entity questions (EQs), while the NN approach shows
	its advantage on sentence questions (SQs). We achieve state-of-the-art
	performance and show that it is indispensable to apply a hybrid method when
	participating in real-world tests.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>guo-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1012">
    <title>If You Can't Beat Them Join Them: Handcrafted Features Complement Neural Nets for Non-Factoid Answer Reranking</title>
    <author><first>Dasha</first><last>Bogdanova</last></author>
    <author><first>Jennifer</first><last>Foster</last></author>
    <author><first>Daria</first><last>Dzendzik</last></author>
    <author><first>Qun</first><last>Liu</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>121&#8211;131</pages>
    <url>http://www.aclweb.org/anthology/E17-1012</url>
    <abstract>We show that a neural approach to the task of non-factoid answer reranking can
	benefit from the inclusion of tried-and-tested handcrafted features. 
	We present a neural network architecture based on a combination of recurrent
	neural networks that are used to encode questions and answers, and a multilayer
	perceptron. We show how this approach can be combined with additional features,
	in particular, the discourse features used by previous research. Our neural
	approach achieves state-of-the-art performance on a public dataset from
	Yahoo! Answers and its performance is further improved by incorporating the
	discourse features. Additionally, we present a new dataset of Ask Ubuntu
	questions where the hybrid approach also achieves good results.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bogdanova-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1013">
    <title>Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks</title>
    <author><first>Rajarshi</first><last>Das</last></author>
    <author><first>Arvind</first><last>Neelakantan</last></author>
    <author><first>David</first><last>Belanger</last></author>
    <author><first>Andrew</first><last>McCallum</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>132&#8211;141</pages>
    <url>http://www.aclweb.org/anthology/E17-1013</url>
    <abstract>Our goal is to combine the rich multi-step inference of symbolic logical
	reasoning with the generalization capabilities of neural networks.  We are
	particularly interested in complex reasoning about entities and relations in
	text and large-scale knowledge bases (KBs). Neelakantan et al. (2015) use RNNs to
	compose the distributed semantics of multi-hop paths in KBs; however for
	multiple reasons, the approach lacks accuracy and practicality. This paper
	proposes three significant modeling advances: (1) we learn to jointly reason
	about relations, entities, and entity-types; (2) we use neural attention
	modeling to incorporate multiple paths; (3) we learn to share
	strength in a single RNN that represents logical composition across all
	relations. On a large-scale Freebase+ClueWeb prediction task, we achieve 25%
	error reduction, and a 53% error reduction on sparse relations due to shared
	strength. On chains of reasoning in WordNet we reduce error in mean quantile by
	84% versus previous state-of-the-art.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>das-EtAl:2017:EACLlong1</bibkey>
  </paper>

  <paper id="1014">
    <title>Recognizing Mentions of Adverse Drug Reaction in Social Media Using Knowledge-Infused Recurrent Models</title>
    <author><first>Gabriel</first><last>Stanovsky</last></author>
    <author><first>Daniel</first><last>Gruhl</last></author>
    <author><first>Pablo</first><last>Mendes</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>142&#8211;151</pages>
    <url>http://www.aclweb.org/anthology/E17-1014</url>
    <abstract>Recognizing mentions of Adverse Drug Reactions (ADR) in social media is
	challenging: ADR mentions are context-dependent and include long, varied and
	unconventional descriptions as compared to more formal medical symptom
	terminology. We use the CADEC corpus to train a recurrent neural network (RNN)
	transducer, integrated with knowledge graph embeddings of DBpedia, and show the
	resulting model to be highly accurate (93.4 F1). 
	Furthermore, even when lacking high quality expert annotations, we show that by
	employing an active learning technique and using purpose built annotation
	tools, we can train the RNN to perform well (83.9 F1).</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>stanovsky-gruhl-mendes:2017:EACLlong</bibkey>
  </paper>

  <paper id="1015">
    <title>Multitask Learning for Mental Health Conditions with Limited Social Media Data</title>
    <author><first>Adrian</first><last>Benton</last></author>
    <author><first>Margaret</first><last>Mitchell</last></author>
    <author><first>Dirk</first><last>Hovy</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>152&#8211;162</pages>
    <url>http://www.aclweb.org/anthology/E17-1015</url>
    <revision id='2'>E17-1015v2</revision>
    <abstract>Language contains information about the author's demographic attributes as
	well as their mental state, and has been successfully leveraged in NLP to
	predict either one alone. However, demographic attributes and mental states
	also interact with each other, and we are the first to demonstrate how to use
	them jointly to improve the prediction of mental health conditions across the
	board. We model the different conditions as tasks in a multitask learning (MTL)
	framework, and establish for the first time the potential of deep learning in
	the prediction of mental health from online user-generated text. The framework
	we propose significantly improves over all baselines and single-task models for
	predicting mental health conditions, with particularly significant gains for
	conditions with limited data. In addition, our best MTL model can predict the
	presence of conditions (neuroatypicality) more generally, further reducing the
	error of the strong feed-forward baseline.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>benton-mitchell-hovy:2017:EACLlong</bibkey>
  </paper>

  <paper id="1016">
    <title>Evaluation by Association: A Systematic Study of Quantitative Word Association Evaluation</title>
    <author><first>Ivan</first><last>Vuli&#x107;</last></author>
    <author><first>Douwe</first><last>Kiela</last></author>
    <author><first>Anna</first><last>Korhonen</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>163&#8211;175</pages>
    <url>http://www.aclweb.org/anthology/E17-1016</url>
    <abstract>Recent work on evaluating representation learning architectures in NLP has
	established a need for evaluation protocols based on subconscious cognitive
	measures rather than manually tailored intrinsic similarity and relatedness
	tasks. In this work, we propose a novel evaluation framework that enables
	large-scale evaluation of such architectures in the free word association (WA)
	task, which is firmly grounded in cognitive theories of human semantic
	representation. This evaluation is facilitated by the existence of large
	manually constructed repositories of word association data. In this paper, we
	(1) present a detailed analysis of the new quantitative WA evaluation protocol,
	(2) suggest new evaluation metrics for the WA task inspired by its direct
	analogy with information retrieval problems, (3) evaluate various
	state-of-the-art representation models on this task, and (4) discuss the
	relationship between WA and prior evaluations of semantic representation with
	well-known similarity and relatedness evaluation sets. We have made the WA
	evaluation toolkit publicly available.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>vulic-kiela-korhonen:2017:EACLlong</bibkey>
  </paper>

  <paper id="1017">
    <title>Computational Argumentation Quality Assessment in Natural Language</title>
    <author><first>Henning</first><last>Wachsmuth</last></author>
    <author><first>Nona</first><last>Naderi</last></author>
    <author><first>Yufang</first><last>Hou</last></author>
    <author><first>Yonatan</first><last>Bilu</last></author>
    <author><first>Vinodkumar</first><last>Prabhakaran</last></author>
    <author><first>Tim Alberdingk</first><last>Thijm</last></author>
    <author><first>Graeme</first><last>Hirst</last></author>
    <author><first>Benno</first><last>Stein</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>176&#8211;187</pages>
    <url>http://www.aclweb.org/anthology/E17-1017</url>
    <abstract>Research on computational argumentation faces the problem of how to
	automatically assess the quality of an argument or argumentation. While
	different quality dimensions have been approached in natural language
	processing, a common understanding of argumentation quality is still missing.
	This paper presents the first holistic work on computational argumentation
	quality in natural language. We comprehensively survey the diverse existing
	theories and approaches to assess logical, rhetorical, and dialectical quality
	dimensions, and we derive a systematic taxonomy from these. In addition, we
	provide a corpus with 320 arguments, annotated for all 15 dimensions in the
	taxonomy. Our results establish a common ground for research on computational
	argumentation quality assessment.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wachsmuth-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1018">
    <title>A method for in-depth comparative evaluation: How (dis)similar are outputs of pos taggers, dependency parsers and coreference resolvers really?</title>
    <author><first>Don</first><last>Tuggener</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>188&#8211;198</pages>
    <url>http://www.aclweb.org/anthology/E17-1018</url>
    <abstract>This paper proposes a generic method for the comparative evaluation of system
	outputs. The approach is able to quantify the pairwise differences between two
	outputs and to unravel in detail what the differences consist of. We
	apply our approach to three tasks in Computational Linguistics, i.e. POS
	tagging, dependency parsing, and coreference resolution. We find that system
	outputs are more distinct than the (often) small differences in evaluation
	scores seem to suggest.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tuggener:2017:EACLlong</bibkey>
  </paper>

  <paper id="1019">
    <title>Re-evaluating Automatic Metrics for Image Captioning</title>
    <author><first>Mert</first><last>Kilickaya</last></author>
    <author><first>Aykut</first><last>Erdem</last></author>
    <author><first>Nazli</first><last>Ikizler-Cinbis</last></author>
    <author><first>Erkut</first><last>Erdem</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>199&#8211;209</pages>
    <url>http://www.aclweb.org/anthology/E17-1019</url>
    <abstract>The task of generating natural language descriptions from images has received a
	lot of attention in recent years. Consequently, it is becoming increasingly
	important to evaluate such image captioning approaches in an automatic manner.
	In this paper, we provide an in-depth evaluation of the existing image
	captioning metrics through a series of carefully designed experiments.
	Moreover, we explore the utilization of the recently proposed Word Mover's
	Distance (WMD) document metric for the purpose of image captioning. Our
	findings outline the differences and/or similarities between metrics and their
	relative robustness by means of extensive correlation, accuracy and distraction
	based evaluations. Our results also demonstrate that WMD provides strong
	advantages over other metrics.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kilickaya-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1020">
    <title>Integrating Meaning into Quality Evaluation of Machine Translation</title>
    <author><first>Osman</first><last>Baskaya</last></author>
    <author><first>Eray</first><last>Yildiz</last></author>
    <author><first>Doruk</first><last>Tunaoglu</last></author>
    <author><first>Mustafa Tolga</first><last>Eren</last></author>
    <author><first>A. Seza</first><last>Do&#287;ru&#246;z</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>210&#8211;219</pages>
    <url>http://www.aclweb.org/anthology/E17-1020</url>
    <abstract>Machine translation (MT) quality is evaluated through comparisons between MT
	outputs and the human translations (HT). Traditionally, this evaluation relies
	on form related features (e.g. lexicon and syntax) and ignores the transfer of
	meaning reflected in HT outputs. Instead, we evaluate the quality of MT outputs
	through meaning related features (e.g. polarity, subjectivity) with two
	experiments. In the first experiment, the meaning related features are compared
	to human rankings individually. In the second experiment, combinations of
	meaning related features and other quality metrics are utilized to predict the
	same human rankings. The results of our experiments confirm the benefit of
	these features in predicting human evaluation of translation quality in
	addition to traditional metrics which focus mainly on form.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>baskaya-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1021">
    <title>Cross-Lingual Dependency Parsing with Late Decoding for Truly Low-Resource Languages</title>
    <author><first>Michael</first><last>Schlichtkrull</last></author>
    <author><first>Anders</first><last>S&#248;gaard</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>220&#8211;229</pages>
    <url>http://www.aclweb.org/anthology/E17-1021</url>
    <abstract>In cross-lingual dependency annotation projection, information is often lost
	during transfer because of early decoding. We present an end-to-end graph-based
	neural network dependency parser that can be trained to reproduce matrices of
	edge scores, which can be directly projected across word alignments. We show
	that our approach to cross-lingual dependency parsing is not only simpler, but
	also achieves an absolute improvement of 2.25% averaged across 10 languages
	compared to the previous state of the art.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>schlichtkrull-sogaard:2017:EACLlong</bibkey>
  </paper>

  <paper id="1022">
    <title>Parsing Universal Dependencies without training</title>
    <author><first>H&#233;ctor</first><last>Mart&#237;nez Alonso</last></author>
    <author><first>&#x17D;eljko</first><last>Agi&#x107;</last></author>
    <author><first>Barbara</first><last>Plank</last></author>
    <author><first>Anders</first><last>S&#248;gaard</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>230&#8211;240</pages>
    <url>http://www.aclweb.org/anthology/E17-1022</url>
    <abstract>We present UDP, the first training-free parser for Universal Dependencies (UD).
	Our algorithm is based on PageRank and a small set of specific dependency head
	rules. UDP features two-step decoding to guarantee that function words are
	attached as leaf nodes. The parser requires no training, and it is competitive
	with a delexicalized transfer system. UDP offers a linguistically sound
	unsupervised alternative to cross-lingual parsing for UD. The parser has very
	few parameters and is distinctly robust to domain change across languages.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>martinezalonso-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1023">
    <title>Delexicalized Word Embeddings for Cross-lingual Dependency Parsing</title>
    <author><first>Mathieu</first><last>Dehouck</last></author>
    <author><first>Pascal</first><last>Denis</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>241&#8211;250</pages>
    <url>http://www.aclweb.org/anthology/E17-1023</url>
    <abstract>This paper presents a new approach to the problem of cross-lingual dependency
	parsing, aiming at leveraging training data from different source languages to
	learn a parser in a target language. Specifically, this approach first
	constructs word vector representations that exploit structural (i.e.,
	dependency-based) contexts but only considering the morpho-syntactic
	information associated with each word and its contexts. These delexicalized
	word embeddings, which can be trained on any set of languages and capture
	features shared across languages, are then used in combination with standard
	language-specific features to train a lexicalized parser in the target
	language. We evaluate our approach through experiments on a set of eight
	different languages that are part of the Universal Dependencies Project. Our
	main results show that using such delexicalized embeddings, either trained in
	a monolingual or multilingual fashion, achieves significant improvements over
	monolingual baselines.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>dehouck-denis:2017:EACLlong</bibkey>
  </paper>

  <paper id="1024">
    <title>Stance Classification of Context-Dependent Claims</title>
    <author><first>Roy</first><last>Bar-Haim</last></author>
    <author><first>Indrajit</first><last>Bhattacharya</last></author>
    <author><first>Francesco</first><last>Dinuzzo</last></author>
    <author><first>Amrita</first><last>Saha</last></author>
    <author><first>Noam</first><last>Slonim</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>251&#8211;261</pages>
    <url>http://www.aclweb.org/anthology/E17-1024</url>
    <abstract>Recent work has addressed the problem of detecting relevant claims for a given
	controversial topic. We introduce the complementary task of Claim Stance
	Classification, along with the first benchmark dataset for this task. We
	decompose this problem into: (a) open-domain target identification for topic
	and claim, (b) sentiment classification for each target, and (c) open-domain
	contrast detection between the topic and the claim targets. Manual annotation
	of the dataset confirms the applicability and validity of our model. We
	describe an implementation of our model, focusing on a novel algorithm for
	contrast detection. Our approach achieves promising results, and is shown to
	outperform several baselines, which represent the common practice of applying a
	single, monolithic classifier for stance classification.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>barhaim-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1025">
    <title>Exploring the Impact of Pragmatic Phenomena on Irony Detection in Tweets: A Multilingual Corpus Study</title>
    <author><first>Jihen</first><last>Karoui</last></author>
    <author><first>Farah</first><last>Benamara</last></author>
    <author><first>V&#233;ronique</first><last>Moriceau</last></author>
    <author><first>Viviana</first><last>Patti</last></author>
    <author><first>Cristina</first><last>Bosco</last></author>
    <author><first>Nathalie</first><last>Aussenac-Gilles</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>262&#8211;272</pages>
    <url>http://www.aclweb.org/anthology/E17-1025</url>
    <abstract>This paper provides a linguistic and pragmatic analysis of the phenomenon of
	irony in order to represent how Twitter users exploit irony devices within
	their communication strategies for generating textual content. We aim to
	measure the impact of a wide range of pragmatic phenomena on the
	interpretation of irony, and to investigate how these phenomena interact with
	contexts local to the tweet.
	Informed by linguistic theories, we propose for the first time a multi-layered
	annotation schema for irony and its application to a corpus of French, English
	and Italian tweets. We detail each layer, explore their interactions, and
	discuss our results from a qualitative and quantitative perspective.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>karoui-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1026">
    <title>A Multi-View Sentiment Corpus</title>
    <author><first>Debora</first><last>Nozza</last></author>
    <author><first>Elisabetta</first><last>Fersini</last></author>
    <author><first>Enza</first><last>Messina</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>273&#8211;280</pages>
    <url>http://www.aclweb.org/anthology/E17-1026</url>
    <abstract>Sentiment Analysis is a broad task that involves the analysis of various
	aspects of natural language text. However, most of the approaches in the state
	of the art usually investigate each aspect independently, i.e. Subjectivity
	Classification, Sentiment Polarity Classification, Emotion Recognition, Irony
	Detection. In this paper we present a Multi-View Sentiment Corpus (MVSC), which
	comprises 3000 English microblog posts related to the movie domain. Three
	independent annotators manually labelled MVSC, following a broad annotation
	schema about different aspects that can be grasped from natural language text
	coming from social networks. The contribution is therefore a corpus that
	comprises five different views for each message, i.e. subjective/objective,
	sentiment polarity, implicit/explicit, irony, emotion.
	In order to allow a more detailed investigation of human labelling behaviour,
	we provide the annotations of each human annotator involved.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>nozza-fersini-messina:2017:EACLlong</bibkey>
  </paper>

  <paper id="1027">
    <title>A Systematic Study of Neural Discourse Models for Implicit Discourse Relation</title>
    <author><first>Attapol</first><last>Rutherford</last></author>
    <author><first>Vera</first><last>Demberg</last></author>
    <author><first>Nianwen</first><last>Xue</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>281&#8211;291</pages>
    <url>http://www.aclweb.org/anthology/E17-1027</url>
    <abstract>Inferring implicit discourse relations in natural language text is the most
	difficult subtask in discourse parsing. Many neural network models have been
	proposed to tackle this problem. However, the comparison for this task is not
	unified, so we could hardly draw clear conclusions about the effectiveness of
	various architectures. Here, we propose neural network models that are based on
	feedforward and long short-term memory architectures and systematically study
	the effects of varying structures. To our surprise, the best-configured
	feedforward architecture outperforms the LSTM-based model in most cases despite
	thorough tuning. Further, we compare our best feedforward system with
	competitive convolutional and recurrent networks and find that feedforward can
	actually be more effective. For the first time for this task, we compile and
	publish outputs from previous neural and non-neural systems to establish the
	standard for further comparison.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rutherford-demberg-xue:2017:EACLlong</bibkey>
  </paper>

  <paper id="1028">
    <title>Cross-lingual RST Discourse Parsing</title>
    <author><first>Chlo&#233;</first><last>Braud</last></author>
    <author><first>Maximin</first><last>Coavoux</last></author>
    <author><first>Anders</first><last>S&#248;gaard</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>292&#8211;304</pages>
    <url>http://www.aclweb.org/anthology/E17-1028</url>
    <abstract>Discourse parsing is an integral part of understanding information flow and
	argumentative structure in documents. Most previous research has focused on
	inducing and evaluating models from the English RST Discourse Treebank.
	However, discourse treebanks for other languages exist, including Spanish,
	German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same
	underlying linguistic theory, but differ slightly in the way documents are
	annotated. In this paper, we present (a) a new discourse parser which is
	simpler, yet competitive with the state of the art for English (significantly
	better on 2/3 metrics), (b) a harmonization of discourse treebanks across
	languages, enabling us to present (c) what to the best of our knowledge are
	the first experiments on cross-lingual discourse parsing.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>braud-coavoux-sogaard:2017:EACLlong</bibkey>
  </paper>

  <paper id="1029">
    <title>Dialog state tracking, a machine reading approach using Memory Network</title>
    <author><first>Julien</first><last>Perez</last></author>
    <author><first>Fei</first><last>Liu</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>305&#8211;314</pages>
    <url>http://www.aclweb.org/anthology/E17-1029</url>
    <abstract>In an end-to-end dialog system, the aim of dialog state tracking is to
	accurately estimate a compact representation of the current dialog status from
	a sequence of noisy observations produced by the speech recognition and the
	natural language understanding modules. This paper introduces a novel method of
	dialog state tracking based on the general paradigm of machine reading and
	proposes to solve it using an End-to-End Memory Network, MemN2N, a
	memory-enhanced neural network architecture. We evaluate the proposed approach
	on the second Dialog State Tracking Challenge (DSTC-2) dataset. The corpus has
	been converted for the occasion in order to frame the hidden state variable
	inference as a question-answering task based on a sequence of utterances
	extracted from a dialog. We show that the proposed tracker gives encouraging
	results. Then, we propose to extend the DSTC-2 dataset with specific reasoning
	capability requirements such as counting, list maintenance, yes-no question
	answering and indefinite knowledge management. Finally, we present encouraging
	results using our proposed MemN2N based tracking model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>perez-liu:2017:EACLlong</bibkey>
  </paper>

  <paper id="1030">
    <title>Sentence Segmentation in Narrative Transcripts from Neuropsychological Tests using Recurrent Convolutional Neural Networks</title>
    <author><first>Marcos</first><last>Treviso</last></author>
    <author><first>Christopher</first><last>Shulby</last></author>
    <author><first>Sandra</first><last>Alu&#237;sio</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>315&#8211;325</pages>
    <url>http://www.aclweb.org/anthology/E17-1030</url>
    <abstract>Automated discourse analysis tools based on Natural Language Processing (NLP)
	aiming at the diagnosis of language-impairing dementias generally extract
	several textual metrics of narrative transcripts. However, the absence of
	sentence boundary segmentation in the transcripts prevents the direct
	application of NLP methods which rely on these marks in order to function
	properly, such as taggers and parsers. We present the first steps taken towards
	automatic neuropsychological evaluation based on narrative discourse analysis,
	presenting a new automatic sentence segmentation method for impaired speech.
	Our model uses recurrent convolutional neural networks with prosodic, Part of
	Speech (PoS) features, and word embeddings. It was evaluated intrinsically on
	impaired, spontaneous speech as well as normal, prepared speech and presents
	better results for healthy elderly (CTL) (F1 = 0.74) and Mild Cognitive
	Impairment (MCI) patients (F1 = 0.70) than the Conditional Random Fields method
	(F1 = 0.55 and 0.53, respectively) used in the same context of our study. The
	results suggest that our model is robust for impaired speech and can be used in
	automated discourse analysis tools to differentiate narratives produced by MCI
	and CTL.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>treviso-shulby-aluisio:2017:EACLlong</bibkey>
  </paper>

  <paper id="1031">
    <title>Joint, Incremental Disfluency Detection and Utterance Segmentation from Speech</title>
    <author><first>Julian</first><last>Hough</last></author>
    <author><first>David</first><last>Schlangen</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>326&#8211;336</pages>
    <url>http://www.aclweb.org/anthology/E17-1031</url>
    <abstract>We present the joint task of incremental disfluency detection and utterance
	segmentation and a simple deep learning system which performs it on transcripts
	and ASR results. We show how the constraints of the two tasks interact. Our
	joint-task system outperforms the equivalent individual task systems, provides
	competitive results and is suitable for future use in conversation agents in
	the psychiatric domain.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hough-schlangen:2017:EACLlong</bibkey>
  </paper>

  <paper id="1032">
    <title>From Segmentation to Analyses: a Probabilistic Model for Unsupervised Morphology Induction</title>
    <author><first>Toms</first><last>Bergmanis</last></author>
    <author><first>Sharon</first><last>Goldwater</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>337&#8211;346</pages>
    <url>http://www.aclweb.org/anthology/E17-1032</url>
    <abstract>A major motivation for unsupervised morphological analysis is to reduce the
	sparse data problem in under-resourced languages. Most previous work focuses on
	segmenting surface forms into their constituent morphs (taking: tak +ing), but
	surface form segmentation does not solve the sparse data problem as the
	analyses of take and taking are not connected to each other. We present a
	system that adapts the MorphoChains system (Narasimhan et al., 2015) to provide
	morphological analyses that aim to abstract over spelling differences in
	functionally similar morphs. This results in analyses that are not compelled to
	use all the orthographic material of a word (stopping: stop +ing) or limited to
	only that material (acidified: acid +ify +ed). On average across six
	typologically varied languages our system has a similar or better F-score on
	EMMA (a measure of underlying morpheme accuracy) than three strong baselines;
	moreover, the total number of distinct morphemes identified by our system is on
	average 12.8% lower than for Morfessor (Virpioja et al., 2013), a
	state-of-the-art surface segmentation system.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bergmanis-goldwater:2017:EACLlong</bibkey>
  </paper>

  <paper id="1033">
    <title>Creating POS Tagging and Dependency Parsing Experts via Topic Modeling</title>
    <author><first>Atreyee</first><last>Mukherjee</last></author>
    <author><first>Sandra</first><last>K&#252;bler</last></author>
    <author><first>Matthias</first><last>Scheutz</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>347&#8211;355</pages>
    <url>http://www.aclweb.org/anthology/E17-1033</url>
    <abstract>Part of speech (POS) taggers and dependency parsers tend to work well on
	homogeneous datasets but their performance suffers on datasets containing data
	from different genres. In our current work, we investigate how to create POS
	tagging and dependency parsing experts for heterogeneous data by employing
	topic modeling. We create topic models (using Latent Dirichlet Allocation) to
	determine genres from a heterogeneous dataset and then train an expert for each
	of the genres. Our results show that the topic modeling experts reach
	substantial improvements when compared to the general versions. For dependency
	parsing, the improvement reaches 2 percentage points over the full training
	baseline when we use two topics.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mukherjee-kubler-scheutz:2017:EACLlong</bibkey>
  </paper>

  <paper id="1034">
    <title>Universal Dependencies and Morphology for Hungarian - and on the Price of Universality</title>
    <author><first>Veronika</first><last>Vincze</last></author>
    <author><first>Katalin</first><last>Simk&#243;</last></author>
    <author><first>Zsolt</first><last>Sz&#225;nt&#243;</last></author>
    <author><first>Rich&#225;rd</first><last>Farkas</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>356&#8211;365</pages>
    <url>http://www.aclweb.org/anthology/E17-1034</url>
    <abstract>In this paper, we present how the principles of universal dependencies and
	morphology have been adapted to Hungarian. We report the most challenging
	grammatical phenomena and our solutions to those. On the basis of the adapted
	guidelines, we have converted and manually corrected 1,800 sentences from the
	Szeged Treebank to universal dependency format. We also introduce experiments
	on this manually annotated corpus for evaluating automatic conversion and the
	added value of language-specific, i.e. non-universal, annotations. Our results
	reveal that converting to universal dependencies is not necessarily trivial;
	moreover, using language-specific morphological features may have an impact on
	overall performance.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>vincze-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1035">
    <title>Addressing the Data Sparsity Issue in Neural AMR Parsing</title>
    <author><first>Xiaochang</first><last>Peng</last></author>
    <author><first>Chuan</first><last>Wang</last></author>
    <author><first>Daniel</first><last>Gildea</last></author>
    <author><first>Nianwen</first><last>Xue</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>366&#8211;375</pages>
    <url>http://www.aclweb.org/anthology/E17-1035</url>
    <abstract>Neural attention models have achieved great success in different NLP tasks.
	However, they have not fulfilled their promise on the AMR parsing task due to
	the data sparsity issue. In this paper, we describe a sequence-to-sequence
	model for AMR parsing and present different ways to tackle the data sparsity
	problem. We show that our methods achieve significant improvement over a
	baseline neural attention model and our results are also competitive against
	state-of-the-art systems that do not use extra linguistic resources.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>peng-EtAl:2017:EACLlong1</bibkey>
  </paper>

  <paper id="1036">
    <title>Generating Natural Language Question-Answer Pairs from a Knowledge Graph Using a RNN Based Question Generation Model</title>
    <author><first>Sathish</first><last>Reddy</last></author>
    <author><first>Dinesh</first><last>Raghu</last></author>
    <author><first>Mitesh M.</first><last>Khapra</last></author>
    <author><first>Sachindra</first><last>Joshi</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>376&#8211;385</pages>
    <url>http://www.aclweb.org/anthology/E17-1036</url>
    <abstract>In recent years, knowledge graphs such as Freebase that capture facts about
	entities and relationships between them have been used actively for answering
	factoid questions. In this paper, we explore the problem of automatically
	generating question answer pairs from a given knowledge graph.
	The generated question answer (QA) pairs can be used in several downstream
	applications. For example, they could be used for training better QA systems.
	To generate such QA pairs, we first extract a set of keywords from entities and
	relationships expressed in a triple stored in the knowledge graph. From each
	such set, we use a subset of keywords to generate a natural language question
	that has a unique answer. We treat this subset of keywords as a sequence and
	propose a sequence to sequence model using RNN to generate a natural language
	question from it. Our RNN based model generates QA pairs with an accuracy of
	33.61 percent and performs 110.47 percent (relative) better than a
	state-of-the-art template based method for generating natural language question
	from keywords. We also do an extrinsic evaluation by using the generated QA
	pairs to train a QA system and observe that the F1-score of the QA system
	improves by 5.5 percent (relative) when using automatically generated QA pairs
	in addition to manually generated QA pairs available for training.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>reddy-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1037">
    <title>Enumeration of Extractive Oracle Summaries</title>
    <author><first>Tsutomu</first><last>Hirao</last></author>
    <author><first>Masaaki</first><last>Nishino</last></author>
    <author><first>Jun</first><last>Suzuki</last></author>
    <author><first>Masaaki</first><last>Nagata</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>386&#8211;396</pages>
    <url>http://www.aclweb.org/anthology/E17-1037</url>
    <abstract>To analyze the limitations and the future directions of the extractive
	summarization paradigm, this paper proposes an Integer Linear Programming (ILP)
	formulation to obtain extractive oracle summaries in terms of ROUGE-N. We also
	propose an algorithm that enumerates all of the oracle summaries for a set of
	reference summaries to exploit F-measures that evaluate which system summaries
	contain how many sentences that are extracted as an oracle summary. Our
	experimental results obtained from Document Understanding Conference (DUC)
	corpora demonstrated the following: (1) room still exists to improve the
	performance of extractive summarization;  (2) the F-measures derived from the
	enumerated oracle summaries have significantly stronger correlations with human
	judgment than those derived from single oracle summaries.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hirao-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1038">
    <title>Neural Semantic Encoders</title>
    <author><first>Tsendsuren</first><last>Munkhdalai</last></author>
    <author><first>Hong</first><last>Yu</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>397&#8211;407</pages>
    <url>http://www.aclweb.org/anthology/E17-1038</url>
    <abstract>We present a memory augmented neural network for natural language
	understanding: Neural Semantic Encoders. NSE is equipped with a novel memory
	update rule and has a variable sized encoding memory that evolves over time
	and maintains the understanding of input sequences through read, compose and
	write operations. NSE can also access multiple and shared memories. In this
	paper, we demonstrate the effectiveness and the flexibility of NSE on five
	different natural language tasks: natural language inference, question
	answering, sentence classification, document sentiment analysis and machine
	translation, where NSE achieved state-of-the-art performance when evaluated
	on publicly available benchmarks. For example, our shared-memory model showed
	an encouraging result on neural machine translation, improving an
	attention-based baseline by approximately 1.0 BLEU.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>munkhdalai-yu:2017:EACLlong2</bibkey>
  </paper>

  <paper id="1039">
    <title>Efficient Benchmarking of NLP APIs using Multi-armed Bandits</title>
    <author><first>Gholamreza</first><last>Haffari</last></author>
    <author><first>Tuan Dung</first><last>Tran</last></author>
    <author><first>Mark</first><last>Carman</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>408&#8211;416</pages>
    <url>http://www.aclweb.org/anthology/E17-1039</url>
    <abstract>Comparing NLP systems to select the best one for a task of interest, such as
	named entity recognition, is critical for practitioners and researchers. A
	rigorous approach involves setting up a hypothesis testing scenario using the
	performance of the systems on query documents. However, often the hypothesis
	testing approach needs to send a lot of document queries to the systems, which
	can be problematic. In this paper, we present an effective alternative based on
	the multi-armed bandit (MAB). We propose a
	hierarchical generative model to represent the uncertainty in the performance
	measures of the competing systems, to be used by Thompson Sampling to solve the
	resulting MAB. Experimental results on both synthetic and real data show that
	our approach requires significantly fewer queries compared to the standard
	benchmarking technique to identify the best system according to F-measure.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>haffari-tran-carman:2017:EACLlong</bibkey>
  </paper>

  <paper id="1040">
    <title>Character-Word LSTM Language Models</title>
    <author><first>Lyan</first><last>Verwimp</last></author>
    <author><first>Joris</first><last>Pelemans</last></author>
    <author><first>Hugo</first><last>Van hamme</last></author>
    <author><first>Patrick</first><last>Wambacq</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>417&#8211;427</pages>
    <url>http://www.aclweb.org/anthology/E17-1040</url>
    <abstract>We present a Character-Word Long Short-Term Memory Language Model which both
	reduces the perplexity with respect to a baseline word-level language model and
	reduces the number of parameters of the model. Character information can reveal
	structural (dis)similarities between words and can even be used when a word is
	out-of-vocabulary, thus improving the modeling of infrequent and unknown words.
	By concatenating word and character embeddings, we achieve up to 2.77% relative
	improvement on English compared to a baseline model with a similar amount of
	parameters and 4.57% on Dutch. Moreover, we also outperform baseline word-level
	models with a larger number of parameters.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>verwimp-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1041">
    <title>A Hierarchical Neural Model for Learning Sequences of Dialogue Acts</title>
    <author><first>Quan Hung</first><last>Tran</last></author>
    <author><first>Ingrid</first><last>Zukerman</last></author>
    <author><first>Gholamreza</first><last>Haffari</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>428&#8211;437</pages>
    <url>http://www.aclweb.org/anthology/E17-1041</url>
    <abstract>We propose a novel hierarchical Recurrent Neural Network (RNN) for learning
	sequences of Dialogue Acts (DAs). The input in this task is a sequence of
	utterances (i.e., conversational contributions) comprising a sequence of
	tokens, and the output is a sequence of DA labels (one label per utterance).
	Our model leverages the hierarchical nature of dialogue data by using two
	nested RNNs that capture long-range dependencies at the dialogue level and the
	utterance level. This model is combined with an attention mechanism that
	focuses on salient tokens in utterances. Our experimental results show that our
	model outperforms strong baselines on two popular datasets, Switchboard and
	MapTask; and our detailed empirical analysis highlights the impact of each
	aspect of our model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tran-zukerman-haffari:2017:EACLlong</bibkey>
  </paper>

  <paper id="1042">
    <title>A Network-based End-to-End Trainable Task-oriented Dialogue System</title>
    <author><first>Tsung-Hsien</first><last>Wen</last></author>
    <author><first>David</first><last>Vandyke</last></author>
    <author><first>Nikola</first><last>Mrk&#x161;i&#x107;</last></author>
    <author><first>Milica</first><last>Gasic</last></author>
    <author><first>Lina M.</first><last>Rojas Barahona</last></author>
    <author><first>Pei-Hao</first><last>Su</last></author>
    <author><first>Stefan</first><last>Ultes</last></author>
    <author><first>Steve</first><last>Young</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>438&#8211;449</pages>
    <url>http://www.aclweb.org/anthology/E17-1042</url>
    <abstract>Teaching machines to accomplish tasks by conversing naturally with humans is
	challenging. Currently, developing task-oriented dialogue systems requires
	creating multiple components and typically this involves either a large amount
	of handcrafting, or acquiring costly labelled datasets to solve a statistical
	learning problem for each component. In this work we introduce a neural
	network-based text-in, text-out end-to-end trainable goal-oriented dialogue
	system along with a new way of collecting dialogue data based on a novel
	pipe-lined Wizard-of-Oz framework. This approach allows us to develop dialogue
	systems easily and without making too many assumptions about the task at hand.
	The results show that the model can converse with human subjects naturally
	whilst helping them to accomplish tasks in a restaurant search domain.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wen-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1043">
    <title>May I take your order? A Neural Model for Extracting Structured Information from Conversations</title>
    <author><first>Baolin</first><last>Peng</last></author>
    <author><first>Michael</first><last>Seltzer</last></author>
    <author><first>Y.C.</first><last>Ju</last></author>
    <author><first>Geoffrey</first><last>Zweig</last></author>
    <author><first>Kam-Fai</first><last>Wong</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>450&#8211;459</pages>
    <url>http://www.aclweb.org/anthology/E17-1043</url>
    <abstract>In this paper we tackle a unique and important problem of extracting a
	structured order from the conversation a customer has with an order taker at a
	restaurant. This is motivated by an actual system under development to assist
	in the order taking process. We develop a sequence-to-sequence model that is
	able to map from unstructured conversational input to the structured form that
	is conveyed to the kitchen and appears on the customer receipt. This problem is
	critically different from other tasks like machine translation where
	sequence-to-sequence models have been used: the input includes two sides of a
	conversation; the output is highly structured; and logical manipulations must
	be performed, for example when the customer changes his mind while ordering. We
	present a novel sequence-to-sequence model that incorporates a special
	attention-memory gating mechanism and conversational role markers. The proposed
	model improves performance over both a phrase-based machine translation
	approach and a standard sequence-to-sequence model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>peng-EtAl:2017:EACLlong2</bibkey>
  </paper>

  <paper id="1044">
    <title>A Two-stage Sieve Approach for Quote Attribution</title>
    <author><first>Grace</first><last>Muzny</last></author>
    <author><first>Michael</first><last>Fang</last></author>
    <author><first>Angel</first><last>Chang</last></author>
    <author><first>Dan</first><last>Jurafsky</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>460&#8211;470</pages>
    <url>http://www.aclweb.org/anthology/E17-1044</url>
    <abstract>We present a deterministic sieve-based system for attributing quotations in
	literary text and a new dataset: QuoteLi3. Quote attribution, determining who
	said what in a given text, is important for tasks like creating dialogue
	systems, and in newer areas like computational literary studies, where it
	creates opportunities to analyze novels at scale rather than only a few at a
	time. We release QuoteLi3, which contains more than 6,000 annotations linking
	quotes to speaker mentions and quotes to speaker entities, and introduce a new
	algorithm for quote attribution. Our two-stage algorithm first links quotes to
	mentions, then mentions to entities. Using two stages encapsulates difficult
	sub-problems and improves system performance. The modular design allows us to
	tune for overall performance or higher precision, which is useful for many
	real-world use cases. Our system achieves an average F-score of 87.5 across
	three novels, outperforming previous systems, and can be tuned for a precision of
	90.4 at a recall of 65.1.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>muzny-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1045">
    <title>Out-of-domain FrameNet Semantic Role Labeling</title>
    <author><first>Silvana</first><last>Hartmann</last></author>
    <author><first>Ilia</first><last>Kuznetsov</last></author>
    <author><first>Teresa</first><last>Martin</last></author>
    <author><first>Iryna</first><last>Gurevych</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>471&#8211;482</pages>
    <url>http://www.aclweb.org/anthology/E17-1045</url>
    <abstract>Domain dependence of NLP systems is one of the major obstacles to their
	application in large-scale text analysis, also restricting the applicability of
	FrameNet semantic role labeling (SRL) systems. Yet, current FrameNet SRL
	systems are still only evaluated on a single in-domain test set. For the first
	time, we study the domain dependence of FrameNet SRL on a wide range of
	benchmark sets. We create a novel test set for FrameNet SRL based on
	user-generated web text and find that the major bottleneck for out-of-domain
	FrameNet SRL is the frame identification step. To address this problem, we
	develop a simple, yet efficient
	system based on distributed word representations. Our system closely approaches
	the state-of-the-art in-domain while outperforming the best available frame
	identification system out-of-domain. We publish our system and test data for
	research purposes.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hartmann-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1046">
    <title>TDParse: Multi-target-specific sentiment recognition on Twitter</title>
    <author><first>Bo</first><last>Wang</last></author>
    <author><first>Maria</first><last>Liakata</last></author>
    <author><first>Arkaitz</first><last>Zubiaga</last></author>
    <author><first>Rob</first><last>Procter</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>483&#8211;493</pages>
    <url>http://www.aclweb.org/anthology/E17-1046</url>
    <abstract>Existing target-specific sentiment recognition methods consider only a single
	target per tweet, and have been shown to miss nearly half of the actual targets
	mentioned. We present a corpus of UK election tweets, with an average of 3.09
	entities per tweet and more than one type of sentiment in half of the tweets.
	This requires a method for multi-target specific sentiment recognition, which
	we develop by using the context around a target as well as syntactic
	dependencies involving the target. We present results of our method on both a
	benchmark corpus of single targets and the multi-target election corpus,
	showing state-of-the-art performance on both corpora and outperforming previous
	approaches to the multi-target sentiment task as well as deep learning models for
	single-target sentiment.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wang-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1047">
    <title>Annotating Derivations: A New Evaluation Strategy and Dataset for Algebra Word Problems</title>
    <author><first>Shyam</first><last>Upadhyay</last></author>
    <author><first>Ming-Wei</first><last>Chang</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>494&#8211;504</pages>
    <url>http://www.aclweb.org/anthology/E17-1047</url>
    <abstract>We propose a new evaluation for automatic solvers for algebra word problems,
	which can identify mistakes that existing evaluations overlook. Our proposal is
	to evaluate such solvers using derivations, which reflect how an equation
	system was constructed from the word problem. To accomplish this, we develop an
	algorithm for checking the equivalence between two derivations, and show how
	derivation annotations can be semi-automatically added to existing datasets. To
	make our experiments more comprehensive, we include the derivation annotation
	for DRAW-1K, a new dataset containing 1000 general algebra word problems. In
	our experiments, we found that the annotated derivations enable a more accurate
	evaluation of automatic solvers than previously used metrics. We release
	derivation annotations for over 2300 algebra word problems for future
	evaluations.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>upadhyay-chang:2017:EACLlong</bibkey>
  </paper>

  <paper id="1048">
    <title>An Extensive Empirical Evaluation of Character-Based Morphological Tagging for 14 Languages</title>
    <author><first>Georg</first><last>Heigold</last></author>
    <author><first>Guenter</first><last>Neumann</last></author>
    <author><first>Josef</first><last>van Genabith</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>505&#8211;513</pages>
    <url>http://www.aclweb.org/anthology/E17-1048</url>
    <abstract>This paper investigates neural character-based morphological tagging 
	for languages with complex morphology and large tag sets.
	Character-based approaches are attractive as they can handle rare and unseen
	words gracefully.
	We evaluate on 14 languages and 
	observe consistent gains over a state-of-the-art morphological tagger 
	across all languages except for English and French, where we match the
	state-of-the-art.
	We compare two architectures for computing character-based word vectors using
	recurrent (RNN) and convolutional (CNN) nets. 
	We show that the CNN-based approach performs slightly worse and less
	consistently than the RNN-based approach.
	Small but systematic gains are observed when combining the two architectures by
	ensembling.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>heigold-neumann-vangenabith:2017:EACLlong</bibkey>
  </paper>

  <paper id="1049">
    <title>Neural Multi-Source Morphological Reinflection</title>
    <author><first>Katharina</first><last>Kann</last></author>
    <author><first>Ryan</first><last>Cotterell</last></author>
    <author><first>Hinrich</first><last>Sch&#252;tze</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>514&#8211;524</pages>
    <url>http://www.aclweb.org/anthology/E17-1049</url>
    <abstract>We explore the task of multi-source morphological reinflection, which
	generalizes the standard, single-source version. The input consists of (i) a
	target tag and (ii) multiple pairs of source form and source tag for a lemma.
	The motivation is that it is beneficial to have access to more than one source
	form since different source forms can provide complementary information, e.g.,
	different stems.  We further present a novel extension to the encoder-decoder
	recurrent neural architecture, consisting of multiple encoders, to better solve
	the task. We show
	that our new architecture outperforms single-source reinflection models and
	publish our dataset for multi-source morphological reinflection to facilitate
	future research.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kann-cotterell-schutze:2017:EACLlong</bibkey>
  </paper>

  <paper id="1050">
    <title>Online Automatic Post-editing for MT in a Multi-Domain Translation Environment</title>
    <author><first>Rajen</first><last>Chatterjee</last></author>
    <author><first>Gebremedhen</first><last>Gebremelak</last></author>
    <author><first>Matteo</first><last>Negri</last></author>
    <author><first>Marco</first><last>Turchi</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>525&#8211;535</pages>
    <url>http://www.aclweb.org/anthology/E17-1050</url>
    <abstract>Automatic post-editing (APE) for machine translation (MT) aims to fix recurrent
	errors made by the MT decoder by learning from correction examples. In
	controlled evaluation scenarios, the representativeness of the training set
	with respect to the test data is a key factor to achieve good performance.
	Real-life scenarios, however, do not guarantee such favorable learning
	conditions. Ideally, to be integrated into a real professional translation
	workflow (e.g. to play a role in a computer-assisted translation framework), APE
	tools should be flexible enough to cope with continuous streams of diverse data
	coming from different domains/genres. To cope with this problem, we propose an
	online APE framework that is: i) robust to data diversity (i.e. capable of
	learning and applying correction rules in the right contexts) and ii) able to evolve
	over time (by continuously extending and refining its knowledge). In a
	comparative evaluation, with English-German test data coming in random order
	from two different domains, we show the effectiveness of our approach, which
	outperforms a strong batch system and the state of the art in online APE.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>chatterjee-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1051">
    <title>An Incremental Parser for Abstract Meaning Representation</title>
    <author><first>Marco</first><last>Damonte</last></author>
    <author><first>Shay B.</first><last>Cohen</last></author>
    <author><first>Giorgio</first><last>Satta</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>536&#8211;546</pages>
    <url>http://www.aclweb.org/anthology/E17-1051</url>
    <abstract>Abstract Meaning Representation (AMR) is a semantic representation for natural
	language that embeds annotations related to traditional tasks such as named
	entity recognition, semantic role labeling, word sense disambiguation and
	co-reference resolution. We describe a transition-based parser for AMR that
	parses sentences left-to-right, in linear time. We further propose a test-suite
	that assesses specific subtasks that are helpful in comparing AMR parsers, and 
	show that our parser is competitive with the state of the art on the LDC2015E86
	dataset and that it outperforms state-of-the-art parsers for recovering named
	entities and handling polarity.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>damonte-cohen-satta:2017:EACLlong</bibkey>
  </paper>

  <paper id="1052">
    <title>Integrated Learning of Dialog Strategies and Semantic Parsing</title>
    <author><first>Aishwarya</first><last>Padmakumar</last></author>
    <author><first>Jesse</first><last>Thomason</last></author>
    <author><first>Raymond J.</first><last>Mooney</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>547&#8211;557</pages>
    <url>http://www.aclweb.org/anthology/E17-1052</url>
    <abstract>Natural language understanding and dialog management are two integral
	components of interactive dialog systems. Previous research has used machine
	learning techniques to individually optimize these components, with different
	forms of direct and indirect supervision. We present an approach to integrate
	the learning of both a dialog strategy using reinforcement learning, and a
	semantic parser for robust natural language understanding, using only natural
	dialog interaction for supervision. Experimental results on a simulated task of
	robot instruction demonstrate that joint learning of both components improves
	dialog performance over learning either of these components alone.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>padmakumar-thomason-mooney:2017:EACLlong</bibkey>
  </paper>

  <paper id="1053">
    <title>Unsupervised AMR-Dependency Parse Alignment</title>
    <author><first>Wei-Te</first><last>Chen</last></author>
    <author><first>Martha</first><last>Palmer</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>558&#8211;567</pages>
    <url>http://www.aclweb.org/anthology/E17-1053</url>
    <abstract>In this paper, we introduce an Abstract Meaning Representation (AMR) to
	Dependency Parse aligner. Alignment is a preliminary step for AMR parsing, and
	our aligner improves current AMR parser performance. Our aligner involves
	several different features, including named entity tags and semantic role
	labels, and uses Expectation-Maximization training. Results show that our
	aligner reaches an 87.1% F-score on the experimental data, and enhances
	AMR parsing.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>chen-palmer:2017:EACLlong</bibkey>
  </paper>

  <paper id="1054">
    <title>Improving Chinese Semantic Role Labeling using High-quality Surface and Deep Case Frames</title>
    <author><first>Gongye</first><last>Jin</last></author>
    <author><first>Daisuke</first><last>Kawahara</last></author>
    <author><first>Sadao</first><last>Kurohashi</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>568&#8211;577</pages>
    <url>http://www.aclweb.org/anthology/E17-1054</url>
    <abstract>This paper presents a method for applying automatically acquired knowledge to
	semantic role labeling (SRL). We use a large amount of automatically extracted
	knowledge to improve the performance of SRL.  We present two varieties of
	knowledge, which we call surface case frames and deep case frames. Although the
	surface case frames are compiled from syntactic parses and can be used as rich
	syntactic knowledge, they have limited capability for resolving semantic
	ambiguity. To compensate for the deficiency of the surface case frames, we compile
	deep case frames from automatic semantic roles. We also consider quality
	management for both types of knowledge in order to get rid of the noise brought
	from the automatic analyses. The experimental results show that Chinese SRL can
	be improved using automatically acquired knowledge and the quality management
	shows a positive effect on this task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jin-kawahara-kurohashi:2017:EACLlong</bibkey>
  </paper>

  <paper id="1055">
    <title>Multi-level Representations for Fine-Grained Typing of Knowledge Base Entities</title>
    <author><first>Yadollah</first><last>Yaghoobzadeh</last></author>
    <author><first>Hinrich</first><last>Sch&#252;tze</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>578&#8211;589</pages>
    <url>http://www.aclweb.org/anthology/E17-1055</url>
    <abstract>Entities are essential elements of natural language. In this paper, we present
	methods for learning multi-level representations of entities on three
	complementary levels: character (character patterns in entity names extracted,
	e.g., by neural networks), word (embeddings of words in entity names) and
	entity (entity embeddings). We investigate state-of-the-art learning methods on
	each level and find large differences, e.g., for deep learning models,
	traditional ngram features and the subword model of fasttext (Bojanowski et
	al., 2016) on the character level; for word2vec (Mikolov et al., 2013) on the
	word level; and for the order-aware model wang2vec (Ling et al., 2015a) on the
	entity level. 
	We confirm experimentally that each level of representation contributes
	complementary information and a joint representation of all three levels
	improves the existing embedding based baseline for fine-grained entity typing
	by a large margin. Additionally, we show that adding information from entity
	descriptions further improves multi-level representations of entities.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yaghoobzadeh-schutze:2017:EACLlong</bibkey>
  </paper>

  <paper id="1056">
    <title>The ContrastMedium Algorithm: Taxonomy Induction From Noisy Knowledge Graphs With Just A Few Links</title>
    <author><first>Stefano</first><last>Faralli</last></author>
    <author><first>Alexander</first><last>Panchenko</last></author>
    <author><first>Chris</first><last>Biemann</last></author>
    <author><first>Simone Paolo</first><last>Ponzetto</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>590&#8211;600</pages>
    <url>http://www.aclweb.org/anthology/E17-1056</url>
    <abstract>In this paper, we present ContrastMedium, an algorithm that transforms noisy
	semantic networks into full-fledged, clean taxonomies. ContrastMedium is able
	to identify the embedded taxonomy structure from a noisy knowledge graph
	without explicit human supervision, such as a set of manually
	selected input root and leaf concepts. This is achieved by leveraging
	structural information from a companion reference taxonomy, to which the input
	knowledge graph is linked (either automatically or manually). When used in
	conjunction with methods for hypernym acquisition and knowledge base linking,
	our methodology provides a complete solution for end-to-end taxonomy induction.
	We conduct experiments using automatically acquired knowledge graphs, as well
	as a SemEval benchmark, and show that our method is able to achieve high
	performance on the task of taxonomy induction.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>faralli-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1057">
    <title>Probabilistic Inference for Cold Start Knowledge Base Population with Prior World Knowledge</title>
    <author><first>Bonan</first><last>Min</last></author>
    <author><first>Marjorie</first><last>Freedman</last></author>
    <author><first>Talya</first><last>Meltzer</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>601&#8211;612</pages>
    <url>http://www.aclweb.org/anthology/E17-1057</url>
    <abstract>Building knowledge bases (KB) automatically from text corpora is crucial for
	many applications such as question answering and web search. The problem is
	very challenging and has been divided into sub-problems such as mention and
	named entity recognition, entity linking and relation extraction. However,
	combining these components has been shown to be under-constrained and often produces
	KBs with supersize entities and common-sense errors in relations (a person has
	multiple birthdates). The errors are difficult to resolve solely with IE tools
	but become obvious with world knowledge at the corpus level. By analyzing
	Freebase and a large text collection, we found that per-relation cardinality
	and the popularity of entities follow the power-law distribution favoring flat
	long tails with low-frequency instances. We present a probabilistic joint
	inference algorithm to incorporate this world knowledge during KB construction.
	Our approach yields state-of-the-art performance on the TAC Cold Start task,
	and 42% and 19.4% relative improvements in F1 over our baseline on Cold Start
	hop-1 and all-hop queries respectively.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>min-freedman-meltzer:2017:EACLlong</bibkey>
  </paper>

  <paper id="1058">
    <title>Generalizing to Unseen Entities and Entity Pairs with Row-less Universal Schema</title>
    <author><first>Patrick</first><last>Verga</last></author>
    <author><first>Arvind</first><last>Neelakantan</last></author>
    <author><first>Andrew</first><last>McCallum</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>613&#8211;622</pages>
    <url>http://www.aclweb.org/anthology/E17-1058</url>
    <abstract>Universal schema predicts the types of entities and relations in a knowledge
	base (KB) by jointly embedding the union of all available schema types &#8211; not
	only types from multiple structured databases (such as Freebase or Wikipedia
	infoboxes), but also types expressed as textual patterns from raw text.  
	This prediction is typically modeled as a matrix completion problem, with one
	type per column, and either one or two entities per row (in the case of entity
	types or binary relation types, respectively).                                       
	Factorizing this sparsely observed matrix yields a learned vector embedding for
	each row and each column.  
	In this paper we explore the problem of making predictions for entities or
	entity-pairs unseen at training time (and hence without a pre-learned row
	embedding).  
	We propose an approach having no per-row parameters at all; rather we produce a
	row vector on the fly using a learned aggregation function of the vectors of
	the observed columns for that row.  
	We experiment with various aggregation functions, including neural network
	attention models.  
	Our approach can be understood as a natural language database, in that
	questions about KB entities are answered by attending to textual or database
	evidence.  
	In experiments predicting both relations and entity types, we demonstrate that
	despite having an order of magnitude fewer parameters than traditional
	universal schema, we can match the accuracy of the traditional model, and more
	importantly, we can now make predictions about unseen rows with nearly the same
	accuracy as rows available at training time.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>verga-neelakantan-mccallum:2017:EACLlong</bibkey>
  </paper>

  <paper id="1059">
    <title>Learning to Generate Product Reviews from Attributes</title>
    <author><first>Li</first><last>Dong</last></author>
    <author><first>Shaohan</first><last>Huang</last></author>
    <author><first>Furu</first><last>Wei</last></author>
    <author><first>Mirella</first><last>Lapata</last></author>
    <author><first>Ming</first><last>Zhou</last></author>
    <author><first>Ke</first><last>Xu</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>623&#8211;632</pages>
    <url>http://www.aclweb.org/anthology/E17-1059</url>
    <abstract>Automatically generating product reviews is a meaningful, yet not well-studied
	task in sentiment analysis. Traditional natural language generation methods
	rely extensively on hand-crafted rules and predefined templates. This paper
	presents an attention-enhanced attribute-to-sequence model to generate product
	reviews for given attribute information, such as user, product, and rating. The
	attribute encoder learns to represent input attributes as vectors. Then, the
	sequence decoder generates reviews by conditioning its output on these vectors.
	We also introduce an attention mechanism to jointly generate reviews and align
	words with input attributes. The proposed model is trained end-to-end to
	maximize the likelihood of target product reviews given the attributes. We
	build a publicly available dataset for the review generation task by leveraging
	the Amazon book reviews and their metadata. Experiments on the dataset show
	that our approach outperforms baseline methods and the attention mechanism
	significantly improves the performance of our model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>dong-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1060">
    <title>Learning to generate one-sentence biographies from Wikidata</title>
    <author><first>Andrew</first><last>Chisholm</last></author>
    <author><first>Will</first><last>Radford</last></author>
    <author><first>Ben</first><last>Hachey</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>633&#8211;642</pages>
    <url>http://www.aclweb.org/anthology/E17-1060</url>
    <abstract>We investigate the generation of one-sentence Wikipedia biographies from facts
	derived from Wikidata slot-value pairs.
	We train a recurrent neural network sequence-to-sequence model with attention
	to select facts and generate textual summaries.
	Our model incorporates a novel secondary objective that helps ensure it
	generates sentences that contain the input facts.
	The model achieves a BLEU score of 41, improving significantly upon the vanilla
	sequence-to-sequence model and scoring roughly twice that of a simple template
	baseline.
	Human preference evaluation suggests the model is nearly as good as the
	Wikipedia reference.
	Manual analysis explores content selection, suggesting the model can trade the
	ability to infer knowledge against the risk of hallucinating incorrect
	information.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>chisholm-radford-hachey:2017:EACLlong</bibkey>
  </paper>

  <paper id="1061">
    <title>Transition-Based Deep Input Linearization</title>
    <author><first>Ratish</first><last>Puduppully</last></author>
    <author><first>Yue</first><last>Zhang</last></author>
    <author><first>Manish</first><last>Shrivastava</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>643&#8211;654</pages>
    <url>http://www.aclweb.org/anthology/E17-1061</url>
    <abstract>Traditional methods for deep NLG adopt pipeline approaches comprising stages
	such as constructing syntactic input, predicting function words, linearizing
	the syntactic input and generating the surface forms. Though easier to
	visualize, pipeline approaches suffer from error propagation. In addition,
	information available across modules cannot be leveraged by all modules. We
	construct a transition-based model to jointly perform linearization, function
	word prediction and morphological generation, which considerably improves upon
	the accuracy compared to a pipelined baseline system. On a standard deep input
	linearization shared task, our system achieves the best results reported so
	far.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>puduppully-zhang-shrivastava:2017:EACLlong</bibkey>
  </paper>

  <paper id="1062">
    <title>Generating flexible proper name references in text: Data, models and evaluation</title>
    <author><first>Thiago</first><last>Castro Ferreira</last></author>
    <author><first>Emiel</first><last>Krahmer</last></author>
    <author><first>Sander</first><last>Wubben</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>655&#8211;664</pages>
    <url>http://www.aclweb.org/anthology/E17-1062</url>
    <abstract>This study introduces a statistical model able to generate variations of a
	proper name by taking into account the person to be mentioned, the discourse
	context and variation. The model relies on the REGnames corpus, a dataset with
	53,102 proper name references to 1,000 people in different discourse contexts.
	We evaluate the versions of our model from the perspective of how human writers
	produce proper names, and also how human readers process them. The corpus and
	the model are publicly available.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>castroferreira-krahmer-wubben:2017:EACLlong</bibkey>
  </paper>

  <paper id="1063">
    <title>Dependency Parsing as Head Selection</title>
    <author><first>Xingxing</first><last>Zhang</last></author>
    <author><first>Jianpeng</first><last>Cheng</last></author>
    <author><first>Mirella</first><last>Lapata</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>665&#8211;676</pages>
    <url>http://www.aclweb.org/anthology/E17-1063</url>
    <abstract>Conventional graph-based dependency parsers guarantee a tree structure both
	during training and inference. Instead, we formalize dependency parsing as
	the problem of independently selecting the head of each word in a sentence.
	Our model, which we call DeNSe (shorthand for Dependency Neural Selection),
	produces a distribution over possible heads for each word using features
	obtained from a bidirectional recurrent neural network. Without enforcing
	structural constraints during training, DeNSe generates (at inference time)
	trees for the overwhelming majority of sentences, while non-tree outputs can
	be adjusted with a maximum spanning tree algorithm. We evaluate DeNSe on four
	languages (English, Chinese, Czech, and German) with varying degrees of
	non-projectivity. Despite the simplicity of the approach, our parsers are on
	par with the state of the art.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhang-cheng-lapata:2017:EACLlong</bibkey>
  </paper>

  <paper id="1064">
    <title>Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing</title>
    <author><first>Minh</first><last>Le</last></author>
    <author><first>Antske</first><last>Fokkens</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>677&#8211;687</pages>
    <url>http://www.aclweb.org/anthology/E17-1064</url>
    <abstract>Error propagation is a common problem in NLP. Reinforcement learning explores
	erroneous states during training and can therefore be more robust when mistakes
	are made early in a process. In this paper, we apply reinforcement learning to
	greedy dependency parsing which is known to suffer from error propagation.
	Reinforcement learning improves the accuracy of both labeled and unlabeled
	dependencies of the Stanford Neural Dependency Parser, a high performance
	greedy parser, while maintaining its efficiency. We investigate the portion of
	errors which are the result of error propagation and confirm that reinforcement
	learning reduces the occurrence of error propagation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>le-fokkens:2017:EACLlong</bibkey>
  </paper>

  <paper id="1065">
    <title>Noisy-context surprisal as a human sentence processing cost model</title>
    <author><first>Richard</first><last>Futrell</last></author>
    <author><first>Roger</first><last>Levy</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>688&#8211;698</pages>
    <url>http://www.aclweb.org/anthology/E17-1065</url>
    <abstract>We use the noisy-channel theory of human sentence comprehension to develop an
	incremental processing cost model that unifies and extends key features of
	expectation-based and memory-based models. In this model, which we call
	noisy-context surprisal, the processing cost of a word is the surprisal of the
	word given a noisy representation of the preceding context. We show that this
	model accounts for an outstanding puzzle in sentence comprehension,
	language-dependent structural forgetting effects (Gibson and Thomas, 1999;
	Vasishth et al., 2010; Frank et al., 2016), which were previously not well
	modeled by either expectation-based or memory-based approaches. Additionally,
	we show that this model derives and generalizes locality effects (Gibson, 1998;
	Demberg and Keller, 2008), a signature prediction of memory-based models. We
	give corpus-based evidence for a key assumption in this derivation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>futrell-levy:2017:EACLlong</bibkey>
  </paper>

  <paper id="1066">
    <title>Task-Specific Attentive Pooling of Phrase Alignments Contributes to Sentence Matching</title>
    <author><first>Wenpeng</first><last>Yin</last></author>
    <author><first>Hinrich</first><last>Sch&#252;tze</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>699&#8211;709</pages>
    <url>http://www.aclweb.org/anthology/E17-1066</url>
    <abstract>This work comparatively studies two typical sentence matching tasks: textual
	entailment (TE) and answer selection (AS), observing that weaker phrase
	alignments are more critical in TE, while stronger phrase alignments deserve
	more attention in AS. The key to reaching this observation lies in phrase
	detection, phrase representation, phrase alignment, and more importantly how
	to connect those aligned phrases of different matching degrees with the final
	classifier.
	Prior work (i) has limitations in phrase generation and representation, or
	(ii) conducts alignment at the word and phrase levels by handcrafted
	features, or (iii) utilizes a single framework of alignment without
	considering the characteristics of specific tasks, which limits the
	framework's effectiveness across tasks.
	We propose an architecture based on Gated Recurrent Unit that supports (i)
	representation learning of phrases of arbitrary granularity and (ii)
	task-specific attentive pooling of phrase alignments between two sentences. 
	Experimental results on TE and AS match our observation and show the
	effectiveness of our approach.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yin-schutze:2017:EACLlong</bibkey>
  </paper>

  <paper id="1067">
    <title>On-demand Injection of Lexical Knowledge for Recognising Textual Entailment</title>
    <author><first>Pascual</first><last>Mart&#237;nez-G&#243;mez</last></author>
    <author><first>Koji</first><last>Mineshima</last></author>
    <author><first>Yusuke</first><last>Miyao</last></author>
    <author><first>Daisuke</first><last>Bekki</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>710&#8211;720</pages>
    <url>http://www.aclweb.org/anthology/E17-1067</url>
    <abstract>We approach the recognition of textual entailment using logical semantic
	representations and a theorem prover.  In this setup, lexical divergences that
	preserve semantic entailment between the source and target texts need to be
	explicitly stated.  However, recognising subsentential semantic relations is
	not trivial.  We address this problem by monitoring the proof of the theorem
	and detecting unprovable sub-goals that share predicate arguments with logical
	premises. If a linguistic relation exists, then an appropriate axiom is
	constructed on-demand and the theorem proving continues.  Experiments show that
	this approach is effective and precise, producing a system that outperforms
	other logic-based systems and is competitive with state-of-the-art statistical
	methods.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>martinezgomez-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1068">
    <title>Learning to Predict Denotational Probabilities For Modeling Entailment</title>
    <author><first>Alice</first><last>Lai</last></author>
    <author><first>Julia</first><last>Hockenmaier</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>721&#8211;730</pages>
    <url>http://www.aclweb.org/anthology/E17-1068</url>
    <abstract>We propose a framework that captures the denotational probabilities of words
	and phrases by embedding them in a vector space, and present a method to induce
	such an embedding from a dataset of denotational probabilities. We show that
	our model successfully predicts denotational probabilities for unseen phrases,
	and that its predictions are useful for textual entailment datasets such as
	SICK and SNLI.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lai-hockenmaier:2017:EACLlong</bibkey>
  </paper>

  <paper id="1069">
    <title>A Societal Sentiment Analysis: Predicting the Values and Ethics of Individuals by Analysing Social Media Content</title>
    <author><first>Tushar</first><last>Maheshwari</last></author>
    <author><first>Aishwarya N.</first><last>Reganti</last></author>
    <author><first>Samiksha</first><last>Gupta</last></author>
    <author><first>Anupam</first><last>Jamatia</last></author>
    <author><first>Upendra</first><last>Kumar</last></author>
    <author><first>Bj&#246;rn</first><last>Gamb&#228;ck</last></author>
    <author><first>Amitava</first><last>Das</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>731&#8211;741</pages>
    <url>http://www.aclweb.org/anthology/E17-1069</url>
    <abstract>To find out how users' social media behaviour and language are related to their
	ethical practices, the paper investigates applying Schwartz' psycholinguistic
	model of societal sentiment to social media text. The analysis is based on
	corpora collected from user essays as well as social media (Facebook and
	Twitter). Several experiments were carried out on the corpora to classify the
	ethical values of users, incorporating Linguistic Inquiry Word Count analysis,
	n-grams, topic models, psycholinguistic lexica, speech-acts, and non-linguistic
	information, while applying a range of machine learners (Support Vector
	Machines, Logistic Regression, and Random Forests) to identify the best
	linguistic and non-linguistic features for automatic classification of values
	and ethics.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>maheshwari-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1070">
    <title>Argument Strength is in the Eye of the Beholder: Audience Effects in Persuasion</title>
    <author><first>Stephanie</first><last>Lukin</last></author>
    <author><first>Pranav</first><last>Anand</last></author>
    <author><first>Marilyn</first><last>Walker</last></author>
    <author><first>Steve</first><last>Whittaker</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>742&#8211;753</pages>
    <url>http://www.aclweb.org/anthology/E17-1070</url>
    <abstract>Americans spend about a third of their time online, with many participating in
	online conversations on social and political issues. We hypothesize that
	social media arguments on such issues may be more engaging and persuasive
	than traditional media summaries, and that particular types of people may be
	more or less convinced by particular styles of argument, e.g. emotional
	arguments may resonate with some personalities while factual arguments
	resonate with others. We report a set of experiments testing at large scale
	how audience variables interact with argument style to affect the
	persuasiveness of an argument, an under-researched topic within natural
	language processing. We show that belief change is affected by personality
	factors, with conscientious, open and agreeable people being more convinced
	by emotional arguments.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lukin-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1071">
    <title>A Language-independent and Compositional Model for Personality Trait Recognition from Short Texts</title>
    <author><first>Fei</first><last>Liu</last></author>
    <author><first>Julien</first><last>Perez</last></author>
    <author><first>Scott</first><last>Nowson</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>754&#8211;764</pages>
    <url>http://www.aclweb.org/anthology/E17-1071</url>
    <abstract>There have been many attempts at automatically recognising author personality
	traits from text, typically incorporating linguistic features with conventional
	machine learning models, e.g. linear regression or Support Vector Machines. In
	this work, we propose to use deep-learning-based models with atomic features of
	text &#8211; the characters &#8211; to build hierarchical, vectorial word and sentence
	representations for the task of trait inference. On a corpus of tweets, this
	method shows state-of-the-art performance across five traits and three
	languages (English, Spanish and Italian) compared with prior work in author
	profiling. The results, supported by preliminary visualisation work, are
	encouraging for the ability to detect complex human traits.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>liu-perez-nowson:2017:EACLlong</bibkey>
  </paper>

  <paper id="1072">
    <title>A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments</title>
    <author><first>Omer</first><last>Levy</last></author>
    <author><first>Anders</first><last>S&#248;gaard</last></author>
    <author><first>Yoav</first><last>Goldberg</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>765&#8211;774</pages>
    <url>http://www.aclweb.org/anthology/E17-1072</url>
    <abstract>While cross-lingual word embeddings have been studied extensively in recent
	years, the qualitative differences between the different algorithms remain
	vague. We observe that whether or not an algorithm uses a particular feature
	set (sentence IDs) accounts for a significant performance gap among these
	algorithms. This feature set is also used by traditional alignment algorithms,
	such as IBM Model-1, which demonstrate similar performance to state-of-the-art
	embedding algorithms on a variety of benchmarks. Overall, we observe that
	different algorithmic approaches for utilizing the sentence ID feature space
	result in similar performance. This paper draws both empirical and theoretical
	parallels between the embedding and alignment literature, and suggests that
	adding additional sources of information, which go beyond the traditional
	signal of bilingual sentence-aligned corpora, may substantially improve
	cross-lingual word embeddings, and that future baselines should at least take
	such features into account.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>levy-sogaard-goldberg:2017:EACLlong</bibkey>
  </paper>

  <paper id="1073">
    <title>Online Learning of Task-specific Word Representations with a Joint Biconvex Passive-Aggressive Algorithm</title>
    <author><first>Pascal</first><last>Denis</last></author>
    <author><first>Liva</first><last>Ralaivola</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>775&#8211;784</pages>
    <url>http://www.aclweb.org/anthology/E17-1073</url>
    <abstract>This paper presents a new, efficient method for learning task-specific
	word vectors using a variant of the Passive-Aggressive
	algorithm. Specifically, this algorithm learns a word embedding matrix
	in tandem with the classifier parameters in an online fashion, solving
	a bi-convex constrained optimization at each iteration. We provide a
	theoretical analysis of this new algorithm in terms of regret bounds,
	and evaluate it on both synthetic data and NLP classification
	problems, including text classification and sentiment analysis. In the
	latter case, we compare various pre-trained word vectors to initialize
	our word embedding matrix, and show that the matrix learned by our
	algorithm vastly outperforms the initial matrix, with performance
	results comparable or above the state-of-the-art on these tasks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>denis-ralaivola:2017:EACLlong</bibkey>
  </paper>

  <paper id="1074">
    <title>Nonsymbolic Text Representation</title>
    <author><first>Hinrich</first><last>Sch&#252;tze</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>785&#8211;796</pages>
    <url>http://www.aclweb.org/anthology/E17-1074</url>
    <abstract>We introduce the first generic text representation model that is completely
	nonsymbolic, i.e., it does not require the availability of a segmentation or
	tokenization method that attempts to identify words or other symbolic units
	in text. This applies to training the parameters of the model on a training
	corpus as well as to applying it when computing the representation of a new
	text. We show that our model performs better than prior work on an
	information extraction and a text denoising task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>schutze:2017:EACLlong</bibkey>
  </paper>

  <paper id="1075">
    <title>Fine-Grained Entity Type Classification by Jointly Learning Representations and Label Embeddings</title>
    <author><first>Abhishek</first><last>Abhishek</last></author>
    <author><first>Ashish</first><last>Anand</last></author>
    <author><first>Amit</first><last>Awekar</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>797&#8211;807</pages>
    <url>http://www.aclweb.org/anthology/E17-1075</url>
    <abstract>Fine-grained entity type classification (FETC) is the task of classifying an
	entity mention into a broad set of types. The distant supervision paradigm is
	extensively used to generate training data for this task. However, the generated
	training data assigns the same set of labels to every mention of an entity without
	considering its local context. Existing FETC systems have two major drawbacks:
	assuming the training data to be noise-free and using hand-crafted features. Our
	work overcomes both drawbacks. We propose a neural network model that jointly
	learns representations of entity mentions and their context to eliminate the use
	of hand-crafted features. Our model treats the training data as noisy and uses a
	non-parametric variant of the hinge loss function. Experiments show that the
	proposed model outperforms previous state-of-the-art methods on two publicly
	available datasets, namely FIGER (GOLD) and BBN, with an average relative
	improvement of 2.69% in micro-F1 score. Knowledge learnt by our model on one
	dataset can be transferred to other datasets, using either the same model or other
	FETC systems, and such knowledge transfer further improves the
	performance of the respective models.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>abhishek-anand-awekar:2017:EACLlong</bibkey>
  </paper>

  <paper id="1076">
    <title>Event extraction from Twitter using Non-Parametric Bayesian Mixture Model with Word Embeddings</title>
    <author><first>Deyu</first><last>Zhou</last></author>
    <author><first>Xuan</first><last>Zhang</last></author>
    <author><first>Yulan</first><last>He</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>808&#8211;817</pages>
    <url>http://www.aclweb.org/anthology/E17-1076</url>
    <abstract>To extract structured representations of newsworthy events from Twitter,
	unsupervised models typically assume that tweets involving the same named
	entities and expressed using similar words are likely to belong to the same
	event. Hence, they group tweets into clusters based on the co-occurrence
	patterns of named entities and topical keywords. However, there are two main
	limitations. First, they require the number of events to be known beforehand,
	which is not realistic in practical applications. Second, they do not recognise
	that the same named entity might be referred to by multiple mentions, so tweets
	using different mentions would be wrongly assigned to different events. To
	overcome these limitations, we propose a non-parametric Bayesian mixture model
	with word embeddings for event extraction, in which the number of events can be
	inferred automatically and the issue of lexical variations for the same named
	entity can be dealt with properly. Our model has been evaluated on three
	datasets with sizes ranging between 2,499 and over 60 million tweets.
	Experimental results show that our model outperforms the baseline approach on
	all datasets by 5-8% in F-measure.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhou-zhang-he:2017:EACLlong</bibkey>
  </paper>

  <paper id="1077">
    <title>End-to-end Relation Extraction using Neural Networks and Markov Logic Networks</title>
    <author><first>Sachin</first><last>Pawar</last></author>
    <author><first>Pushpak</first><last>Bhattacharyya</last></author>
    <author><first>Girish</first><last>Palshikar</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>818&#8211;827</pages>
    <url>http://www.aclweb.org/anthology/E17-1077</url>
    <abstract>End-to-end relation extraction refers to identifying boundaries of entity
	mentions, the entity types of these mentions, and the appropriate semantic relation
	for each pair of mentions. Traditionally, separate predictive models were trained
	for each of these tasks and used in a &#x201c;pipeline&#x201d; fashion, where the output
	of one model is fed as input to another. However, it has been observed that addressing
	some of these tasks jointly results in better performance. We propose a single,
	joint neural network based model to carry out all three tasks of boundary
	identification, entity type classification and relation type classification.
	This model is referred to as the &#x201c;All Word Pairs&#x201d; model (AWP-NN), as it assigns
	an appropriate label to each word pair in a given sentence to perform
	end-to-end relation extraction. We also propose to refine the output of the AWP-NN
	model using inference in Markov Logic Networks (MLN) so that additional
	domain knowledge can be effectively incorporated. We demonstrate the effectiveness
	of our approach by achieving better end-to-end relation extraction performance
	than all four previous joint modelling approaches, on the standard dataset of ACE
	2004.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>pawar-bhattacharyya-palshikar:2017:EACLlong</bibkey>
  </paper>

  <paper id="1078">
    <title>Trust, but Verify! Better Entity Linking through Automatic Verification</title>
    <author><first>Benjamin</first><last>Heinzerling</last></author>
    <author><first>Michael</first><last>Strube</last></author>
    <author><first>Chin-Yew</first><last>Lin</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>828&#8211;838</pages>
    <url>http://www.aclweb.org/anthology/E17-1078</url>
    <abstract>We introduce automatic verification as a post-processing step for entity
	linking (EL).
	The proposed method trusts EL system results collectively, by assuming entity
	mentions are mostly linked correctly, in order to create a semantic profile of
	the given text using geospatial and temporal information, as well as
	fine-grained entity types.
	This profile is then used to automatically verify each linked mention
	individually, i.e., to predict whether it has been linked correctly or not.
	Verification allows leveraging a rich set of global and pairwise features that
	would be prohibitively expensive for EL systems employing global inference.
	Evaluation shows consistent improvements across datasets and systems. In
	particular, when applied to state-of-the-art systems, our method yields an
	absolute improvement in linking performance of up to 1.7 F1 on AIDA/CoNLL'03
	and up to 2.4 F1 on the English TAC KBP 2015 TEDL dataset.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>heinzerling-strube-lin:2017:EACLlong</bibkey>
  </paper>

  <paper id="1079">
    <title>Named Entity Recognition in the Medical Domain with Constrained CRF Models</title>
    <author><first>Charles</first><last>Jochim</last></author>
    <author><first>Lea</first><last>Deleris</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>839&#8211;849</pages>
    <url>http://www.aclweb.org/anthology/E17-1079</url>
    <abstract>This paper investigates how to improve performance on information
	  extraction tasks by constraining and sequencing CRF-based
	  approaches.  We consider two different relation extraction tasks,
	  both from the medical literature: dependence relations and
	  probability statements.  We explore whether adding constraints can
	  lead to an improvement over standard CRF decoding.  Results on our
	  relation extraction tasks are promising, showing significant
	  increases in performance from both (i) adding constraints to
	  post-process the output of a baseline CRF, which captures &#x201c;domain
	  knowledge&#x201d;, and (ii) further allowing flexibility in the
	  application of those constraints by leveraging a binary classifier
	  as a pre-processing step.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jochim-deleris:2017:EACLlong</bibkey>
  </paper>

  <paper id="1080">
    <title>Learning and Knowledge Transfer with Memory Networks for Machine Comprehension</title>
    <author><first>Mohit</first><last>Yadav</last></author>
    <author><first>Lovekesh</first><last>Vig</last></author>
    <author><first>Gautam</first><last>Shroff</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>850&#8211;859</pages>
    <url>http://www.aclweb.org/anthology/E17-1080</url>
    <abstract>Enabling machines to read and comprehend unstructured text remains an
	unfulfilled goal for NLP research. Recent research efforts on the &#x201c;machine
	comprehension&#x201d; task have managed to achieve close to ideal performance on
	simulated data. However, achieving similar levels of performance on small
	real-world datasets has proved difficult; major challenges stem from the large
	vocabulary size, complex grammar, and frequent ambiguities in linguistic
	structure. Moreover, the human-generated annotations required for training,
	needed to ensure a sufficiently diverse set of questions, are
	prohibitively expensive. Motivated by these practical issues, we propose a
	novel curriculum-inspired training procedure for Memory Networks to improve the
	performance for machine comprehension with relatively small volumes of training
	data. Additionally, we explore various training regimes for Memory Networks to
	allow knowledge transfer from a closely related domain having larger volumes of
	labelled data. We also suggest the use of a loss function to incorporate the
	asymmetric nature of knowledge transfer. Our experiments demonstrate
	improvements on Dailymail, CNN, and MCTest datasets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yadav-vig-shroff:2017:EACLlong</bibkey>
  </paper>

  <paper id="1081">
    <title>If No Media Were Allowed inside the Venue, Was Anybody Allowed?</title>
    <author><first>Zahra</first><last>Sarabi</last></author>
    <author><first>Eduardo</first><last>Blanco</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>860&#8211;869</pages>
    <url>http://www.aclweb.org/anthology/E17-1081</url>
    <abstract>This paper presents a framework to understand negation in positive terms.
	Specifically, we extract positive meaning from negation when the negation cue
	syntactically modifies a noun or adjective. Our approach is grounded on
	generating potential positive interpretations automatically, and then scoring
	them. Experimental results show that interpretations scored high can be
	reliably identified.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sarabi-blanco:2017:EACLlong</bibkey>
  </paper>

  <paper id="1082">
    <title>Metaheuristic Approaches to Lexical Substitution and Simplification</title>
    <author><first>Sallam</first><last>Abualhaija</last></author>
    <author><first>Tristan</first><last>Miller</last></author>
    <author><first>Judith</first><last>Eckle-Kohler</last></author>
    <author><first>Iryna</first><last>Gurevych</last></author>
    <author><first>Karl-Heinz</first><last>Zimmermann</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>870&#8211;880</pages>
    <url>http://www.aclweb.org/anthology/E17-1082</url>
    <abstract>In this paper, we propose using metaheuristics, in particular simulated
	annealing and the new D-Bees algorithm, to solve word sense disambiguation as
	an optimization problem within a knowledge-based lexical substitution system.
	We are the first to perform such an extrinsic evaluation of metaheuristics, for
	which we use two standard lexical substitution datasets, one English and one
	German.  We find that D-Bees has robust performance for both languages, and
	performs better than simulated annealing, though both achieve good results.
	Moreover, the D-Bees-based lexical substitution system outperforms
	state-of-the-art systems on several evaluation metrics.  We also show that
	D-Bees achieves competitive performance in lexical simplification, a variant of
	lexical substitution.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>abualhaija-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1083">
    <title>Paraphrasing Revisited with Neural Machine Translation</title>
    <author><first>Jonathan</first><last>Mallinson</last></author>
    <author><first>Rico</first><last>Sennrich</last></author>
    <author><first>Mirella</first><last>Lapata</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>881&#8211;893</pages>
    <url>http://www.aclweb.org/anthology/E17-1083</url>
    <abstract>Recognizing and generating paraphrases is an important component in many
	natural language processing applications.  A well-established technique for
	automatically extracting paraphrases leverages bilingual corpora to find
	meaning-equivalent phrases in a single language by &#x201c;pivoting&#x201d; over a shared
	translation in another language. In this paper we revisit bilingual pivoting in
	the context of neural machine translation and present a paraphrasing model
	based purely on neural networks. Our model represents paraphrases in a
	continuous space, estimates the degree of semantic relatedness between text
	segments of arbitrary length, and generates candidate paraphrases for any
	source input. Experimental results across tasks and datasets show that neural
	paraphrases outperform those obtained with conventional phrase-based pivoting
	approaches.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mallinson-sennrich-lapata:2017:EACLlong</bibkey>
  </paper>

  <paper id="1084">
    <title>Multilingual Training of Crosslingual Word Embeddings</title>
    <author><first>Long</first><last>Duong</last></author>
    <author><first>Hiroshi</first><last>Kanayama</last></author>
    <author><first>Tengfei</first><last>Ma</last></author>
    <author><first>Steven</first><last>Bird</last></author>
    <author><first>Trevor</first><last>Cohn</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>894&#8211;904</pages>
    <url>http://www.aclweb.org/anthology/E17-1084</url>
    <abstract>Crosslingual word embeddings represent lexical items from different languages 
	using the same vector space, enabling crosslingual transfer. Most prior 
	work constructs embeddings for a pair of languages, with English on one side.
	We investigate methods for building high-quality crosslingual word embeddings
	for many languages in a unified vector space. In this way, we can exploit and
	combine the strengths of many languages.
	We obtain high performance on bilingual lexicon induction, monolingual
	similarity and crosslingual document classification tasks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>duong-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1085">
    <title>Building Lexical Vector Representations from Concept Definitions</title>
    <author><first>Danilo</first><last>Silva de Carvalho</last></author>
    <author><first>Minh Le</first><last>Nguyen</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>905&#8211;915</pages>
    <url>http://www.aclweb.org/anthology/E17-1085</url>
    <abstract>The use of distributional language representations has opened new paths in
	solving a variety of NLP problems. However, alternative approaches can take
	advantage of information unavailable through pure statistical means. This paper
	presents a method for building vector representations from meaning unit blocks
	called concept definitions, which are obtained by extracting information from a
	curated linguistic resource (Wiktionary). The representations obtained in this
	way can be compared through conventional cosine similarity and are also
	interpretable by humans. Evaluation was conducted in semantic similarity and
	relatedness test sets, with results indicating a performance comparable to
	other methods based on single linguistic resource extraction. The results also
	indicate noticeable performance gains when combining distributional similarity
	scores with the ones obtained using this approach. Additionally, a discussion
	on the proposed method's shortcomings is provided in the analysis of error
	cases.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>silvadecarvalho-nguyen:2017:EACLlong</bibkey>
  </paper>

  <paper id="1086">
    <title>ShotgunWSD: An unsupervised algorithm for global word sense disambiguation inspired by DNA sequencing</title>
    <author><first>Andrei</first><last>Butnaru</last></author>
    <author><first>Radu Tudor</first><last>Ionescu</last></author>
    <author><first>Florentina</first><last>Hristea</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>916&#8211;926</pages>
    <url>http://www.aclweb.org/anthology/E17-1086</url>
    <abstract>In this paper, we present a novel unsupervised algorithm for word sense
	disambiguation (WSD) at the document level. Our algorithm is inspired by a
	widely-used approach in the field of genetics for whole genome sequencing,
	known as the Shotgun sequencing technique. The proposed WSD algorithm is based
	on three main steps. First, a brute-force WSD algorithm is applied to short
	context windows (up to 10 words) selected from the document in order to
	generate a short list of likely sense configurations for each window. In the
	second step, these local sense configurations are assembled into longer
	composite configurations based on suffix and prefix matching. The resulting
	configurations are ranked by their length, and the sense of each word is chosen
	based on a voting scheme that considers only the top k configurations in which
	the word appears. We compare our algorithm with other state-of-the-art
	unsupervised WSD algorithms and demonstrate better performance, sometimes by a
	very large margin. We also show that our algorithm can yield better performance
	than the Most Common Sense (MCS) baseline on one data set. Moreover, our
	algorithm has a very small number of parameters, is robust to parameter tuning,
	and, unlike other bio-inspired methods, it gives a deterministic solution (it
	does not involve random choices).</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>butnaru-ionescu-hristea:2017:EACLlong</bibkey>
  </paper>

  <paper id="1087">
    <title>LanideNN: Multilingual Language Identification on Text Stream</title>
    <author><first>Tom</first><last>Kocmi</last></author>
    <author><first>Ond&#x159;ej</first><last>Bojar</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>927&#8211;936</pages>
    <url>http://www.aclweb.org/anthology/E17-1087</url>
    <abstract>In language identification, a common first
	step in natural language processing, we
	want to automatically determine the language
	of some input text. Monolingual
	language identification assumes that the
	given document is written in one language.
	In multilingual language identification, the
	document is usually in two or three languages
	and we just want their names. We
	go one step further and propose a method
	for textual language identification where
	languages can change arbitrarily and the
	goal is to identify the spans of each of the
	languages.
	Our method is based on Bidirectional Recurrent
	Neural Networks and it performs
	well in monolingual and multilingual language
	identification tasks on six datasets
	covering 131 languages. The method
	retains its accuracy for short documents
	and across domains, so it is ideal
	for off-the-shelf use without preparation of
	training data.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kocmi-bojar:2017:EACLlong</bibkey>
  </paper>

  <paper id="1088">
    <title>Cross-Lingual Word Embeddings for Low-Resource Language Modeling</title>
    <author><first>Oliver</first><last>Adams</last></author>
    <author><first>Adam</first><last>Makarucha</last></author>
    <author><first>Graham</first><last>Neubig</last></author>
    <author><first>Steven</first><last>Bird</last></author>
    <author><first>Trevor</first><last>Cohn</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>937&#8211;947</pages>
    <url>http://www.aclweb.org/anthology/E17-1088</url>
    <abstract>Most languages have no established writing system and minimal written records.
	However, textual data is essential for natural language processing, and
	particularly important for training language models to support speech
	recognition. Even in cases where text data is missing, there are some languages
	for which bilingual lexicons are available, since creating lexicons is a
	fundamental task of documentary linguistics.  We investigate the use of such
	lexicons to improve language models when textual training data is limited to as
	few as a thousand sentences. The method involves learning cross-lingual word
	embeddings as a preliminary step in training monolingual language models.
	Results across a number of languages show that language models are improved by
	this pre-training. Application to Yongning Na, a threatened language,
	highlights challenges in deploying the approach in real low-resource
	environments.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>adams-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1089">
    <title>Consistent Translation of Repeated Nouns using Syntactic and Semantic Cues</title>
    <author><first>Xiao</first><last>Pu</last></author>
    <author><first>Laura</first><last>Mascarell</last></author>
    <author><first>Andrei</first><last>Popescu-Belis</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>948&#8211;957</pages>
    <url>http://www.aclweb.org/anthology/E17-1089</url>
    <abstract>We propose a method to decide whether two occurrences of the same noun in a
	source text should be translated consistently, i.e. using the same noun in the
	target text as well.  We train and test classifiers that predict consistent
	translations based on lexical, syntactic, and semantic features.  We first
	evaluate the accuracy of our classifiers intrinsically, in terms of the
	accuracy of consistency predictions, over a subset of the UN Corpus.  Then, we
	also evaluate them in combination with phrase-based statistical MT systems for
	Chinese-to-English and German-to-English.  We compare the automatic
	post-editing of noun translations with the re-ranking of the translation
	hypotheses based on the classifiers' output, and also use these methods in
	combination.  This improves over the baseline and closes up to 50% of the gap
	in BLEU scores between the baseline and an oracle classifier.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>pu-mascarell-popescubelis:2017:EACLlong</bibkey>
  </paper>

  <paper id="1090">
    <title>Psycholinguistic Models of Sentence Processing Improve Sentence Readability Ranking</title>
    <author><first>David M.</first><last>Howcroft</last></author>
    <author><first>Vera</first><last>Demberg</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>958&#8211;968</pages>
    <url>http://www.aclweb.org/anthology/E17-1090</url>
    <abstract>While previous research on readability has typically focused on document-level
	measures, recent work in areas such as natural language generation has pointed
	out the need for sentence-level readability measures.  Much of psycholinguistics
	has focused for many years on processing measures that provide difficulty
	estimates on a word-by-word basis. However, these psycholinguistic measures
	have not yet been tested on sentence readability ranking tasks.  In this paper,
	we use four psycholinguistic measures: idea density, surprisal, integration
	cost, and embedding depth, to test whether these features are predictive of
	readability levels. We find that psycholinguistic features significantly
	improve performance by up to 3 percentage points over a standard document-level
	readability metric baseline.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>howcroft-demberg:2017:EACLlong</bibkey>
  </paper>

  <paper id="1091">
    <title>Web-Scale Language-Independent Cataloging of Noisy Product Listings for E-Commerce</title>
    <author><first>Pradipto</first><last>Das</last></author>
    <author><first>Yandi</first><last>Xia</last></author>
    <author><first>Aaron</first><last>Levine</last></author>
    <author><first>Giuseppe</first><last>Di Fabbrizio</last></author>
    <author><first>Ankur</first><last>Datta</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>969&#8211;979</pages>
    <url>http://www.aclweb.org/anthology/E17-1091</url>
    <abstract>The cataloging of product listings through taxonomy categorization is a
	fundamental problem for any e-commerce marketplace, with applications ranging
	from personalized search recommendations to query understanding.
	However, manual and rule-based approaches to categorization are not scalable.
	In this paper, we compare several classifiers for categorizing listings in both
	English and Japanese product catalogs. 
	We show empirically that a combination of words from product titles,
	navigational breadcrumbs, and list prices, when available, improves results
	significantly.
	We outline a novel method using correspondence topic models and a lightweight
	manual process to reduce noise from mis-labeled data in the training set.
	We contrast linear models, gradient boosted trees (GBTs) and convolutional
	neural networks (CNNs), and show that GBTs and CNNs yield the highest gains in
	error reduction.
	Finally, we show GBTs applied in a language-agnostic way on a large-scale
	Japanese e-commerce dataset have improved taxonomy categorization performance
	over current state-of-the-art based on deep belief network models.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>das-EtAl:2017:EACLlong2</bibkey>
  </paper>

  <paper id="1092">
    <title>Recognizing Insufficiently Supported Arguments in Argumentative Essays</title>
    <author><first>Christian</first><last>Stab</last></author>
    <author><first>Iryna</first><last>Gurevych</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>980&#8211;990</pages>
    <url>http://www.aclweb.org/anthology/E17-1092</url>
    <abstract>In this paper, we propose a new task for assessing the quality of natural
	language arguments. The premises of a well-reasoned argument should provide
	enough evidence for accepting or rejecting its claim. Although this criterion,
	known as sufficiency, is widely adopted in argumentation theory, there are no
	empirical studies on its applicability to real arguments. In this work, we show
	that human annotators substantially agree on the sufficiency criterion and
	introduce a novel annotated corpus. Furthermore, we experiment with
	feature-rich SVMs and Convolutional Neural Networks and achieve 84% accuracy
	for automatically identifying insufficiently supported arguments. The final
	corpus as well as the annotation guideline are freely available to encourage
	future research on argument quality.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>stab-gurevych:2017:EACLlong</bibkey>
  </paper>

  <paper id="1093">
    <title>Distributed Document and Phrase Co-embeddings for Descriptive Clustering</title>
    <author><first>Motoki</first><last>Sato</last></author>
    <author><first>Austin J.</first><last>Brockmeier</last></author>
    <author><first>Georgios</first><last>Kontonatsios</last></author>
    <author><first>Tingting</first><last>Mu</last></author>
    <author><first>John Y.</first><last>Goulermas</last></author>
    <author><first>Jun'ichi</first><last>Tsujii</last></author>
    <author><first>Sophia</first><last>Ananiadou</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>991&#8211;1001</pages>
    <url>http://www.aclweb.org/anthology/E17-1093</url>
    <abstract>Descriptive document clustering aims to automatically discover groups of
	semantically related documents and to assign a meaningful label to characterise
	the content of each cluster. In this paper, we present a descriptive clustering
	approach that employs a distributed representation model, namely the paragraph
	vector model, to capture semantic similarities between documents and phrases.
	The proposed method uses a joint representation of phrases and documents
	(i.e., a co-embedding) to automatically select a descriptive phrase that best
	represents each document cluster. We evaluate our method by comparing its
	performance to an existing state-of-the-art descriptive clustering method that
	also uses co-embedding but relies on a bag-of-words representation. Results
	obtained on benchmark datasets demonstrate that the paragraph vector-based
	method obtains superior performance over the existing approach in both
	identifying clusters and assigning appropriate descriptive labels to them.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sato-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1094">
    <title>SMARTies: Sentiment Models for Arabic Target entities</title>
    <author><first>Noura</first><last>Farra</last></author>
    <author><first>Kathy</first><last>McKeown</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1002&#8211;1013</pages>
    <url>http://www.aclweb.org/anthology/E17-1094</url>
    <abstract>We consider entity-level sentiment analysis in Arabic, a morphologically rich
	language with increasing resources. We present a system that is applied to
	complex posts written in response to Arabic newspaper articles.  Our goal is to
	identify important entity "targets" within the post along with the polarity
	expressed about each target. We achieve significant improvements over multiple
	baselines, demonstrating that the use of specific morphological representations
	improves the performance of identifying both important targets and their
	sentiment, and that the use of distributional semantic clusters further boosts
	performance for these representations, especially when richer linguistic
	resources are not available.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>farra-mckeown:2017:EACLlong</bibkey>
  </paper>

  <paper id="1095">
    <title>Exploring Convolutional Neural Networks for Sentiment Analysis of Spanish tweets</title>
    <author><first>Isabel</first><last>Segura-Bedmar</last></author>
    <author><first>Antonio</first><last>Quiros</last></author>
    <author><first>Paloma</first><last>Mart&#237;nez</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1014&#8211;1022</pages>
    <url>http://www.aclweb.org/anthology/E17-1095</url>
    <abstract>Spanish is the third-most used language on the internet, after English and
	Chinese, with a 7.7% share (more than 277 million users) and enormous
	internet growth of more than 1,400%. However, most work on sentiment analysis
	has been focused on English. This paper describes a deep learning system for
	Spanish sentiment analysis. To the best of our knowledge, this is the first
	work that explores the use of a convolutional neural network for polarity
	classification of Spanish tweets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>segurabedmar-quiros-martinez:2017:EACLlong</bibkey>
  </paper>

  <paper id="1096">
    <title>Contextual Bidirectional Long Short-Term Memory Recurrent Neural Network Language Models: A Generative Approach to Sentiment Analysis</title>
    <author><first>Amr</first><last>Mousa</last></author>
    <author><first>Bj&#246;rn</first><last>Schuller</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1023&#8211;1032</pages>
    <url>http://www.aclweb.org/anthology/E17-1096</url>
    <abstract>Traditional learning-based approaches to sentiment analysis of written text use
	the concept of bag-of-words or bag-of-n-grams, where a document is viewed as a
	set of terms or short combinations of terms disregarding grammar rules or word
	order. Novel approaches de-emphasize this concept and view the problem as a
	sequence classification problem. In this context, recurrent neural networks
	(RNNs) have achieved significant success. The idea is to use RNNs as
	discriminative binary classifiers to predict a positive or negative sentiment
	label at every word position then perform a type of pooling to get a
	sentence-level polarity. Here, we investigate a novel generative approach in
	which a separate probability distribution is estimated for every sentiment
	using language models (LMs) based on long short-term memory (LSTM) RNNs. We
	introduce a novel type of LM using a modified version of bidirectional LSTM
	(BLSTM) called contextual BLSTM (cBLSTM), where the probability of a word is
	estimated based on its full left and right contexts. Our approach is compared
	with a BLSTM binary classifier. Significant improvements are observed in
	classifying the IMDB movie review dataset. Further improvements are achieved
	via model combination.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mousa-schuller:2017:EACLlong</bibkey>
  </paper>

  <paper id="1097">
    <title>Large-scale Opinion Relation Extraction with Distantly Supervised Neural Network</title>
    <author><first>Changzhi</first><last>Sun</last></author>
    <author><first>Yuanbin</first><last>Wu</last></author>
    <author><first>Man</first><last>Lan</last></author>
    <author><first>Shiliang</first><last>Sun</last></author>
    <author><first>Qi</first><last>Zhang</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1033&#8211;1043</pages>
    <url>http://www.aclweb.org/anthology/E17-1097</url>
    <abstract>We investigate the task of open-domain opinion relation extraction. In
	contrast to works on manually labeled corpora, we propose an efficient
	distantly supervised framework based on pattern matching and neural network
	classifiers. The patterns are designed to automatically generate training
	data, and the deep learning model is designed to capture various lexical and
	syntactic features. The resulting algorithm is fast and scalable on
	large-scale corpora. We test the system on the Amazon online review dataset.
	The results show that our model is able to achieve promising performance
	without any human annotations.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sun-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1098">
    <title>Decoding with Finite-State Transducers on GPUs</title>
    <author><first>Arturo</first><last>Argueta</last></author>
    <author><first>David</first><last>Chiang</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1044&#8211;1052</pages>
    <url>http://www.aclweb.org/anthology/E17-1098</url>
    <abstract>Weighted finite automata and transducers (including hidden Markov models and
	conditional random fields) are widely used in natural language processing (NLP)
	to perform tasks such as morphological analysis, part-of-speech tagging,
	chunking, named entity recognition, speech recognition, and others.
	Parallelizing finite-state algorithms on graphics processing units (GPUs) would
	benefit many areas of NLP. Although researchers have implemented GPU versions
	of basic graph algorithms, no work, to our knowledge, has been done on GPU
	algorithms for weighted finite automata. We introduce a GPU implementation of
	the Viterbi and forward-backward algorithms, achieving speedups of up to 4x over
	our serial implementations running on different computer architectures and
	3335x over widely used tools such as OpenFST.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>argueta-chiang:2017:EACLlong</bibkey>
  </paper>

  <paper id="1099">
    <title>Learning to Translate in Real-time with Neural Machine Translation</title>
    <author><first>Jiatao</first><last>Gu</last></author>
    <author><first>Graham</first><last>Neubig</last></author>
    <author><first>Kyunghyun</first><last>Cho</last></author>
    <author><first>Victor O.K.</first><last>Li</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1053&#8211;1062</pages>
    <url>http://www.aclweb.org/anthology/E17-1099</url>
    <abstract>Translating in real-time, a.k.a. simultaneous translation, outputs translation
	words before the input sentence ends,
	which is a challenging problem for conventional machine translation methods. 
	We propose a neural machine translation (NMT) framework for simultaneous
	translation in which an agent learns to make decisions on when to translate
	from the interaction with a pre-trained NMT environment.
	To trade off quality and delay, we extensively explore various targets for
	delay and design a method for beam-search applicable in the simultaneous MT
	setting. Experiments against state-of-the-art baselines on two language pairs
	demonstrate the efficacy of the proposed framework both quantitatively and
	qualitatively.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>gu-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1100">
    <title>A Multifaceted Evaluation of Neural versus Phrase-Based Machine Translation for 9 Language Directions</title>
    <author><first>Antonio</first><last>Toral</last></author>
    <author><first>V&#237;ctor M.</first><last>S&#225;nchez-Cartagena</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1063&#8211;1073</pages>
    <url>http://www.aclweb.org/anthology/E17-1100</url>
    <abstract>We aim to shed light on the strengths and weaknesses of the newly introduced
	neural machine translation paradigm. To that end, we conduct a multifaceted
	evaluation in which we compare outputs produced by state-of-the-art neural
	machine translation and phrase-based machine translation systems for 9 language
	directions across a number of dimensions. Specifically, we measure the
	similarity of the outputs, their fluency and amount of reordering, the effect
	of sentence length and performance across different error categories. We find
	that translations produced by neural machine translation systems are
	considerably different, more fluent and more accurate in terms of word order
	compared to those produced by phrase-based systems. Neural machine translation
	systems are also more accurate at producing inflected forms, but they perform
	poorly when translating very long sentences.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>toral-sanchezcartagena:2017:EACLlong</bibkey>
  </paper>

  <paper id="1101">
    <title>Personalized Machine Translation: Preserving Original Author Traits</title>
    <author><first>Ella</first><last>Rabinovich</last></author>
    <author><first>Raj Nath</first><last>Patel</last></author>
    <author><first>Shachar</first><last>Mirkin</last></author>
    <author><first>Lucia</first><last>Specia</last></author>
    <author><first>Shuly</first><last>Wintner</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1074&#8211;1084</pages>
    <url>http://www.aclweb.org/anthology/E17-1101</url>
    <abstract>The language that we produce reflects our personality, and various personal and
	demographic characteristics can be detected in natural language texts. We focus
	on one particular personal trait of the author, gender, and study how it is
	manifested in original texts and in translations. We show that an author's
	gender has a powerful, clear signal in original texts, but this signal is obfuscated
	in human and machine translation. We then propose simple domain-adaptation
	techniques that help retain the original gender traits in the translation,
	without harming the quality of the translation, thereby creating more
	personalized machine translation systems.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rabinovich-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1102">
    <title>Bilingual Lexicon Induction by Learning to Combine Word-Level and Character-Level Representations</title>
    <author><first>Geert</first><last>Heyman</last></author>
    <author><first>Ivan</first><last>Vuli&#x107;</last></author>
    <author><first>Marie-Francine</first><last>Moens</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1085&#8211;1095</pages>
    <url>http://www.aclweb.org/anthology/E17-1102</url>
    <abstract>We study the problem of bilingual lexicon induction (BLI) in a setting where
	some translation resources are available, but unknown translations are sought
	for certain, possibly domain-specific terminology. We frame BLI as a
	classification problem for which we design a neural network based
	classification architecture composed of recurrent long short-term memory and
	deep feed-forward networks.
	The results show that word- and character-level representations each improve
	state-of-the-art results for BLI, and the best results are obtained by
	exploiting the synergy between these word- and character-level representations
	in the classification model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>heyman-vulic-moens:2017:EACLlong</bibkey>
  </paper>

  <paper id="1103">
    <title>Grouping business news stories based on salience of named entities</title>
    <author><first>Llorenc</first><last>Escoter</last></author>
    <author><first>Lidia</first><last>Pivovarova</last></author>
    <author><first>Mian</first><last>Du</last></author>
    <author><first>Anisia</first><last>Katinskaia</last></author>
    <author><first>Roman</first><last>Yangarber</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1096&#8211;1106</pages>
    <url>http://www.aclweb.org/anthology/E17-1103</url>
    <abstract>In news aggregation systems focused on broad news domains, certain stories may
	appear in multiple articles. Depending on the relative importance of the story,
	the number of versions can reach dozens or hundreds within a day. The text in
	these versions may be nearly identical or quite different. Linking multiple
	versions of a story into a single group brings several important benefits to
	the end-user&#8211;reducing the cognitive load on the reader, as well as signaling
	the relative importance of the story. We present a grouping algorithm, and
	explore several vector-based representations of input documents: from a
	baseline using keywords, to a method using salience&#8211;a measure of importance
	of named entities in the text. We demonstrate that features beyond keywords
	yield substantial improvements, verified on a manually-annotated corpus of
	business news stories.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>escoter-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1104">
    <title>Very Deep Convolutional Networks for Text Classification</title>
    <author><first>Alexis</first><last>Conneau</last></author>
    <author><first>Holger</first><last>Schwenk</last></author>
    <author><first>Lo&#239;c</first><last>Barrault</last></author>
    <author><first>Yann</first><last>Lecun</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1107&#8211;1116</pages>
    <url>http://www.aclweb.org/anthology/E17-1104</url>
    <abstract>The dominant approaches for many NLP tasks are recurrent neural networks, in
	particular LSTMs, and convolutional neural networks. However, these
	architectures are rather shallow in comparison to the deep convolutional
	networks which have pushed the state-of-the-art in computer vision. We present
	a new architecture (VDCNN) for text processing which operates directly at the
	character level and uses only small convolutions and pooling operations.
	We are able to show that the performance of this model increases with
	depth: using up to 29 convolutional layers, we report improvements
	over the state-of-the-art on several public text classification tasks.
	To the best of our knowledge, this is the first time that very deep
	convolutional nets have been applied to text processing.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>conneau-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1105">
    <title>"PageRank" for Argument Relevance</title>
    <author><first>Henning</first><last>Wachsmuth</last></author>
    <author><first>Benno</first><last>Stein</last></author>
    <author><first>Yamen</first><last>Ajjour</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1117&#8211;1127</pages>
    <url>http://www.aclweb.org/anthology/E17-1105</url>
    <abstract>Future search engines are expected to deliver pro and con arguments in response
	to queries on controversial topics. While argument mining is now in the focus
	of research, the question of how to retrieve the relevant arguments remains
	open. This paper proposes a radical model to assess relevance objectively at
	web scale: the relevance of an argument's conclusion is decided by what other
	arguments reuse it as a premise. We build an argument graph for this model that
	we analyze with a recursive weighting scheme, adapting key ideas of PageRank.
	In experiments on a large ground-truth argument graph, the resulting relevance
	scores correlate with human average judgments. We outline what natural language
	challenges must be faced at web scale in order to stepwise bring argument
	relevance to web search engines.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wachsmuth-stein-ajjour:2017:EACLlong</bibkey>
  </paper>

  <paper id="1106">
    <title>Predicting Counselor Behaviors in Motivational Interviewing Encounters</title>
    <author><first>Ver&#243;nica</first><last>P&#233;rez-Rosas</last></author>
    <author><first>Rada</first><last>Mihalcea</last></author>
    <author><first>Kenneth</first><last>Resnicow</last></author>
    <author><first>Satinder</first><last>Singh</last></author>
    <author><first>Lawrence</first><last>Ann</last></author>
    <author><first>Kathy J.</first><last>Goggin</last></author>
    <author><first>Delwyn</first><last>Catley</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1128&#8211;1137</pages>
    <url>http://www.aclweb.org/anthology/E17-1106</url>
    <abstract>As the number of people receiving psychotherapeutic treatment increases, the
	automatic evaluation of counseling practice arises as an important challenge in
	the clinical domain. In this paper, we address the automatic evaluation of
	counseling performance by analyzing counselors' language during their
	interaction with clients. In particular, we present a model towards the
	automation of Motivational Interviewing (MI) coding, which is the current gold
	standard to evaluate MI counseling. First, we build a dataset of hand labeled
	MI encounters; second, we use text-based methods to extract and analyze
	linguistic patterns associated with counselor behaviors; and third, we develop
	an automatic system to predict these behaviors. We introduce a new set of
	features based on semantic information and syntactic patterns, and show that
	they lead to accuracy figures of up to 90%, which represent a significant
	improvement with respect to features used in the past.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>perezrosas-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1107">
    <title>Authorship Attribution Using Text Distortion</title>
    <author><first>Efstathios</first><last>Stamatatos</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1138&#8211;1149</pages>
    <url>http://www.aclweb.org/anthology/E17-1107</url>
    <abstract>Authorship attribution is associated with important applications in forensics
	and humanities research. A crucial point in this field is to quantify the
	personal style of writing, ideally in a way that is not affected by changes in
	topic or genre. In this paper, we present a novel method that enhances
	authorship attribution effectiveness by introducing a text distortion step
	before extracting stylometric measures. The proposed method attempts to mask
	topic-specific information that is not related to the personal style of
	authors. Based on experiments on two main tasks in authorship attribution,
	closed-set attribution and authorship verification, we demonstrate that the
	proposed approach can enhance existing methods especially under cross-topic
	conditions, where the training and test corpora do not match in topic.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>stamatatos:2017:EACLlong</bibkey>
  </paper>

  <paper id="1108">
    <title>Structured Learning for Temporal Relation Extraction from Clinical Records</title>
    <author><first>Artuur</first><last>Leeuwenberg</last></author>
    <author><first>Marie-Francine</first><last>Moens</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1150&#8211;1158</pages>
    <url>http://www.aclweb.org/anthology/E17-1108</url>
    <abstract>We propose a scalable structured learning model that jointly predicts temporal
	relations between events and temporal expressions (TLINKS), and the relation
	between these events and the document creation time (DCTR). We employ a
	structured perceptron, together with integer linear programming constraints for
	document-level inference during training and prediction to exploit relational
	properties of temporality, together with global learning of the relations at
	the document level. Moreover, this study gives insights into the results of
	integrating constraints for temporal relation extraction when using structured
	learning and prediction. Our best system outperforms the state-of-the-art on
	both the CONTAINS TLINK task and the DCTR task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>leeuwenberg-moens:2017:EACLlong</bibkey>
  </paper>

  <paper id="1109">
    <title>Entity Extraction in Biomedical Corpora: An Approach to Evaluate Word Embedding Features with PSO based Feature Selection</title>
    <author><first>Shweta</first><last>Yadav</last></author>
    <author><first>Asif</first><last>Ekbal</last></author>
    <author><first>Sriparna</first><last>Saha</last></author>
    <author><first>Pushpak</first><last>Bhattacharyya</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1159&#8211;1170</pages>
    <url>http://www.aclweb.org/anthology/E17-1109</url>
    <abstract>Text mining has drawn significant attention in the recent past due to the
	rapid growth in biomedical and clinical records. Entity extraction is one of
	the fundamental components of biomedical text mining. In this paper, we
	propose a novel approach of feature selection for entity extraction that
	exploits the concepts of deep learning and Particle Swarm Optimization (PSO).
	The system utilizes word embedding features along with several other features
	extracted by studying the properties of the datasets. We obtain the
	interesting observation that compact word embedding features as determined by
	PSO are more effective than the entire word embedding feature set for entity
	extraction. The proposed system is evaluated on three benchmark biomedical
	datasets, namely GENIA, GENETAG, and AiMed. The effectiveness of the proposed
	approach is evident from significant performance gains over the baseline
	models as well as other existing systems. We observe improvements of 7.86%,
	5.27% and 7.25% F-measure points over the baseline models for the GENIA,
	GENETAG, and AiMed datasets, respectively.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yadav-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1110">
    <title>Distant Supervision for Relation Extraction beyond the Sentence Boundary</title>
    <author><first>Chris</first><last>Quirk</last></author>
    <author><first>Hoifung</first><last>Poon</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1171&#8211;1182</pages>
    <url>http://www.aclweb.org/anthology/E17-1110</url>
    <abstract>The growing demand for structured knowledge has led to great interest in
	relation extraction, especially in cases with limited supervision. However,
	existing distant supervision approaches only extract relations expressed in
	single sentences. In general, cross-sentence relation extraction is
	under-explored, even in the supervised-learning setting. In this paper, we
	propose the first approach for applying distant supervision to cross-sentence
	relation extraction. At the core of our approach is a graph representation that
	can incorporate both standard dependencies and discourse relations, thus
	providing a unifying way to model relations within and across sentences. We
	extract features from multiple paths in this graph, increasing accuracy and
	robustness when confronted with linguistic variation and analysis error.
	Experiments on an important extraction task for precision medicine show that
	our approach can learn an accurate cross-sentence extractor, using only a small
	existing knowledge base and unlabeled text from biomedical research articles.
	Compared to the existing distant supervision paradigm, our approach extracted
	twice as many relations at similar precision, thus demonstrating the prevalence
	of cross-sentence relations and the promise of our approach.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>quirk-poon:2017:EACLlong</bibkey>
  </paper>

  <paper id="1111">
    <title>Noise Mitigation for Neural Entity Typing and Relation Extraction</title>
    <author><first>Yadollah</first><last>Yaghoobzadeh</last></author>
    <author><first>Heike</first><last>Adel</last></author>
    <author><first>Hinrich</first><last>Sch&#252;tze</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1183&#8211;1194</pages>
    <url>http://www.aclweb.org/anthology/E17-1111</url>
    <abstract>In this paper, we address two different types of noise in information
	extraction models: noise from distant supervision and noise from pipeline input
	features. Our
	target tasks are entity typing and relation extraction. For the first noise
	type, we introduce multi-instance multi-label learning algorithms using neural
	network models, and apply them to fine-grained entity typing for the first
	time. Our model outperforms the state-of-the-art supervised approach which uses
	global embeddings of entities. For the second noise type, we propose ways to
	improve the integration of noisy entity type predictions into relation
	extraction. Our experiments show that probabilistic predictions are more
	robust than discrete predictions and that joint training of the two tasks
	performs best.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yaghoobzadeh-adel-schutze:2017:EACLlong</bibkey>
  </paper>

  <paper id="1112">
    <title>Analyzing Semantic Change in Japanese Loanwords</title>
    <author><first>Hiroya</first><last>Takamura</last></author>
    <author><first>Ryo</first><last>Nagata</last></author>
    <author><first>Yoshifumi</first><last>Kawasaki</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1195&#8211;1204</pages>
    <url>http://www.aclweb.org/anthology/E17-1112</url>
    <abstract>We analyze semantic changes in loanwords from English that are used in Japanese
	(Japanese loanwords). 
	Specifically, we create word embeddings of English and Japanese and map the
	Japanese embeddings into the English space so that we can calculate the
	similarity of each Japanese word and each English word.
	We then attempt to find loanwords that are semantically different from their
	original, see if known meaning changes are correctly captured, and show the
	possibility of using our methodology in language education.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>takamura-nagata-kawasaki:2017:EACLlong</bibkey>
  </paper>

  <paper id="1113">
    <title>Using support vector machines and state-of-the-art algorithms for phonetic alignment to identify cognates in multi-lingual wordlists</title>
    <author><first>Gerhard</first><last>J&#228;ger</last></author>
    <author><first>Johann-Mattis</first><last>List</last></author>
    <author><first>Pavel</first><last>Sofroniev</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1205&#8211;1216</pages>
    <url>http://www.aclweb.org/anthology/E17-1113</url>
    <abstract>Most current approaches in phylogenetic linguistics require as input
	multilingual word lists partitioned into sets of etymologically related words
	(cognates). Cognate identification has so far been done manually by experts, which is
	time consuming and thus far only feasible for a small number of well-studied
	language families.  Automating this step will greatly expand the empirical
	scope of phylogenetic methods in linguistics, as raw wordlists (in phonetic
	transcription) are much easier to obtain than wordlists in which cognate words
	have been fully identified and annotated, even for under-studied languages.  A
	number of methods have been proposed in the past, but they either perform
	poorly or are not applicable to larger datasets.
	 Here we present a new approach that uses support vector machines to unify
	different state-of-the-art methods for phonetic alignment and cognate detection
	within a single framework. Training and evaluating this method on a
	typologically broad collection of gold-standard data shows it to be superior to
	the existing state of the art.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jager-list-sofroniev:2017:EACLlong</bibkey>
  </paper>

  <paper id="1114">
    <title>A Multi-task Approach to Predict Likability of Books</title>
    <author><first>Suraj</first><last>Maharjan</last></author>
    <author><first>John</first><last>Arevalo</last></author>
    <author><first>Manuel</first><last>Montes</last></author>
    <author><first>Fabio A.</first><last>Gonz&#225;lez</last></author>
    <author><first>Thamar</first><last>Solorio</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1217&#8211;1227</pages>
    <url>http://www.aclweb.org/anthology/E17-1114</url>
    <abstract>We investigate the value of feature engineering and neural network models for
	predicting successful writing. Similar to previous work, we treat this as a
	binary classification task and explore new strategies to automatically learn
	representations from book contents. We evaluate our feature set on two
	different corpora created from Project Gutenberg books. The first presents a
	novel approach for generating the gold standard labels for the task and the
	other is based on prior research. Using a combination of hand-crafted and
	recurrent neural network learned representations in a dual learning setting, we
	obtain the best performance of 73.50% weighted F1-score.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>maharjan-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1115">
    <title>A Data-Oriented Model of Literary Language</title>
    <author><first>Andreas</first><last>van Cranenburgh</last></author>
    <author><first>Rens</first><last>Bod</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1228&#8211;1238</pages>
    <url>http://www.aclweb.org/anthology/E17-1115</url>
    <abstract>We consider the task of predicting how literary a text is, with a gold standard
	from human ratings. Aside from a standard bigram baseline, we apply rich
	syntactic tree fragments, mined from the training set, and a series of
	hand-picked features. Our model is the first to distinguish degrees of highly
	and less literary novels using a variety of lexical and syntactic features, and
	explains 76.0% of the variation in literary ratings.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>vancranenburgh-bod:2017:EACLlong</bibkey>
  </paper>

  <paper id="1116">
    <title>Aye or naw, whit dae ye hink? Scottish independence and linguistic identity on social media</title>
    <author><first>Philippa</first><last>Shoemark</last></author>
    <author><first>Debnil</first><last>Sur</last></author>
    <author><first>Luke</first><last>Shrimpton</last></author>
    <author><first>Iain</first><last>Murray</last></author>
    <author><first>Sharon</first><last>Goldwater</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1239&#8211;1248</pages>
    <url>http://www.aclweb.org/anthology/E17-1116</url>
    <abstract>Political surveys have indicated a relationship between a sense of Scottish
	identity and voting decisions in the 2014 Scottish Independence Referendum.
	Identity is often reflected in language use, suggesting the intuitive
	hypothesis that individuals who support Scottish independence are more likely
	to use distinctively Scottish words than those who oppose it. In the first
	large-scale study of sociolinguistic variation on social media in the UK, we
	identify distinctively Scottish terms in a data-driven way, and find that these
	terms are indeed used at a higher rate by users of pro-independence hashtags
	than by users of anti-independence hashtags.  However, we also find that in
	general people are less likely to use distinctively Scottish words in tweets
	with referendum-related hashtags than in their general Twitter activity. We
	attribute this difference to style shifting relative to audience, aligning with
	previous work showing that Twitter users tend to use fewer local variants when
	addressing a broader audience.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>shoemark-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1117">
    <title>What Do Recurrent Neural Network Grammars Learn About Syntax?</title>
    <author><first>Adhiguna</first><last>Kuncoro</last></author>
    <author><first>Miguel</first><last>Ballesteros</last></author>
    <author><first>Lingpeng</first><last>Kong</last></author>
    <author><first>Chris</first><last>Dyer</last></author>
    <author><first>Graham</first><last>Neubig</last></author>
    <author><first>Noah A.</first><last>Smith</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1249&#8211;1258</pages>
    <url>http://www.aclweb.org/anthology/E17-1117</url>
    <abstract>Recurrent neural network grammars (RNNG) are a recently proposed probabilistic
	generative modeling family for natural language. They show state-of-the-art
	language modeling and parsing performance. We investigate what information they
	learn, from a linguistic perspective, through various ablations to the model
	and the data, and by augmenting the model with an attention mechanism (GA-RNNG)
	to enable closer inspection. We find that explicit modeling of composition is
	crucial for achieving the best performance. Through the attention mechanism, we
	find that headedness plays a central role in phrasal representation (with the
	model's latent attention largely agreeing with predictions made by hand-crafted
	head rules, albeit with some important differences). By training grammars
	without nonterminal labels, we find that phrasal representations depend
	minimally on nonterminals, providing support for the endocentricity hypothesis.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kuncoro-EtAl:2017:EACLlong</bibkey>
  </paper>

  <paper id="1118">
    <title>Incremental Discontinuous Phrase Structure Parsing with the GAP Transition</title>
    <author><first>Maximin</first><last>Coavoux</last></author>
    <author><first>Benoit</first><last>Crabb&#233;</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1259&#8211;1270</pages>
    <url>http://www.aclweb.org/anthology/E17-1118</url>
    <abstract>This article introduces a novel transition system for discontinuous lexicalized
	constituent parsing called SR-GAP. It is an extension of the shift-reduce
	algorithm with an additional gap transition. Evaluation on two German treebanks
	shows that SR-GAP outperforms the previous best transition-based discontinuous
	parser (Maier, 2015) by a large margin (it is notably twice as accurate on the
	prediction of discontinuous constituents), and is competitive with the state of
	the art (Fern&#225;ndez-Gonz&#225;lez and Martins, 2015). As a side contribution, we
	adapt span features (Hall et al., 2014) to discontinuous parsing.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>coavoux-crabbe:2017:EACLlong</bibkey>
  </paper>

  <paper id="1119">
    <title>Neural Architectures for Fine-grained Entity Type Classification</title>
    <author><first>Sonse</first><last>Shimaoka</last></author>
    <author><first>Pontus</first><last>Stenetorp</last></author>
    <author><first>Kentaro</first><last>Inui</last></author>
    <author><first>Sebastian</first><last>Riedel</last></author>
    <booktitle>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1271&#8211;1280</pages>
    <url>http://www.aclweb.org/anthology/E17-1119</url>
    <abstract>In this work, we investigate several neural network architectures for
	fine-grained entity type classification and make three key contributions. 
	Despite being a natural comparison and addition, previous work on attentive
	neural architectures has not considered hand-crafted features; we combine
	these with learnt features and establish that they complement each other. 
	Additionally, through quantitative analysis we establish that the attention
	mechanism learns to attend over syntactic heads and the phrase containing the
	mention, both of which are known to be strong hand-crafted features for our
	task.  We introduce parameter sharing between labels through a hierarchical
	encoding method whose low-dimensional projections show clear clusters for
	each type hierarchy.  Lastly, despite using the same evaluation dataset, the
	literature frequently compares models trained using different data.  We
	demonstrate that the choice of training data has a drastic impact on
	performance, which decreases by as much as 9.85% loose micro F1 score for a
	previously proposed method.  Despite this discrepancy, our best model achieves
	state-of-the-art results with 75.36% loose micro F1 score on the
	well-established Figer (GOLD) dataset and we report the best results for models
	trained using publicly available data for the OntoNotes dataset with 64.93%
	loose micro F1 score.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>shimaoka-EtAl:2017:EACLlong</bibkey>
  </paper>

</volume>