<?xml version="1.0" encoding="UTF-8" ?>
<volume id="K17">
  <paper id="1000">
    <title>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</title>
    <editor>Roger Levy</editor>
    <editor>Lucia Specia</editor>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://aclweb.org/anthology/K17-1</url>
    <bibtype>book</bibtype>
    <bibkey>CoNLL:2017</bibkey>
  </paper>

  <paper id="1001">
    <title>Should Neural Network Architecture Reflect Linguistic Structure?</title>
    <author><first>Chris</first><last>Dyer</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1</pages>
    <url>http://aclweb.org/anthology/K17-1001</url>
    <abstract>I explore the hypothesis that conventional neural network models (e.g.,
	recurrent neural networks) are incorrectly biased for making linguistically
	sensible generalizations when learning, and that a better class of models is
	based on architectures that reflect hierarchical structures for which
	considerable behavioral evidence exists. I focus on the problem of modeling and
	representing the meanings of sentences. On the generation front, I introduce
	recurrent neural network grammars (RNNGs), a joint, generative model of
	phrase-structure trees and sentences. RNNGs operate via a recursive syntactic
	process reminiscent of probabilistic context-free grammar generation, but
	decisions are parameterized using RNNs that condition on the entire (top-down,
	left-to-right) syntactic derivation history, thus relaxing context-free
	independence assumptions, while retaining a bias toward explaining decisions
	via "syntactically local" conditioning contexts. Experiments show that RNNGs
	obtain better results in generating language than models that don’t exploit
	linguistic structure. On the representation front, I explore unsupervised
	learning of syntactic structures based on distant semantic supervision using a
	reinforcement-learning algorithm. The learner seeks a syntactic structure that
	provides a compositional architecture that produces a good representation for a
	downstream semantic task. Although the inferred structures are quite different
	from traditional syntactic analyses, the performance on the downstream tasks
	surpasses that of systems that use sequential RNNs and tree-structured RNNs
	based on treebank dependencies. This is joint work with Adhi Kuncoro, Dani
	Yogatama, Miguel Ballesteros, Phil Blunsom, Ed Grefenstette, Wang Ling, and
	Noah A. Smith.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>dyer:2017:CoNLL</bibkey>
  </paper>

  <paper id="1002">
    <title>Rational Distortions of Learners' Linguistic Input</title>
    <author><first>Naomi</first><last>Feldman</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>2</pages>
    <url>http://aclweb.org/anthology/K17-1002</url>
    <abstract>Language acquisition can be modeled as a statistical inference problem:
	children use sentences and sounds in their input to infer linguistic structure.
	 However, in many cases, children learn from data whose statistical structure
	is distorted relative to the language they are learning.  Such distortions can
	arise either in the input itself, or as a result of children's immature
	strategies for encoding their input.  This work examines several cases in which
	the statistical structure of children's input differs from the language being
	learned.  Analyses show that these distortions of the input can be accounted
	for with a statistical learning framework by carefully considering the
	inference problems that learners solve during language acquisition.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>feldman:2017:CoNLL</bibkey>
  </paper>

  <paper id="1003">
    <title>Exploring the Syntactic Abilities of RNNs with Multi-task Learning</title>
    <author><first>&#201;mile</first><last>Enguehard</last></author>
    <author><first>Yoav</first><last>Goldberg</last></author>
    <author><first>Tal</first><last>Linzen</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>3&#8211;14</pages>
    <url>http://aclweb.org/anthology/K17-1003</url>
    <abstract>Recent work has explored the syntactic abilities of RNNs using the subject-verb
	agreement task, which diagnoses sensitivity to sentence structure. RNNs
	performed this task well in common cases, but faltered in complex sentences
	(Linzen et al., 2016). We test whether these errors are due to inherent
	limitations of the architecture or to the relatively indirect supervision
	provided by most agreement dependencies in a corpus. We trained a single RNN to
	perform both the agreement task and an additional task, either CCG supertagging
	or language modeling. Multi-task training led to significantly lower error
	rates, in particular on complex sentences, suggesting that RNNs have the
	ability to evolve more sophisticated syntactic representations than shown
	before. We also show that easily available agreement training data can improve
	performance on other syntactic tasks, in particular when only a limited amount
	of training data is available for those tasks. The multi-task paradigm can also
	be leveraged to inject grammatical knowledge into language models.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>enguehard-goldberg-linzen:2017:CoNLL</bibkey>
  </paper>

  <paper id="1004">
    <title>The Effect of Different Writing Tasks on Linguistic Style: A Case Study of the ROC Story Cloze Task</title>
    <author><first>Roy</first><last>Schwartz</last></author>
    <author><first>Maarten</first><last>Sap</last></author>
    <author><first>Ioannis</first><last>Konstas</last></author>
    <author><first>Leila</first><last>Zilles</last></author>
    <author><first>Yejin</first><last>Choi</last></author>
    <author><first>Noah A.</first><last>Smith</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>15&#8211;25</pages>
    <url>http://aclweb.org/anthology/K17-1004</url>
    <abstract>A writer's style depends not just on personal traits but also on her intent and
	mental state. In this paper, we show how variants of the same writing task can
	lead to measurable differences in writing style. We present a case study based
	on the story cloze task (Mostafazadeh et al., 2016a), where annotators were
	assigned similar writing tasks with different constraints: (1) writing an
	entire story, (2) adding a story ending for a given story context, and (3)
	adding an incoherent ending to a story. We show that a simple linear classifier
	informed by stylistic features is able to successfully distinguish among the
	three cases, without even looking at the story context. In addition, combining
	our stylistic features with language model predictions reaches state of the art
	performance on the story cloze challenge. Our results demonstrate that
	different task framings can dramatically affect the way people write.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>schwartz-EtAl:2017:CoNLL</bibkey>
  </paper>

  <paper id="1005">
    <title>Parsing for Grammatical Relations via Graph Merging</title>
    <author><first>Weiwei</first><last>Sun</last></author>
    <author><first>Yantao</first><last>Du</last></author>
    <author><first>Xiaojun</first><last>Wan</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>26&#8211;35</pages>
    <url>http://aclweb.org/anthology/K17-1005</url>
    <abstract>This paper is concerned with building deep grammatical relation (GR) analysis
	using a data-driven approach. To deal with this problem, we propose graph
	merging, a new perspective, for building flexible dependency graphs:
	Constructing complex graphs via constructing simple subgraphs. We discuss two
	key problems in this perspective: (1) how to decompose a complex graph into
	simple subgraphs, and (2) how to combine subgraphs into a coherent complex
	graph. Experiments demonstrate the effectiveness of graph merging.
	Our parser reaches state-of-the-art performance and is significantly better
	than two transition-based parsers.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sun-du-wan:2017:CoNLL</bibkey>
  </paper>

  <paper id="1006">
    <title>Leveraging Eventive Information for Better Metaphor Detection and Classification</title>
    <author><first>I-Hsuan</first><last>Chen</last></author>
    <author><first>Yunfei</first><last>Long</last></author>
    <author><first>Qin</first><last>Lu</last></author>
    <author><first>Chu-Ren</first><last>Huang</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>36&#8211;46</pages>
    <url>http://aclweb.org/anthology/K17-1006</url>
    <abstract>Metaphor detection has been both challenging and rewarding in natural language
	processing applications. This study offers a new approach based on eventive
	information in detecting metaphors by leveraging the Chinese writing system,
	which is a culturally bound ontological system organized according to the basic
	concepts represented by radicals. As such, the information represented is
	available in all Chinese text without pre-processing. Since metaphor detection
	is another culturally based conceptual representation, we hypothesize that
	sub-textual information can facilitate the identification and classification of
	the types of metaphoric events denoted in Chinese text. We propose a set of
	syntactic conditions crucial to event structures to improve the model based on
	the classification of radical groups. With the proposed syntactic conditions,
	the model achieves an F-score of 0.8859, a 1.7% improvement over the same
	classifier with only bag-of-words features. Results
	show that eventive information can improve the effectiveness of metaphor
	detection. Event information is rooted in every language, and thus this
	approach has a high potential to be applied to metaphor detection in other
	languages.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>chen-EtAl:2017:CoNLL1</bibkey>
  </paper>

  <paper id="1007">
    <title>Collaborative Partitioning for Coreference Resolution</title>
    <author><first>Olga</first><last>Uryupina</last></author>
    <author><first>Alessandro</first><last>Moschitti</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>47&#8211;57</pages>
    <url>http://aclweb.org/anthology/K17-1007</url>
    <abstract>This paper presents a collaborative partitioning algorithm, a novel
	ensemble-based approach to coreference resolution. Starting from the
	all-singleton partition, we search for a solution close to the ensemble's
	outputs in terms of a task-specific similarity measure. Our approach assumes a
	loose integration of individual components of the ensemble and can therefore
	combine arbitrary coreference resolvers, regardless of their models. 
	Our experiments on the CoNLL dataset show that collaborative partitioning
	yields results superior to those attained by the individual components, for
	ensembles of both strong and weak systems. Moreover, by applying the
	collaborative partitioning algorithm on top of three state-of-the-art
	resolvers, we obtain the best coreference performance reported so far in the
	literature (MELA v08 score of 64.47).</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>uryupina-moschitti:2017:CoNLL</bibkey>
  </paper>

  <paper id="1008">
    <title>Named Entity Disambiguation for Noisy Text</title>
    <author><first>Yotam</first><last>Eshel</last></author>
    <author><first>Noam</first><last>Cohen</last></author>
    <author><first>Kira</first><last>Radinsky</last></author>
    <author><first>Shaul</first><last>Markovitch</last></author>
    <author><first>Ikuya</first><last>Yamada</last></author>
    <author><first>Omer</first><last>Levy</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>58&#8211;68</pages>
    <url>http://aclweb.org/anthology/K17-1008</url>
    <abstract>We address the task of Named Entity Disambiguation (NED) for noisy text. 
	We present WikilinksNED, a large-scale NED dataset of text fragments from the
	web, which is significantly noisier and more challenging than existing
	news-based datasets. To capture the limited and noisy local context surrounding
	each mention, we design a neural model and train it with a novel method for
	sampling informative negative examples. We also describe a new way of
	initializing word and entity embeddings that significantly improves
	performance. Our model significantly outperforms existing state-of-the-art
	methods on WikilinksNED while achieving comparable performance on a smaller
	newswire dataset.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>eshel-EtAl:2017:CoNLL</bibkey>
  </paper>

  <paper id="1009">
    <title>Tell Me Why: Using Question Answering as Distant Supervision for Answer Justification</title>
    <author><first>Rebecca</first><last>Sharp</last></author>
    <author><first>Mihai</first><last>Surdeanu</last></author>
    <author><first>Peter</first><last>Jansen</last></author>
    <author><first>Marco A.</first><last>Valenzuela-Esc&#225;rcega</last></author>
    <author><first>Peter</first><last>Clark</last></author>
    <author><first>Michael</first><last>Hammond</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>69&#8211;79</pages>
    <url>http://aclweb.org/anthology/K17-1009</url>
    <abstract>For many applications of question answering (QA), being able to explain why a
	given model chose an answer is critical.  However, the lack of labeled data for
	 answer justifications makes learning this difficult and expensive.  Here we
	propose an approach that uses answer ranking as distant supervision for
	learning how to select informative justifications, where justifications serve
	as inferential connections between the question and the correct answer while
	often containing little lexical overlap with either.
	We propose a neural network architecture for QA that reranks answer
	justifications as an intermediate (and human-interpretable) step in answer
	selection. Our approach is informed by a set of features designed to combine
	both learned representations and explicit features to capture the connection
	between questions, answers, and answer justifications.
	 We show that with this end-to-end approach we are able to significantly
	improve upon a strong IR baseline in both justification ranking (+9% rated
	highly relevant) and answer selection (+6% P@1).</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sharp-EtAl:2017:CoNLL</bibkey>
  </paper>

  <paper id="1010">
    <title>Learning What is Essential in Questions</title>
    <author><first>Daniel</first><last>Khashabi</last></author>
    <author><first>Tushar</first><last>Khot</last></author>
    <author><first>Ashish</first><last>Sabharwal</last></author>
    <author><first>Dan</first><last>Roth</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>80&#8211;89</pages>
    <url>http://aclweb.org/anthology/K17-1010</url>
    <abstract>Question answering (QA) systems are easily distracted by irrelevant or
	redundant words in questions, especially when faced with long or multi-sentence
	questions in difficult domains. This paper introduces and studies the notion of
	essential question terms with the goal of improving such QA solvers. We
	illustrate the importance of essential question terms by showing that humans’
	ability to answer questions drops significantly when essential terms are
	eliminated from questions. We then develop a classifier that reliably (90% mean
	average precision) identifies and ranks essential terms in questions. Finally,
	we use the classifier to demonstrate that the notion of question term
	essentiality allows a state-of-the-art QA solver for elementary-level science
	questions to make better and more informed decisions, improving performance by
	up to 5%. We also introduce a new dataset of over 2,200 science questions
	annotated with crowd-sourced essential terms.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>khashabi-EtAl:2017:CoNLL</bibkey>
  </paper>

  <paper id="1011">
    <title>Top-Rank Enhanced Listwise Optimization for Statistical Machine Translation</title>
    <author><first>Huadong</first><last>Chen</last></author>
    <author><first>Shujian</first><last>Huang</last></author>
    <author><first>David</first><last>Chiang</last></author>
    <author><first>Xin-Yu</first><last>Dai</last></author>
    <author><first>Jiajun</first><last>Chen</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>90&#8211;99</pages>
    <url>http://aclweb.org/anthology/K17-1011</url>
    <abstract>Pairwise ranking methods are the most widely used discriminative training
	approaches for structure prediction problems in natural language processing
	(NLP). Decomposing the problem of ranking hypotheses into pairwise comparisons
	enables simple and efficient solutions. However, neglecting the global ordering
	of the hypothesis list may hinder learning. We propose a listwise learning
	framework for structure prediction problems such as machine translation. Our
	framework directly models the entire translation list’s ordering to learn
	parameters which may better fit the given listwise samples. Furthermore, we
	propose top-rank enhanced loss functions, which are more sensitive to ranking
	errors at higher positions. Experiments on a large-scale Chinese-English
	translation task show that both our listwise learning framework and top-rank
	enhanced listwise losses lead to significant improvements in translation
	quality.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>chen-EtAl:2017:CoNLL2</bibkey>
  </paper>

  <paper id="1012">
    <title>Embedding Words and Senses Together via Joint Knowledge-Enhanced Training</title>
    <author><first>Massimiliano</first><last>Mancini</last></author>
    <author><first>Jose</first><last>Camacho-Collados</last></author>
    <author><first>Ignacio</first><last>Iacobacci</last></author>
    <author><first>Roberto</first><last>Navigli</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>100&#8211;111</pages>
    <url>http://aclweb.org/anthology/K17-1012</url>
    <abstract>Word embeddings are widely used in Natural Language Processing, mainly due to
	their success in capturing semantic information from massive corpora. However,
	their creation process does not allow the different meanings of a word to be
	automatically separated, as it conflates them into a single vector. We address
	this issue by proposing a new model which learns word and sense embeddings
	jointly. Our model exploits large corpora and knowledge from semantic networks
	in order to produce a unified vector space of word and sense embeddings. We
	evaluate the main features of our approach both qualitatively and
	quantitatively in a variety of tasks, highlighting the advantages of the
	proposed method in comparison to state-of-the-art word- and sense-based models.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mancini-EtAl:2017:CoNLL</bibkey>
  </paper>

  <paper id="1013">
    <title>Automatic Selection of Context Configurations for Improved Class-Specific Word Representations</title>
    <author><first>Ivan</first><last>Vuli&#x107;</last></author>
    <author><first>Roy</first><last>Schwartz</last></author>
    <author><first>Ari</first><last>Rappoport</last></author>
    <author><first>Roi</first><last>Reichart</last></author>
    <author><first>Anna</first><last>Korhonen</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>112&#8211;122</pages>
    <url>http://aclweb.org/anthology/K17-1013</url>
    <abstract>This paper is concerned with identifying contexts useful for training word
	representation models for different word classes such as adjectives (A), verbs
	(V), and nouns (N). We introduce a simple yet effective framework for an
	automatic selection of class-specific context configurations. We construct a
	context configuration space based on universal dependency relations between
	words, and efficiently search this space with an adapted beam search algorithm.
	In word similarity tasks for each word class, we show that our framework is
	both effective and efficient. Particularly, it improves the Spearman's rho
	correlation with human scores on SimLex-999 over the best previously proposed
	class-specific contexts by 6 (A), 6 (V) and 5 (N) rho points. With our selected
	context configurations, we train on only 14% (A), 26.2% (V), and 33.6% (N) of
	all dependency-based contexts, resulting in a reduced training time. Our
	results generalise: we show that the configurations our algorithm learns for
	one English training setup outperform previously proposed context types in
	another training setup for English. Moreover, basing the configuration space on
	universal dependencies, it is possible to transfer the learned configurations
	to German and Italian. We also demonstrate improved per-class results over
	other context types in these two languages.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>vulic-EtAl:2017:CoNLL</bibkey>
  </paper>

  <paper id="1014">
    <title>Modeling Context Words as Regions: An Ordinal Regression Approach to Word Embedding</title>
    <author><first>Shoaib</first><last>Jameel</last></author>
    <author><first>Steven</first><last>Schockaert</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>123&#8211;133</pages>
    <url>http://aclweb.org/anthology/K17-1014</url>
    <abstract>Vector representations of word meaning have found many applications in the
	field of natural language processing. Word vectors intuitively represent the
	average context in which a given word tends to occur, but they cannot
	explicitly model the diversity of these contexts. Although region
	representations of word meaning offer a natural alternative to word vectors,
	only few methods have been proposed that can effectively learn word regions. In
	this paper, we propose a new word embedding model which is based on SVM
	regression. We show that the underlying ranking interpretation of word contexts
	is sufficient to match, and sometimes outperform, the performance of popular
	methods such as Skip-gram. Furthermore, we show that by using a quadratic
	kernel, we can effectively learn word regions, which outperform existing
	unsupervised models for the task of hypernym detection.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jameel-schockaert:2017:CoNLL</bibkey>
  </paper>

  <paper id="1015">
    <title>An Artificial Language Evaluation of Distributional Semantic Models</title>
    <author><first>Fatemeh</first><last>Torabi Asr</last></author>
    <author><first>Michael</first><last>Jones</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>134&#8211;142</pages>
    <url>http://aclweb.org/anthology/K17-1015</url>
    <abstract>Recent studies of distributional semantic models have set up a competition
	between word embeddings obtained from predictive neural networks and word
	vectors obtained from abstractive count-based models. This paper is an attempt
	to reveal the underlying contribution of additional training data and
	post-processing steps on each type of model in word similarity and relatedness
	inference tasks. We do so by designing an artificial language framework,
	training a predictive and a count-based model on data sampled from this
	grammar, and evaluating the resulting word vectors in paradigmatic and
	syntagmatic tasks defined with respect to the grammar.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>torabiasr-jones:2017:CoNLL</bibkey>
  </paper>

  <paper id="1016">
    <title>Learning Word Representations with Regularization from Prior Knowledge</title>
    <author><first>Yan</first><last>Song</last></author>
    <author><first>Chia-Jung</first><last>Lee</last></author>
    <author><first>Fei</first><last>Xia</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>143&#8211;152</pages>
    <url>http://aclweb.org/anthology/K17-1016</url>
    <abstract>Conventional word embeddings are trained with specific criteria (e.g., based on
	language modeling or co-occurrence) inside a single information source,
	disregarding the opportunity for further calibration using external knowledge.
	This paper presents a unified framework that leverages pre-learned or external
	priors, in the form of a regularizer, for enhancing conventional language
	model-based embedding learning. We consider two types of regularizers. The
	first type is derived from topic distribution by running LDA on unlabeled data.
	The second type is based on dictionaries that are created with
	human annotation efforts. To effectively learn with the regularizers, we
	propose a novel data structure, trajectory softmax, in this paper. The
	resulting embeddings are evaluated by word similarity and sentiment
	classification. Experimental results show that our learning framework with
	regularization from prior knowledge improves embedding quality across multiple
	datasets, compared to a diverse collection of baseline methods.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>song-lee-xia:2017:CoNLL</bibkey>
  </paper>

  <paper id="1017">
    <title>Attention-based Recurrent Convolutional Neural Network for Automatic Essay Scoring</title>
    <author><first>Fei</first><last>Dong</last></author>
    <author><first>Yue</first><last>Zhang</last></author>
    <author><first>Jie</first><last>Yang</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>153&#8211;162</pages>
    <url>http://aclweb.org/anthology/K17-1017</url>
    <abstract>Neural network models have recently been applied to the task of automatic essay
	scoring, giving promising results. Existing work used recurrent neural networks
	and convolutional neural networks to model input essays, giving grades based on
	a single vector representation of the essay. On the other hand, the relative
	advantages of RNNs and CNNs have not been compared. In addition, different
	parts of the essay can contribute differently for scoring, which is not
	captured by existing models. We address these issues by building a
	hierarchical sentence-document model to represent essays, using the attention
	mechanism to automatically decide the relative weights of words and sentences.
	Results show that our model outperforms the previous state-of-the-art methods,
	demonstrating the effectiveness of the attention mechanism.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>dong-zhang-yang:2017:CoNLL</bibkey>
  </paper>

  <paper id="1018">
    <title>Feature Selection as Causal Inference: Experiments with Text Classification</title>
    <author><first>Michael J.</first><last>Paul</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>163&#8211;172</pages>
    <url>http://aclweb.org/anthology/K17-1018</url>
    <abstract>This paper proposes a matching technique for learning causal associations
	between word features and class labels in document classification. The goal is
	to identify more meaningful and generalizable features than with only
	correlational approaches. Experiments with sentiment classification show that
	the proposed method identifies interpretable word associations with sentiment
	and improves classification performance in a majority of cases. The proposed
	feature selection method is particularly effective when applied to
	out-of-domain data.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>paul:2017:CoNLL</bibkey>
  </paper>

  <paper id="1019">
    <title>A Joint Model for Semantic Sequences: Frames, Entities, Sentiments</title>
    <author><first>Haoruo</first><last>Peng</last></author>
    <author><first>Snigdha</first><last>Chaturvedi</last></author>
    <author><first>Dan</first><last>Roth</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>173&#8211;183</pages>
    <url>http://aclweb.org/anthology/K17-1019</url>
    <abstract>Understanding stories &#8211; sequences of events &#8211; is a crucial yet challenging
	natural language understanding task. These events typically carry multiple
	aspects of semantics including actions, entities and emotions. Not only does
	each individual aspect contribute to the meaning of the story, so does the
	interaction among these aspects.   
	Building on this intuition, we propose to jointly model important aspects of
	semantic knowledge &#8211; frames, entities and sentiments &#8211; via a semantic
	language model. We achieve this by first representing these aspects' semantic
	units at an appropriate level of abstraction and then using the resulting
	vector representations for each semantic aspect to learn a joint representation
	via a neural language model.
	We show that the joint semantic language model is of high quality and can
	generate better semantic sequences than models that operate on the word level.
	We further demonstrate that our joint model can be applied to story cloze test
	and shallow discourse parsing tasks with improved performance and that each
	semantic aspect contributes to the model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>peng-chaturvedi-roth:2017:CoNLL</bibkey>
  </paper>

  <paper id="1020">
    <title>Neural Sequence-to-sequence Learning of Internal Word Structure</title>
    <author><first>Tatyana</first><last>Ruzsics</last></author>
    <author><first>Tanja</first><last>Samardzic</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>184&#8211;194</pages>
    <url>http://aclweb.org/anthology/K17-1020</url>
    <abstract>Learning internal word structure has recently been recognized as an important
	step in various multilingual processing tasks and in theoretical language
	comparison. In this paper, we present a neural encoder-decoder model for
	learning canonical morphological segmentation. Our model combines
	character-level sequence-to-sequence transformation with a language model over
	canonical segments. We obtain up to 4% improvement over a strong
	character-level encoder-decoder baseline for three languages. Our model
	outperforms the previous state-of-the-art for two languages, while eliminating
	the need for external resources such as large dictionaries. Finally, by
	comparing the performance of encoder-decoder and classical statistical machine
	translation systems trained with and without corpus counts, we show that
	including corpus counts is beneficial to both approaches.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ruzsics-samardzic:2017:CoNLL</bibkey>
  </paper>

  <paper id="1021">
    <title>A Supervised Approach to Extractive Summarisation of Scientific Papers</title>
    <author><first>Ed</first><last>Collins</last></author>
    <author><first>Isabelle</first><last>Augenstein</last></author>
    <author><first>Sebastian</first><last>Riedel</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>195&#8211;205</pages>
    <url>http://aclweb.org/anthology/K17-1021</url>
    <abstract>Automatic summarisation is a popular approach to reduce a document to its main
	arguments. Recent research in the area has focused on neural approaches to
	summarisation, which can be very data-hungry. However, few large datasets exist
	and none for the traditionally popular domain of scientific publications, which
	opens up challenging research avenues centered on encoding large, complex
	documents. In this paper, we introduce a new dataset for summarisation of
	computer science publications by exploiting a large resource of author-provided
	summaries and show straightforward ways of extending it further. We develop
	models on the dataset making use of both neural sentence encoding and
	traditionally used summarisation features and show that models which encode
	sentences as well as their local and global context perform best, significantly
	outperforming well-established baseline methods.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>collins-augenstein-riedel:2017:CoNLL</bibkey>
  </paper>

  <paper id="1022">
    <title>An Automatic Approach for Document-level Topic Model Evaluation</title>
    <author><first>Shraey</first><last>Bhatia</last></author>
    <author><first>Jey Han</first><last>Lau</last></author>
    <author><first>Timothy</first><last>Baldwin</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>206&#8211;215</pages>
    <url>http://aclweb.org/anthology/K17-1022</url>
    <abstract>Topic models jointly learn topics and document-level topic
	  distributions.  Extrinsic evaluation of topic models tends to focus
	  exclusively on topic-level evaluation, e.g. by assessing the
	  coherence of topics. We demonstrate that there can be large
	  discrepancies between topic- and document-level model quality, and
	  that basing model evaluation on topic-level analysis can be
	  highly misleading.  We propose a method for automatically predicting
	  topic model quality based on analysis of document-level topic
	  allocations, and provide empirical evidence for its robustness.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bhatia-lau-baldwin:2017:CoNLL</bibkey>
  </paper>

  <paper id="1023">
    <title>Robust Coreference Resolution and Entity Linking on Dialogues: Character Identification on TV Show Transcripts</title>
    <author><first>Henry Y.</first><last>Chen</last></author>
    <author><first>Ethan</first><last>Zhou</last></author>
    <author><first>Jinho D.</first><last>Choi</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>216&#8211;225</pages>
    <url>http://aclweb.org/anthology/K17-1023</url>
    <abstract>This paper presents a novel approach to character identification, an entity
	linking task that maps mentions to characters in dialogues from TV show
	transcripts. We first augment and correct several cases of annotation errors in
	an existing corpus so the corpus is clearer and cleaner for statistical
	learning. We also introduce the agglomerative convolutional neural network that
	takes groups of features and learns mention and mention-pair embeddings for
	coreference resolution. We then propose another neural model that employs the
	embeddings learned and creates cluster embeddings for entity linking. Our
	coreference resolution model shows comparable results to other state-of-the-art
	systems. Our entity linking model significantly outperforms the previous work,
	achieving an F1 score of 86.76% and an accuracy of 95.30% for character
	identification.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>chen-zhou-choi:2017:CoNLL</bibkey>
  </paper>

  <paper id="1024">
    <title>Cross-language Learning with Adversarial Neural Networks</title>
    <author><first>Shafiq</first><last>Joty</last></author>
    <author><first>Preslav</first><last>Nakov</last></author>
    <author><first>Llu&#237;s</first><last>M&#224;rquez</last></author>
    <author><first>Israa</first><last>Jaradat</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>226&#8211;237</pages>
    <url>http://aclweb.org/anthology/K17-1024</url>
    <abstract>We address the problem of cross-language adaptation for question-question
	similarity reranking in community question answering, with the objective to
	port a system trained on one input language to another input language given
	labeled training data for the first language and only unlabeled data for the
	second language.
	In particular, we propose to use adversarial training of neural networks to
	learn high-level features that are discriminative for the main learning task,
	and at the same time are invariant across the input languages. The evaluation
	results show sizable improvements for our cross-language adversarial neural
	network (CLANN) model over a strong non-adversarial system.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>joty-EtAl:2017:CoNLL</bibkey>
  </paper>

  <paper id="1025">
    <title>Knowledge Tracing in Sequential Learning of Inflected Vocabulary</title>
    <author><first>Adithya</first><last>Renduchintala</last></author>
    <author><first>Philipp</first><last>Koehn</last></author>
    <author><first>Jason</first><last>Eisner</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>238&#8211;247</pages>
    <url>http://aclweb.org/anthology/K17-1025</url>
    <abstract>We present a feature-rich knowledge tracing method that captures a student's
	acquisition and retention of knowledge during a foreign language phrase
	learning task. We model the student's behavior as making predictions under a
	log-linear model, and adopt a neural gating mechanism to model how the student
	updates their log-linear parameters in response to feedback.  The gating
	mechanism allows the model to learn complex patterns of retention and
	acquisition for each feature, while the log-linear parameterization results in
	an interpretable knowledge state. We collect human data and evaluate several
	versions of the model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>renduchintala-koehn-eisner:2017:CoNLL</bibkey>
  </paper>

  <paper id="1026">
    <title>A Probabilistic Generative Grammar for Semantic Parsing</title>
    <author><first>Abulhair</first><last>Saparov</last></author>
    <author><first>Vijay</first><last>Saraswat</last></author>
    <author><first>Tom</first><last>Mitchell</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>248&#8211;259</pages>
    <url>http://aclweb.org/anthology/K17-1026</url>
    <abstract>We present a generative model of natural language sentences and demonstrate its
	application to semantic parsing. In the generative process, a logical form is
	sampled from a prior, and conditioned on this logical form, a grammar
	probabilistically generates the output sentence. Grammar induction using MCMC
	is applied to learn the grammar given a set of labeled sentences with
	corresponding logical forms. We develop a semantic parser that finds the
	logical form with the highest posterior probability exactly. We obtain strong
	results on the GeoQuery dataset and achieve state-of-the-art F1 on Jobs.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>saparov-saraswat-mitchell:2017:CoNLL</bibkey>
  </paper>

  <paper id="1027">
    <title>Learning Contextual Embeddings for Structural Semantic Similarity using Categorical Information</title>
    <author><first>Massimo</first><last>Nicosia</last></author>
    <author><first>Alessandro</first><last>Moschitti</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>260&#8211;270</pages>
    <url>http://aclweb.org/anthology/K17-1027</url>
    <abstract>Tree kernels (TKs) and neural networks are two effective approaches for
	automatic feature engineering. In this paper, we combine them by modeling
	context word similarity in semantic TKs. This way, the latter can operate
	subtree matching by applying neural-based similarity on tree lexical nodes. We
	study how to learn representations for the words in context such that TKs can
	exploit more focused information. We found that neural embeddings produced by
	current methods do not provide a suitable contextual similarity. Thus, we
	define a new approach based on a Siamese Network, which produces word
	representations while learning a binary text similarity. We set the latter
	considering examples in the same category as similar. The experiments on
	question and sentiment classification show that our semantic TK highly improves
	previous results.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>nicosia-moschitti:2017:CoNLL</bibkey>
  </paper>

  <paper id="1028">
    <title>Making Neural QA as Simple as Possible but not Simpler</title>
    <author><first>Dirk</first><last>Weissenborn</last></author>
    <author><first>Georg</first><last>Wiese</last></author>
    <author><first>Laura</first><last>Seiffe</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>271&#8211;280</pages>
    <url>http://aclweb.org/anthology/K17-1028</url>
    <abstract>Recent development of large-scale question answering (QA) datasets triggered a
	substantial amount of research into end-to-end neural architectures for QA.
	Increasingly complex systems have been conceived without comparison to simpler
	neural baseline systems that would justify their complexity. In this work, we
	propose a simple heuristic that guides the development of neural baseline
	systems for the extractive QA task. We find that there are two ingredients
	necessary for building a high-performing neural QA system: first, the awareness
	of question words while processing the context and second, a composition
	function that goes beyond simple bag-of-words modeling, such as recurrent
	neural networks. Our results show that FastQA, a system that meets these two
	requirements, can achieve very competitive performance compared with existing
	models. We argue that this surprising finding puts results of previous systems
	and the complexity of recent QA datasets into perspective.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>weissenborn-wiese-seiffe:2017:CoNLL</bibkey>
  </paper>

  <paper id="1029">
    <title>Neural Domain Adaptation for Biomedical Question Answering</title>
    <author><first>Georg</first><last>Wiese</last></author>
    <author><first>Dirk</first><last>Weissenborn</last></author>
    <author><first>Mariana</first><last>Neves</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>281&#8211;289</pages>
    <url>http://aclweb.org/anthology/K17-1029</url>
    <abstract>Factoid question answering (QA) has recently benefited from the development of
	deep learning (DL) systems. Neural network models outperform traditional
	approaches in domains where large datasets exist, such as SQuAD (ca. 100,000
	questions) for Wikipedia articles. However, these systems have not yet been
	applied to QA in more specific domains, such as biomedicine, because datasets
	are generally too small to train a DL system from scratch. For example, the
	BioASQ dataset for biomedical QA comprises less than 900 factoid (single
	answer) and list (multiple answers) QA instances. In this work, we adapt a
	neural QA system trained on a large open-domain dataset (SQuAD, source) to a
	biomedical dataset (BioASQ, target) by employing various transfer learning
	techniques. Our network architecture is based on a state-of-the-art QA system,
	extended with biomedical word embeddings and a novel mechanism to answer list
	questions. In contrast to existing biomedical QA systems, our system does not
	rely on domain-specific ontologies, parsers or entity taggers, which are
	expensive to create. Despite this fact, our systems achieve state-of-the-art
	results on factoid questions and competitive results on list questions.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wiese-weissenborn-neves:2017:CoNLL</bibkey>
  </paper>

  <paper id="1030">
    <title>A phoneme clustering algorithm based on the obligatory contour principle</title>
    <author><first>Mans</first><last>Hulden</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>290&#8211;300</pages>
    <url>http://aclweb.org/anthology/K17-1030</url>
    <abstract>This paper explores a divisive hierarchical clustering algorithm based on the
	well-known Obligatory Contour Principle in phonology.  The purpose is twofold:
	to see if such an algorithm could be used for unsupervised classification of
	phonemes or graphemes in corpora, and to investigate whether this purported
	universal constraint really holds for several classes of phonological
	distinctive features.  The algorithm achieves very high accuracies in an
	unsupervised setting of inferring a consonant-vowel distinction, and also has a
	strong tendency to detect coronal phonemes in an unsupervised fashion.
	Remaining classes, however, do not correspond as neatly to phonological
	distinctive feature splits.  While the results offer only mixed support for a
	universal Obligatory Contour Principle, the algorithm can be very useful for
	many NLP tasks due to the high accuracy in revealing consonant/vowel/coronal
	distinctions.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hulden:2017:CoNLL</bibkey>
  </paper>

  <paper id="1031">
    <title>Learning Stock Market Sentiment Lexicon and Sentiment-Oriented Word Vector from StockTwits</title>
    <author><first>Quanzhi</first><last>Li</last></author>
    <author><first>Sameena</first><last>Shah</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>301&#8211;310</pages>
    <url>http://aclweb.org/anthology/K17-1031</url>
    <abstract>Previous studies have shown that investor sentiment indicators can predict
	stock market change.  A domain-specific sentiment lexicon and
	sentiment-oriented word embedding model would help the sentiment analysis in
	financial domain and stock market. In this paper, we present a new approach to
	learning stock market lexicon from StockTwits, a popular financial social
	network for investors to share ideas.  It learns word polarity by predicting
	message sentiment, using a neural network.  The sentiment-oriented word
	embeddings are learned from tens of millions of StockTwits posts, and this is
	the first study presenting sentiment-oriented word embeddings for stock market.
	The experiments on predicting investor sentiment show that our lexicon
	outperformed other lexicons built by the state-of-the-art methods, and the
	sentiment-oriented word vector was much better than the general word
	embeddings.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>li-shah:2017:CoNLL</bibkey>
  </paper>

  <paper id="1032">
    <title>Learning local and global contexts using a convolutional recurrent network model for relation classification in biomedical text</title>
    <author><first>Desh</first><last>Raj</last></author>
    <author><first>Sunil</first><last>Sahu</last></author>
    <author><first>Ashish</first><last>Anand</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>311&#8211;321</pages>
    <url>http://aclweb.org/anthology/K17-1032</url>
    <abstract>The task of relation classification in the biomedical domain is complex due to
	the presence of samples obtained from heterogeneous sources such as research
	articles, discharge summaries, or electronic health records. This heterogeneity
	also constrains classifiers that rely on manual feature engineering. In this
	paper, we propose a convolutional recurrent neural network (CRNN) architecture
	that combines RNNs and CNNs in sequence to solve this problem. The rationale
	behind our approach is that CNNs can effectively identify coarse-grained local
	features in a sentence, while RNNs are more suited for long-term dependencies.
	We compare our CRNN model with several baselines on two biomedical datasets,
	namely the i2b2-2010 clinical relation extraction challenge dataset, and the
	SemEval-2013 DDI extraction dataset. We also evaluate an attentive pooling
	technique and report its performance in comparison with the conventional max
	pooling method. Our results indicate that the proposed model achieves
	state-of-the-art performance on both datasets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>raj-sahu-anand:2017:CoNLL</bibkey>
  </paper>

  <paper id="1033">
    <title>Idea density for predicting Alzheimer's disease from transcribed speech</title>
    <author><first>Kairit</first><last>Sirts</last></author>
    <author><first>Olivier</first><last>Piguet</last></author>
    <author><first>Mark</first><last>Johnson</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>322&#8211;332</pages>
    <url>http://aclweb.org/anthology/K17-1033</url>
    <abstract>Idea Density (ID) measures the rate at which ideas or elementary predications
	are expressed in an utterance or in a text.
	Lower ID is found to be associated with an increased risk of developing
	Alzheimer's disease (AD) (Snowdon et al., 1996; Engelman et al., 2010).
	ID has been used in two different versions: propositional idea density (PID)
	counts the expressed ideas and can be applied to any text while semantic idea
	density (SID) counts pre-defined information content units and is naturally
	more applicable to normative domains, such as picture description tasks.
	In this paper, we develop DEPID, a novel dependency-based method for computing
	PID, and its version DEPID-R that makes it possible to exclude repeating
	ideas &#8211; a feature characteristic of AD speech.  We conduct the first comparison of
	automatically extracted PID and SID in the diagnostic classification task on
	two different AD datasets covering both closed-topic and free-recall domains. 
	While SID performs better on the normative dataset, adding PID leads to a small
	but significant improvement (+1.7 F-score). On the free-topic dataset, PID
	performs better than SID as expected (77.6 vs 72.3 in F-score) but adding the
	features derived from the word embedding clustering underlying the automatic
	SID increases the results considerably, leading to an F-score of 84.8.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sirts-piguet-johnson:2017:CoNLL</bibkey>
  </paper>

  <paper id="1034">
    <title>Zero-Shot Relation Extraction via Reading Comprehension</title>
    <author><first>Omer</first><last>Levy</last></author>
    <author><first>Minjoon</first><last>Seo</last></author>
    <author><first>Eunsol</first><last>Choi</last></author>
    <author><first>Luke</first><last>Zettlemoyer</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>333&#8211;342</pages>
    <url>http://aclweb.org/anthology/K17-1034</url>
    <abstract>We show that relation extraction can be reduced to answering simple reading
	comprehension questions, by associating one or more natural-language questions
	with each relation slot. This reduction has several advantages: we can (1)
	learn relation-extraction models by extending recent neural
	reading-comprehension techniques, (2) build very large training sets for those
	models by combining relation-specific crowd-sourced questions with distant
	supervision, and even (3) do zero-shot learning by extracting new relation
	types that are only specified at test-time, for which we have no labeled
	training examples. Experiments on a Wikipedia slot-filling task demonstrate
	that the approach can generalize to new questions for known relation types with
	high accuracy, and that zero-shot generalization to unseen relation types is
	possible, at lower accuracy levels, setting the bar for future work on this
	task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>levy-EtAl:2017:CoNLL</bibkey>
  </paper>

  <paper id="1035">
    <title>The Covert Helps Parse the Overt</title>
    <author><first>Xun</first><last>Zhang</last></author>
    <author><first>Weiwei</first><last>Sun</last></author>
    <author><first>Xiaojun</first><last>Wan</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>343&#8211;353</pages>
    <url>http://aclweb.org/anthology/K17-1035</url>
    <abstract>This paper is concerned with whether deep syntactic information can help
	surface parsing, with a particular focus on empty categories. We design new
	algorithms to produce dependency trees in which empty elements are allowed, and
	evaluate the impact of information about empty categories on parsing overt
	elements. Such information is helpful to reduce the approximation error in a
	structured parsing model, but increases the search space for inference and
	accordingly the estimation error. To deal with structure-based overfitting, we
	propose to integrate disambiguation models with and without empty elements, and
	perform structure regularization via joint decoding. Experiments on English and
	Chinese TreeBanks with different parsing models indicate that incorporating
	empty elements consistently improves surface parsing.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhang-sun-wan:2017:CoNLL</bibkey>
  </paper>

  <paper id="1036">
    <title>German in Flux: Detecting Metaphoric Change via Word Entropy</title>
    <author><first>Dominik</first><last>Schlechtweg</last></author>
    <author><first>Stefanie</first><last>Eckmann</last></author>
    <author><first>Enrico</first><last>Santus</last></author>
    <author><first>Sabine</first><last>Schulte im Walde</last></author>
    <author><first>Daniel</first><last>Hole</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>354&#8211;367</pages>
    <url>http://aclweb.org/anthology/K17-1036</url>
    <abstract>This paper explores the information-theoretic measure entropy to detect
	metaphoric change, transferring ideas from hypernym detection to research on
	language change. We build the first diachronic test set for German as a
	standard for metaphoric change annotation. Our model is unsupervised,
	language-independent and generalizable to other processes of semantic change.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>schlechtweg-EtAl:2017:CoNLL</bibkey>
  </paper>

  <paper id="1037">
    <title>Encoding of phonology in a recurrent neural model of grounded speech</title>
    <author><first>Afra</first><last>Alishahi</last></author>
    <author><first>Marie</first><last>Barking</last></author>
    <author><first>Grzegorz</first><last>Chrupa&#x142;a</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>368&#8211;378</pages>
    <url>http://aclweb.org/anthology/K17-1037</url>
    <abstract>We study the representation and encoding of phonemes in a recurrent
	  neural network model of grounded speech. We use a model which
	  processes images and their spoken descriptions, and projects the
	  visual and auditory representations into the same semantic space. We
	  perform a number of analyses on how information about individual
	  phonemes is encoded in the MFCC features extracted from the speech
	  signal, and the activations of the layers of the model. Via
	  experiments with phoneme decoding and phoneme discrimination we show
	  that phoneme representations are most salient in the lower layers of
	  the model, where low-level signals are processed at a fine-grained
	  level, although a large amount of phonological information is retained at
	  the top recurrent layer. We further find that the
	  attention mechanism following the top recurrent layer significantly
	  attenuates encoding of phonology and makes the utterance embeddings
	  much more invariant to synonymy. Moreover, a hierarchical clustering
	  of phoneme representations learned by the network shows an
	  organizational structure of phonemes similar to those proposed in
	  linguistics.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>alishahi-barking-chrupala:2017:CoNLL</bibkey>
  </paper>

  <paper id="1038">
    <title>Multilingual Semantic Parsing And Code-Switching</title>
    <author><first>Long</first><last>Duong</last></author>
    <author><first>Hadi</first><last>Afshar</last></author>
    <author><first>Dominique</first><last>Estival</last></author>
    <author><first>Glen</first><last>Pink</last></author>
    <author><first>Philip</first><last>Cohen</last></author>
    <author><first>Mark</first><last>Johnson</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>379&#8211;389</pages>
    <url>http://aclweb.org/anthology/K17-1038</url>
    <abstract>Extending semantic parsing systems to new domains and languages is a highly
	expensive, time-consuming process, so making effective use of existing
	resources is critical. In this paper, we describe a transfer learning method
	using crosslingual word embeddings in a sequence-to-sequence model.  On the
	NLmaps corpus, our approach achieves state-of-the-art accuracy of 85.7% for
	English.  Most importantly, we observed a consistent improvement for German
	compared with several baseline domain adaptation techniques.  As a by-product
	of this approach, our models that are trained on a combination of English and
	German utterances perform reasonably well on code-switching utterances which
	contain a mixture of English and German, even though the training data does not
	contain any such utterances. As far as we know, this is the first study of
	code-switching in semantic parsing. We manually constructed the set of
	code-switching test utterances for the NLmaps corpus and achieved 78.3%
	accuracy on this dataset.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>duong-EtAl:2017:CoNLL</bibkey>
  </paper>

  <paper id="1039">
    <title>Optimizing Differentiable Relaxations of Coreference Evaluation Metrics</title>
    <author><first>Phong</first><last>Le</last></author>
    <author><first>Ivan</first><last>Titov</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>390&#8211;399</pages>
    <url>http://aclweb.org/anthology/K17-1039</url>
    <abstract>Coreference evaluation metrics are hard to optimize directly as they are
	non-differentiable functions, not easily decomposable into elementary
	decisions. Consequently, most approaches optimize objectives only indirectly
	related to the end goal, resulting in suboptimal performance. Instead, we
	propose a differentiable relaxation that lends itself to gradient-based
	optimisation, thus bypassing the need for reinforcement learning or heuristic
	modification of cross-entropy. We show that by modifying the training objective
	of a competitive neural coreference system, we obtain a substantial gain in
	performance. This suggests that our approach can be regarded as a viable
	alternative to using reinforcement learning or more computationally expensive
	imitation learning.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>le-titov:2017:CoNLL</bibkey>
  </paper>

  <paper id="1040">
    <title>Neural Structural Correspondence Learning for Domain Adaptation</title>
    <author><first>Yftah</first><last>Ziser</last></author>
    <author><first>Roi</first><last>Reichart</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>400&#8211;410</pages>
    <url>http://aclweb.org/anthology/K17-1040</url>
    <abstract>We introduce a neural network model that marries together ideas from two
	prominent strands of research on domain adaptation through representation
	learning: structural correspondence learning (SCL; Blitzer et al., 2006) and
	autoencoder neural networks (NNs). Our model is a three-layer NN that learns to
	encode the non-pivot features of an input example into a low dimensional
	representation, so that the existence of pivot features (features that are
	prominent in both domains and convey useful information for the NLP task) in
	the example can be decoded from that representation. The low-dimensional
	representation is then employed in a learning algorithm for the task. Moreover,
	we show how to inject pre-trained word embeddings into our model in order to
	improve generalization across examples with similar pivot features. We
	experiment with the task of cross-domain sentiment classification on 16 domain
	pairs and show substantial improvements over strong baselines.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ziser-reichart:2017:CoNLL</bibkey>
  </paper>

  <paper id="1041">
    <title>A Simple and Accurate Syntax-Agnostic Neural Model for Dependency-based Semantic Role Labeling</title>
    <author><first>Diego</first><last>Marcheggiani</last></author>
    <author><first>Anton</first><last>Frolov</last></author>
    <author><first>Ivan</first><last>Titov</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>411&#8211;420</pages>
    <url>http://aclweb.org/anthology/K17-1041</url>
    <abstract>We introduce a simple and accurate neural model for dependency-based semantic
	role labeling.
	Our model predicts predicate-argument dependencies relying on states of a
	bidirectional LSTM encoder. 
	The semantic role labeler achieves competitive performance on English, even
	without any kind of syntactic information and only using local inference. 
	However, when automatically predicted part-of-speech tags are provided as
	input, it substantially outperforms all previous local models and approaches
	the best reported results on the English CoNLL-2009 dataset. 
	We also consider Chinese, Czech and Spanish where our approach also achieves
	competitive results.
	Syntactic parsers are unreliable on out-of-domain data, so standard (i.e.,
	syntactically-informed) SRL models are hindered when tested in this setting. 
	Our syntax-agnostic model appears more robust, resulting in the best reported
	results on standard out-of-domain test sets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>marcheggiani-frolov-titov:2017:CoNLL</bibkey>
  </paper>

  <paper id="1042">
    <title>Joint Prediction of Morphosyntactic Categories for Fine-Grained Arabic Part-of-Speech Tagging Exploiting Tag Dictionary Information</title>
    <author><first>Go</first><last>Inoue</last></author>
    <author><first>Hiroyuki</first><last>Shindo</last></author>
    <author><first>Yuji</first><last>Matsumoto</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>421&#8211;431</pages>
    <url>http://aclweb.org/anthology/K17-1042</url>
    <abstract>Part-of-speech (POS) tagging for morphologically rich languages such as Arabic
	is a challenging problem because of their enormous tag sets. One reason for
	this is that in the tagging scheme for such languages, a complete POS tag is
	formed by combining tags from multiple tag sets defined for each
	morphosyntactic category. Previous approaches in Arabic POS tagging applied one
	model for each morphosyntactic tagging task, without utilizing shared
	information between the tasks. In this paper, we propose an approach that
	utilizes this information by jointly modeling multiple morphosyntactic tagging
	tasks with a multi-task learning framework. We also propose a method of
	incorporating tag dictionary information into our neural models by combining
	word representations with representations of the sets of possible tags. Our
	experiments showed that the joint model with tag dictionary information results
	in an accuracy of 91.38% on the Penn Arabic Treebank data set, with an absolute
	improvement of 2.11% over the current state-of-the-art tagger.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>inoue-shindo-matsumoto:2017:CoNLL</bibkey>
  </paper>

  <paper id="1043">
    <title>Learning from Relatives: Unified Dialectal Arabic Segmentation</title>
    <author><first>Younes</first><last>Samih</last></author>
    <author><first>Mohamed</first><last>Eldesouki</last></author>
    <author><first>Mohammed</first><last>Attia</last></author>
    <author><first>Kareem</first><last>Darwish</last></author>
    <author><first>Ahmed</first><last>Abdelali</last></author>
    <author><first>Hamdy</first><last>Mubarak</last></author>
    <author><first>Laura</first><last>Kallmeyer</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>432&#8211;441</pages>
    <url>http://aclweb.org/anthology/K17-1043</url>
    <abstract>Arabic dialects do not just share a common koin&#233;, but there are shared
	pan-dialectal linguistic phenomena that allow computational models for dialects
	to learn from each other. In this paper we build a unified segmentation model
	where the training data for different dialects are combined and a single model
	is trained. The model yields higher accuracies than dialect-specific models,
	eliminating the need for dialect identification before segmentation. We also
	measure the degree of relatedness between four major Arabic dialects by testing
	how a segmentation model trained on one dialect performs on the other dialects.
	We found that linguistic relatedness correlates with geographical proximity.
	In our experiments we use SVM-based ranking and bi-LSTM-CRF sequence labeling.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>samih-EtAl:2017:CoNLL</bibkey>
  </paper>

  <paper id="1044">
    <title>Natural Language Generation for Spoken Dialogue System using RNN Encoder-Decoder Networks</title>
    <author><first>Van-Khanh</first><last>Tran</last></author>
    <author><first>Le-Minh</first><last>Nguyen</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>442&#8211;451</pages>
    <url>http://aclweb.org/anthology/K17-1044</url>
    <abstract>Natural language generation (NLG) is a critical component in a spoken dialogue
	system. 
	This paper presents a Recurrent Neural Network based Encoder-Decoder
	architecture, in which an LSTM-based decoder is introduced to select and
	aggregate semantic elements produced by an attention mechanism over the input
	elements, and to produce the required utterances.
	The proposed generator can be jointly trained on both sentence planning and
	surface realization to produce natural language sentences.
	The proposed model was extensively evaluated on four different NLG datasets.
	The experimental results showed that the proposed generators not only
	consistently outperform the previous methods across all the NLG domains but
	also show an ability to generalize to a new, unseen domain and learn from
	multi-domain datasets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tran-nguyen:2017:CoNLL</bibkey>
  </paper>

  <paper id="1045">
    <title>Graph-based Neural Multi-Document Summarization</title>
    <author><first>Michihiro</first><last>Yasunaga</last></author>
    <author><first>Rui</first><last>Zhang</last></author>
    <author><first>Kshitijh</first><last>Meelu</last></author>
    <author><first>Ayush</first><last>Pareek</last></author>
    <author><first>Krishnan</first><last>Srinivasan</last></author>
    <author><first>Dragomir</first><last>Radev</last></author>
    <booktitle>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>452&#8211;462</pages>
    <url>http://aclweb.org/anthology/K17-1045</url>
    <abstract>We propose a neural multi-document summarization system that incorporates
	sentence relation graphs.
	We employ a Graph Convolutional Network (GCN) on the relation graphs, with
	sentence embeddings obtained from Recurrent Neural Networks as input node
	features.
	Through multiple layer-wise propagation, the GCN generates high-level hidden
	sentence features for salience estimation.
	We then use a greedy heuristic to extract salient sentences while avoiding
	redundancy.
	In our experiments on DUC 2004, we consider three types of sentence relation
	graphs and demonstrate the advantage of combining sentence relations in graphs
	with the representation power of deep neural networks.
	Our model improves upon other traditional graph-based extractive approaches and
	the vanilla GRU sequence model with no graph, and it achieves competitive
	results against other state-of-the-art multi-document summarization systems.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yasunaga-EtAl:2017:CoNLL</bibkey>
  </paper>

</volume>

