<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="2600">
    <title>Proceedings of the 2nd Workshop on Representation Learning for NLP</title>
    <editor>Phil Blunsom</editor>
    <editor>Antoine Bordes</editor>
    <editor>Kyunghyun Cho</editor>
    <editor>Shay Cohen</editor>
    <editor>Chris Dyer</editor>
    <editor>Edward Grefenstette</editor>
    <editor>Karl Moritz Hermann</editor>
    <editor>Laura Rimell</editor>
    <editor>Jason Weston</editor>
    <editor>Scott Yih</editor>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/W17-26</url>
    <bibtype>book</bibtype>
    <bibkey>RepL4NLP:2017</bibkey>
  </paper>

  <paper id="2601">
    <title>Sense Contextualization in a Dependency-Based Compositional Distributional Model</title>
    <author><first>Pablo</first><last>Gamallo</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;9</pages>
    <url>http://www.aclweb.org/anthology/W17-2601</url>
<abstract>Little attention has been paid to distributional compositional methods which
	employ syntactically structured vector models. As word vectors belonging to
	different syntactic categories have incompatible syntactic distributions, no
	trivial compositional operation can be applied to combine them into a new
	compositional vector. In this article, we generalize the method described by
	Erk and Pad&#243; (2009) by proposing a dependency-based framework that
	contextualizes not only lemmas but also selectional preferences. The main
	contribution of the article is to expand their model to a fully compositional
	framework in which syntactic dependencies are put at the core of semantic
	composition. We claim that semantic composition is mainly driven by syntactic
	dependencies. Each syntactic dependency generates two new compositional
	vectors representing the contextualized sense of the two related lemmas. The
	sequential application of the compositional operations associated with the
	dependencies results in as many contextualized vectors as lemmas the composite
	expression contains. At the end of the semantic process, we do not obtain a
	single compositional vector representing the semantic denotation of the whole
	composite expression, but one contextualized vector for each lemma of the
	whole expression. Our method avoids troublesome high-order tensor
	representations by defining lemmas and selectional restrictions as first-order
	tensors (i.e., standard vectors). A corpus-based experiment is performed both
	to evaluate the quality of the compositional vectors built with our strategy
	and to compare them to other approaches to distributional compositional
	semantics. The experiments show that our dependency-based compositional method
	performs as well as (or even better than) the state of the art.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>gamallo:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2602">
    <title>Context encoders as a simple but powerful extension of word2vec</title>
    <author><first>Franziska</first><last>Horn</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>10&#8211;14</pages>
    <url>http://www.aclweb.org/anthology/W17-2602</url>
    <abstract>With a strikingly simple architecture and the ability to learn meaningful word
	embeddings efficiently from texts containing billions of words, word2vec
	remains one of the most popular neural language models used today. However, as
	only a single embedding is learned for every word in the vocabulary, the model
	fails to optimally represent words with multiple meanings and, additionally, it
	is not possible to create embeddings for new (out-of-vocabulary) words on the
	spot. Based on an intuitive interpretation of the continuous bag-of-words
	(CBOW) word2vec model's negative sampling training objective in terms of
	predicting context-based similarities, we motivate an extension of the model we
	call context encoders (ConEc). By multiplying the matrix of trained word2vec
	embeddings with a word's average context vector, out-of-vocabulary (OOV)
	embeddings and representations for words with multiple meanings can be created
	based on the words' local contexts. The benefits of this approach are
	illustrated by using these word embeddings as features in the CoNLL 2003 named
	entity recognition (NER) task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>horn:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2603">
    <title>Machine Comprehension by Text-to-Text Neural Question Generation</title>
    <author><first>Xingdi</first><last>Yuan</last></author>
    <author><first>Tong</first><last>Wang</last></author>
    <author><first>Caglar</first><last>Gulcehre</last></author>
    <author><first>Alessandro</first><last>Sordoni</last></author>
    <author><first>Philip</first><last>Bachman</last></author>
    <author><first>Saizheng</first><last>Zhang</last></author>
    <author><first>Sandeep</first><last>Subramanian</last></author>
    <author><first>Adam</first><last>Trischler</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>15&#8211;25</pages>
    <url>http://www.aclweb.org/anthology/W17-2603</url>
    <abstract>We propose a recurrent neural model that generates natural-language questions
	from documents, conditioned on answers. We show how to train the model using a
	combination of supervised and reinforcement learning. After teacher forcing for
	standard maximum likelihood training, we fine-tune the model using policy
	gradient techniques to maximize several rewards that measure question quality.
	Most notably, one of these rewards is the performance of a question-answering
	system. We motivate question generation as a means to improve the performance
	of question answering systems. Our model is trained and evaluated on the
	recent question-answering dataset SQuAD.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yuan-EtAl:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2604">
    <title>Emergent Predication Structure in Hidden State Vectors of Neural Readers</title>
    <author><first>Hai</first><last>Wang</last></author>
    <author><first>Takeshi</first><last>Onishi</last></author>
    <author><first>Kevin</first><last>Gimpel</last></author>
    <author><first>David</first><last>McAllester</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>26&#8211;36</pages>
    <url>http://www.aclweb.org/anthology/W17-2604</url>
    <abstract>A significant number of neural architectures for reading comprehension have
	recently been developed and evaluated on large cloze-style datasets.
	  We present experiments supporting the emergence of "predication structure" in
	the hidden state vectors of these readers.  More specifically, we provide
	evidence that the hidden state vectors represent atomic formulas Phi[c]
	where Phi is a semantic property (predicate) and c is a constant symbol
	entity identifier.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wang-EtAl:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2605">
    <title>Towards Harnessing Memory Networks for Coreference Resolution</title>
    <author><first>Joe</first><last>Cheri</last></author>
    <author><first>Pushpak</first><last>Bhattacharyya</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>37&#8211;42</pages>
    <url>http://www.aclweb.org/anthology/W17-2605</url>
<abstract>The coreference resolution task demands comprehending a discourse, especially
	for anaphoric mentions, which require semantic information for resolving
	antecedents. We investigate how memory networks can be helpful for
	coreference resolution when it is posed as a question answering problem. The
	comprehension capability of memory networks assists coreference resolution,
	particularly for mentions that require semantic and context information. We
	experiment with memory networks for coreference resolution on 4 synthetic
	datasets generated for coreference resolution with varying difficulty levels.
	Our system’s performance is compared with a traditional coreference
	resolution system to show why memory networks can be promising for coreference
	resolution.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>cheri-bhattacharyya:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2606">
    <title>Combining Word-Level and Character-Level Representations for Relation Classification of Informal Text</title>
    <author><first>Dongyun</first><last>Liang</last></author>
    <author><first>Weiran</first><last>Xu</last></author>
    <author><first>Yinge</first><last>Zhao</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>43&#8211;47</pages>
    <url>http://www.aclweb.org/anthology/W17-2606</url>
<abstract>Word representation models have achieved great success in natural language
	processing tasks such as relation classification. However, they do not always
	work well on informal text, and the morphemes of some misspelled words may
	carry important short-distance semantic information. We propose a hybrid
	model, combining the merits of word-level and character-level representations,
	to learn better representations on informal text. Experiments on two relation
	classification datasets, SemEval-2010 Task 8 and a large-scale one we compile
	from informal text, show that our model achieves a competitive result on the
	former and state-of-the-art performance on the latter.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>liang-xu-zhao:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2607">
    <title>Transfer Learning for Neural Semantic Parsing</title>
    <author><first>Xing</first><last>Fan</last></author>
    <author><first>Emilio</first><last>Monti</last></author>
    <author><first>Lambert</first><last>Mathias</last></author>
    <author><first>Markus</first><last>Dreyer</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>48&#8211;56</pages>
    <url>http://www.aclweb.org/anthology/W17-2607</url>
    <abstract>The goal of semantic parsing is to map natural language to a machine
	interpretable meaning representation language (MRL). One of the constraints
	that limits full exploration of deep learning technologies for semantic parsing
	is the lack of sufficient annotated training data. In this paper, we propose
	using a sequence-to-sequence model in a multi-task setup for semantic parsing,
	with a focus on transfer learning. We explore three multi-task architectures
	for the sequence-to-sequence model and compare their performance with that of
	an independently trained model. Our experiments show that the multi-task setup
	aids transfer learning from an auxiliary task with large labeled data to the
	target task with smaller labeled data. We see an absolute accuracy gain
	ranging from 1.0% to 4.4% on our in-house data set, and we also see good gains
	ranging from 2.5% to 7.0% on the ATIS semantic parsing tasks with syntactic
	and semantic auxiliary tasks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>fan-EtAl:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2608">
    <title>Modeling Large-Scale Structured Relationships with Shared Memory for Knowledge Base Completion</title>
    <author><first>Yelong</first><last>Shen</last></author>
    <author><first>Po-Sen</first><last>Huang</last></author>
    <author><first>Ming-Wei</first><last>Chang</last></author>
    <author><first>Jianfeng</first><last>Gao</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>57&#8211;68</pages>
    <url>http://www.aclweb.org/anthology/W17-2608</url>
    <abstract>Recent studies on knowledge base completion, the task of recovering missing
	relationships based on recorded relations, demonstrate the importance of
	learning embeddings from multi-step relations. However, due to the size of
	knowledge bases, learning multi-step relations directly on top of observed
	triplets could be costly. Hence, a manually designed procedure is often used
	when training the models. In this paper, we propose Implicit ReasoNets (IRNs),
	which are designed to perform multi-step inference implicitly through a
	controller and shared memory. Without a human-designed inference procedure,
	IRNs use training data to learn to perform multi-step inference in an embedding
	neural space through the shared memory and controller. While the inference
	procedure does not explicitly operate on top of observed triplets, our proposed
	model outperforms all previous approaches on the popular FB15k benchmark by
	more than 5.7%.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>shen-EtAl:2017:RepL4NLP1</bibkey>
  </paper>

  <paper id="2609">
    <title>Knowledge Base Completion: Baselines Strike Back</title>
    <author><first>Rudolf</first><last>Kadlec</last></author>
    <author><first>Ondrej</first><last>Bajgar</last></author>
    <author><first>Jan</first><last>Kleindienst</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>69&#8211;74</pages>
    <url>http://www.aclweb.org/anthology/W17-2609</url>
    <abstract>Many papers have been published on the knowledge base completion task in the
	past few years. Most of these introduce novel architectures for relation
	learning that are evaluated on standard datasets like FB15k and WN18. This
	paper shows that the accuracy of almost all models published on the FB15k can
	be outperformed by an appropriately tuned baseline &#8211; our reimplementation of
	the DistMult model. 
	Our findings cast doubt on the claim that the performance improvements of
	recent models are due to architectural changes as opposed to hyper-parameter
	tuning or different training objectives.
	This should prompt future research to re-consider how the performance of models
	is evaluated and reported.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kadlec-bajgar-kleindienst:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2610">
    <title>Sequential Attention: A Context-Aware Alignment Function for Machine Reading</title>
    <author><first>Sebastian</first><last>Brarda</last></author>
    <author><first>Philip</first><last>Yeres</last></author>
    <author><first>Samuel</first><last>Bowman</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>75&#8211;80</pages>
    <url>http://www.aclweb.org/anthology/W17-2610</url>
    <abstract>In this paper we  propose a neural network model with a novel Sequential
	Attention layer that extends soft attention by assigning weights to words in an
	input sequence in a way that takes into account not just how well that word
	matches a query, but how well surrounding words match. We evaluate this
	approach on the task of reading comprehension (on the Who did What and CNN
	datasets) and show that it dramatically improves a strong baseline &#8211; the
	Stanford Reader &#8211; and is competitive with the state of the art.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>brarda-yeres-bowman:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2611">
    <title>Semantic Vector Encoding and Similarity Search Using Fulltext Search Engines</title>
    <author><first>Jan</first><last>Rygl</last></author>
    <author><first>Jan</first><last>Pomik&#225;lek</last></author>
    <author><first>Radim</first><last>Řehů&#x159;ek</last></author>
    <author><first>Michal</first><last>Rů&#x17E;i&#x10D;ka</last></author>
    <author><first>V&#237;t</first><last>Novotn&#253;</last></author>
    <author><first>Petr</first><last>Sojka</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>81&#8211;90</pages>
    <url>http://www.aclweb.org/anthology/W17-2611</url>
    <abstract>Vector representations and vector space modeling (VSM) play a central role in
	modern machine learning. We propose a novel approach to ‘vector similarity
	searching’ over dense semantic representations of words and documents that
	can be deployed on top of traditional inverted-index-based fulltext engines,
	taking advantage of their robustness, stability, scalability and ubiquity.
	We show that this approach allows the indexing and querying of dense vectors in
	text domains. This opens up exciting avenues for major efficiency gains, along
	with simpler deployment, scaling and monitoring.
	The end result is a fast and scalable vector database with a tunable trade-off
	between vector search performance and quality, backed by a standard fulltext
	engine such as Elasticsearch.
	We empirically demonstrate its querying performance and quality by applying
	this solution to the task of semantic searching over a dense vector
	representation of the entire English Wikipedia.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rygl-EtAl:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2612">
    <title>Multi-task Domain Adaptation for Sequence Tagging</title>
    <author><first>Nanyun</first><last>Peng</last></author>
    <author><first>Mark</first><last>Dredze</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>91&#8211;100</pages>
    <url>http://www.aclweb.org/anthology/W17-2612</url>
<abstract>Many domain adaptation approaches rely on learning cross-domain shared
	representations to transfer the knowledge learned in one domain to other
	domains. Traditional domain adaptation only considers adapting for one task.
	In this paper, we explore multi-task representation learning under the domain
	adaptation scenario. We propose a neural network framework that supports
	domain adaptation for multiple tasks simultaneously, and learns shared
	representations that better generalize for domain adaptation. We apply the
	proposed framework to domain adaptation for sequence tagging problems,
	considering two tasks: Chinese word segmentation and named entity recognition.
	Experiments show that multi-task domain adaptation works better than disjoint
	domain adaptation for each task, and achieves the state-of-the-art results for
	both tasks in the social media domain.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>peng-dredze:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2613">
    <title>Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context</title>
    <author><first>Shyam</first><last>Upadhyay</last></author>
    <author><first>Kai-Wei</first><last>Chang</last></author>
    <author><first>Matt</first><last>Taddy</last></author>
    <author><first>Adam</first><last>Kalai</last></author>
    <author><first>James</first><last>Zou</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>101&#8211;110</pages>
    <url>http://www.aclweb.org/anthology/W17-2613</url>
<abstract>Word embeddings, which represent a word as a point in a vector space, have
	become ubiquitous in several NLP tasks. A recent line of work uses bilingual
	(two-language) corpora to learn a different vector for each sense of a word,
	by exploiting crosslingual signals to aid sense identification. We present a
	multi-view Bayesian non-parametric algorithm which improves multi-sense word
	embeddings by (a) using multilingual (i.e., more than two languages) corpora
	to significantly improve sense embeddings beyond what one achieves with
	bilingual information, and (b) using a principled approach to learn a variable
	number of senses per word, in a data-driven manner. Ours is the first approach
	with the ability to leverage multilingual corpora efficiently for multi-sense
	representation learning. Experiments show that multilingual training
	significantly improves performance over monolingual and bilingual training, by
	allowing us to combine different parallel corpora to leverage multilingual
	context. Multilingual training yields comparable performance to a
	state-of-the-art monolingual model trained on five times more training data.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>upadhyay-EtAl:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2614">
    <title>DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging</title>
    <author><first>Sheng</first><last>Chen</last></author>
    <author><first>Akshay</first><last>Soni</last></author>
    <author><first>Aasish</first><last>Pappu</last></author>
    <author><first>Yashar</first><last>Mehdad</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>111&#8211;120</pages>
    <url>http://www.aclweb.org/anthology/W17-2614</url>
    <abstract>Tagging news articles or blog posts with relevant tags from a collection of
	predefined ones is coined as document tagging in this work. Accurate tagging of
	articles can benefit several downstream applications such as recommendation and
	search. In this work, we propose a novel yet simple approach called DocTag2Vec
	to accomplish this task. We substantially extend Word2Vec and Doc2Vec &#8211; two
	popular models for learning  distributed representation of words and documents.
	In DocTag2Vec, we simultaneously learn the representation of words, documents,
	and tags in a joint vector space during training, and employ a simple
	k-nearest-neighbor search to predict tags for unseen documents. In contrast to
	previous multi-label learning methods, DocTag2Vec directly deals with raw text
	instead of provided feature vectors, and in addition, enjoys advantages like
	the learning of tag representations and the ability to handle newly created tags.
	To demonstrate the effectiveness of our approach, we conduct experiments on
	several datasets and show promising results against state-of-the-art methods.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>chen-EtAl:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2615">
    <title>Binary Paragraph Vectors</title>
    <author><first>Karol</first><last>Grzegorczyk</last></author>
    <author><first>Marcin</first><last>Kurdziel</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>121&#8211;130</pages>
    <url>http://www.aclweb.org/anthology/W17-2615</url>
    <abstract>Recently Le &#38; Mikolov described two log-linear models, called Paragraph Vector,
	that can be used to learn state-of-the-art distributed representations of
	documents. Inspired by this work, we present Binary Paragraph Vector models:
	simple neural networks that learn short binary codes for fast information
	retrieval. We show that binary paragraph vectors outperform autoencoder-based
	binary codes, despite using fewer bits. We also evaluate their precision in
	transfer learning settings, where binary codes are inferred for documents
	unrelated to the training corpus. Results from these experiments indicate that
	binary paragraph vectors can capture semantics relevant for various
	domain-specific documents. Finally, we present a model that simultaneously
	learns short binary codes and longer, real-valued representations. This model
	can be used to rapidly retrieve a short list of highly relevant documents from
	a large document collection.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>grzegorczyk-kurdziel:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2616">
    <title>Representing Compositionality based on Multiple Timescales Gated Recurrent Neural Networks with Adaptive Temporal Hierarchy for Character-Level Language Models</title>
    <author><first>Dennis Singh</first><last>Moirangthem</last></author>
    <author><first>Jegyung</first><last>Son</last></author>
    <author><first>Minho</first><last>Lee</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>131&#8211;138</pages>
    <url>http://www.aclweb.org/anthology/W17-2616</url>
    <abstract>A novel character-level neural language model is proposed in this paper. The
	proposed model incorporates a biologically inspired temporal hierarchy in the
	architecture for representing multiple compositions of language in order to
	handle longer sequences for the character-level language model. The temporal
	hierarchy is introduced in the language model by utilizing a Gated Recurrent
	Neural Network with multiple timescales. The proposed model incorporates a
	timescale adaptation mechanism for enhancing the performance of the language
	model. We evaluate our proposed model using the popular Penn Treebank and Text8
	corpora. The experiments show that the use of multiple timescales in a Neural
	Language Model (NLM) enables improved performance despite having fewer
	parameters and no additional computation requirements. Our experiments
	also demonstrate the ability of the adaptive temporal hierarchies to represent
	multiple compositions of language without the help of complex hierarchical
	architectures, and show that better representation of longer sequences leads
	to enhanced performance of the probabilistic language model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>moirangthem-son-lee:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2617">
    <title>Learning Bilingual Projections of Embeddings for Vocabulary Expansion in Machine Translation</title>
    <author><first>Pranava Swaroop</first><last>Madhyastha</last></author>
    <author><first>Cristina</first><last>Espa&#241;a-Bonet</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>139&#8211;145</pages>
    <url>http://www.aclweb.org/anthology/W17-2617</url>
    <abstract>We propose a simple log-bilinear softmax-based model to deal with vocabulary
	expansion in machine translation. Our model uses word embeddings trained on
	significantly large unlabelled monolingual
	corpora and learns over a fairly small, word-to-word bilingual dictionary.
	Given an out-of-vocabulary source word, the model generates a probabilistic
	list of possible translations in the target language using the trained
	bilingual embeddings. We integrate these translation options into a standard
	phrase-based statistical machine translation system and obtain consistent
	improvements in translation quality on the English&#8211;Spanish language pair. When
	tested over an out-of-domain testset, we get a significant improvement of 3.9
	BLEU points.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>madhyastha-espanabonet:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2618">
    <title>Prediction of Frame-to-Frame Relations in the FrameNet Hierarchy with Frame Embeddings</title>
    <author><first>Teresa</first><last>Botschen</last></author>
    <author><first>Hatem</first><last>Mousselly Sergieh</last></author>
    <author><first>Iryna</first><last>Gurevych</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>146&#8211;156</pages>
    <url>http://www.aclweb.org/anthology/W17-2618</url>
    <abstract>Automatic completion of frame-to-frame (F2F) relations in the FrameNet (FN)
	hierarchy has received little attention, although they incorporate meta-level
	commonsense knowledge and are used in downstream approaches. We address the
	problem of sparsely annotated F2F relations. First, we examine whether the
	manually defined F2F relations emerge from text by learning text-based frame
	embeddings. Our analysis reveals insights about the difficulty of
	reconstructing F2F relations purely from text. Second, we present different
	systems for predicting F2F relations; our best-performing one uses the FN
	hierarchy to train on and to ground embeddings in. A comparison of systems and
	embeddings exposes the crucial influence of knowledge-based embeddings on a
	system’s performance in predicting F2F relations.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>botschen-moussellysergieh-gurevych:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2619">
    <title>Learning Joint Multilingual Sentence Representations with Neural Machine Translation</title>
    <author><first>Holger</first><last>Schwenk</last></author>
    <author><first>Matthijs</first><last>Douze</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>157&#8211;167</pages>
    <url>http://www.aclweb.org/anthology/W17-2619</url>
    <abstract>In this paper, we use the framework of neural machine translation to learn
	joint sentence representations across six very different languages. Our aim is
	that a representation which is independent of the language is likely to
	capture the underlying semantics.  We define a new cross-lingual similarity
	measure, compare up to 1.4M sentence representations and study the
	characteristics of close sentences.
	We provide experimental evidence that sentences that are close in embedding
	space are indeed semantically highly related, but often have quite different
	structure and syntax.  These relations also hold when comparing sentences in
	different languages.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>schwenk-douze:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2620">
    <title>Transfer Learning for Speech Recognition on a Budget</title>
    <author><first>Julius</first><last>Kunze</last></author>
    <author><first>Louis</first><last>Kirsch</last></author>
    <author><first>Ilia</first><last>Kurenkov</last></author>
    <author><first>Andreas</first><last>Krug</last></author>
    <author><first>Jens</first><last>Johannsmeier</last></author>
    <author><first>Sebastian</first><last>Stober</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>168&#8211;177</pages>
    <url>http://www.aclweb.org/anthology/W17-2620</url>
    <abstract>End-to-end training of automated speech recognition (ASR) systems requires
	massive data and compute resources. We explore transfer learning based on model
	adaptation as an approach for training ASR models under constrained GPU memory,
	throughput and training data. We conduct several systematic experiments
	adapting a Wav2Letter convolutional neural network originally trained for
	English ASR to the German language. We show that this technique allows faster
	training on consumer-grade resources while requiring less training data in
	order to achieve the same accuracy, thereby lowering the cost of training ASR
	models in other languages. Model introspection revealed that small adaptations
	to the network's weights were sufficient for good performance, especially for
	inner layers.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kunze-EtAl:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2621">
    <title>Gradual Learning of Matrix-Space Models of Language for Sentiment Analysis</title>
    <author><first>Shima</first><last>Asaadi</last></author>
    <author><first>Sebastian</first><last>Rudolph</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>178&#8211;185</pages>
    <url>http://www.aclweb.org/anthology/W17-2621</url>
    <abstract>Learning word representations to capture the semantics and compositionality of
	language has received much research interest in natural language processing.
	Beyond the popular vector space models, matrix representations for words have
	been proposed, since matrix multiplication can then serve as a natural
	composition operation. In this work, we investigate the problem of learning
	matrix representations of words. We present a learning approach for
	compositional matrix-space models for the task of sentiment analysis. We show
	that our approach, which learns the matrices gradually in two steps,
	outperforms other approaches and a gradient-descent baseline in terms of
	quality and computational cost.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>asaadi-rudolph:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2622">
    <title>Improving Language Modeling using Densely Connected Recurrent Neural Networks</title>
    <author><first>Fr&#233;deric</first><last>Godin</last></author>
    <author><first>Joni</first><last>Dambre</last></author>
    <author><first>Wesley</first><last>De Neve</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>186&#8211;190</pages>
    <url>http://www.aclweb.org/anthology/W17-2622</url>
    <abstract>In this paper, we introduce the novel concept of densely connected layers into
	recurrent neural networks. We evaluate our proposed architecture on the Penn
	Treebank language modeling task. We show that we can obtain similar perplexity
	scores with six times fewer parameters compared to a standard stacked 2-layer
	LSTM model trained with dropout (Zaremba et al., 2014). In contrast with
	the current usage of skip connections, we show that densely connecting only a
	few stacked layers with skip connections already yields significant perplexity
	reductions.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>godin-dambre-deneve:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2623">
    <title>NewsQA: A Machine Comprehension Dataset</title>
    <author><first>Adam</first><last>Trischler</last></author>
    <author><first>Tong</first><last>Wang</last></author>
    <author><first>Xingdi</first><last>Yuan</last></author>
    <author><first>Justin</first><last>Harris</last></author>
    <author><first>Alessandro</first><last>Sordoni</last></author>
    <author><first>Philip</first><last>Bachman</last></author>
    <author><first>Kaheer</first><last>Suleman</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>191&#8211;200</pages>
    <url>http://www.aclweb.org/anthology/W17-2623</url>
    <abstract>We present NewsQA, a challenging machine comprehension dataset of over 100,000
	human-generated question-answer pairs. Crowdworkers supply questions and
	answers based on a set of over 10,000 news articles from CNN, with answers
	consisting of spans of text in the articles. We collect this dataset through a
	four-stage process designed to solicit exploratory questions that require
	reasoning. Analysis confirms that NewsQA demands abilities beyond simple
	word matching and recognizing textual entailment. We measure human performance
	on the dataset and compare it to several strong neural models. The performance
	gap between humans and machines (13.3% F1) indicates that significant progress
	can be made on NewsQA through future research. The dataset is freely available
	online.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>trischler-EtAl:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2624">
    <title>Intrinsic and Extrinsic Evaluation of Spatiotemporal Text Representations in Twitter Streams</title>
    <author><first>Lawrence</first><last>Phillips</last></author>
    <author><first>Kyle</first><last>Shaffer</last></author>
    <author><first>Dustin</first><last>Arendt</last></author>
    <author><first>Nathan</first><last>Hodas</last></author>
    <author><first>Svitlana</first><last>Volkova</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>201&#8211;210</pages>
    <url>http://www.aclweb.org/anthology/W17-2624</url>
    <abstract>Language in social media is a dynamic system, constantly evolving and adapting,
	with words and concepts rapidly emerging, disappearing, and changing their
	meaning. These changes can be estimated using word representations in context,
	over time and across locations. A number of methods have been proposed to track
	these spatiotemporal changes but no general method exists to evaluate the
	quality of these representations. Previous work largely focused on qualitative
	evaluation, which we improve by proposing a set of visualizations that
	highlight changes in text representation over both space and time. We
	demonstrate the usefulness of novel spatiotemporal representations to explore and
	characterize specific aspects of the corpus of tweets collected from European
	countries over a two-week period centered around the terrorist attacks in
	Brussels in March 2016. In addition, we quantitatively evaluate spatiotemporal
	representations by feeding them into a downstream classification task &#8211; event
	type prediction. Thus, our work is the first to provide both intrinsic
	(qualitative) and extrinsic (quantitative) evaluation of text representations
	for spatiotemporal trends.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>phillips-EtAl:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2625">
    <title>Rethinking Skip-thought: A Neighborhood based Approach</title>
    <author><first>Shuai</first><last>Tang</last></author>
    <author><first>Hailin</first><last>Jin</last></author>
    <author><first>Chen</first><last>Fang</last></author>
    <author><first>Zhaowen</first><last>Wang</last></author>
    <author><first>Virginia</first><last>de Sa</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>211&#8211;218</pages>
    <url>http://www.aclweb.org/anthology/W17-2625</url>
    <attachment type="poster">W17-2625.Poster.pdf</attachment>
    <abstract>We study the skip-thought model with neighborhood information as weak
	supervision. More specifically, we propose a skip-thought neighbor model to
	consider the adjacent sentences as a neighborhood. We train our skip-thought
	neighbor model on a large corpus with continuous sentences, and then evaluate
	the trained model on 7 tasks, which include semantic relatedness, paraphrase
	detection, and classification benchmarks. Both quantitative comparison and
	qualitative investigation are conducted. We empirically show that our
	skip-thought neighbor model performs as well as the skip-thought model on
	evaluation tasks. In addition, we found that incorporating an autoencoder path
	in our model did not help it perform better, while it hurt the
	performance of the skip-thought model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tang-EtAl:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2626">
    <title>A Frame Tracking Model for Memory-Enhanced Dialogue Systems</title>
    <author><first>Hannes</first><last>Schulz</last></author>
    <author><first>Jeremie</first><last>Zumer</last></author>
    <author><first>Layla</first><last>El Asri</last></author>
    <author><first>Shikhar</first><last>Sharma</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>219&#8211;227</pages>
    <url>http://www.aclweb.org/anthology/W17-2626</url>
    <abstract>Recently, resources and tasks were proposed to go beyond state tracking in
	dialogue systems. An example is the frame tracking task, which requires
	recording multiple frames, one for each user goal set during the dialogue. This
	allows a user, for instance, to compare items corresponding to different goals.
	This paper proposes a model which takes as input the list of frames created so
	far during the dialogue, the current user utterance as well as the dialogue
	acts, slot types, and slot values associated with this utterance. The model
	then outputs the frame being referenced by each  triple of dialogue act, slot
	type, and slot value. We show that on the recently published Frames dataset,
	this model significantly outperforms a previously proposed rule-based baseline.
	In addition, we propose an extensive analysis of the frame tracking task by
	dividing it into sub-tasks and assessing their difficulty with respect to our
	model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>schulz-EtAl:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2627">
    <title>Plan, Attend, Generate: Character-Level Neural Machine Translation with Planning</title>
    <author><first>Caglar</first><last>Gulcehre</last></author>
    <author><first>Francis</first><last>Dutil</last></author>
    <author><first>Adam</first><last>Trischler</last></author>
    <author><first>Yoshua</first><last>Bengio</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>228&#8211;234</pages>
    <url>http://www.aclweb.org/anthology/W17-2627</url>
    <abstract>We investigate the integration of a planning mechanism into an encoder-decoder
	architecture with attention. We develop a model that can plan ahead when it
	computes alignments between the source and target sequences not only for a
	single time-step but for the next k time-steps as well by constructing a matrix
	of proposed future alignments and a commitment vector that governs whether to
	follow or recompute the plan. This mechanism is inspired by the strategic
	attentive reader and writer (STRAW) model, a recent neural architecture for
	planning with hierarchical reinforcement learning that can also learn higher
	level temporal abstractions. Our proposed model is end-to-end trainable with
	differentiable operations. We show that our model outperforms strong baselines
	on a character-level translation task from WMT'15 with fewer parameters and
	computes alignments that are qualitatively intuitive.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>gulcehre-EtAl:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2628">
    <title>Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology-Based Representations</title>
    <author><first>Paul</first><last>Michel</last></author>
    <author><first>Abhilasha</first><last>Ravichander</last></author>
    <author><first>Shruti</first><last>Rijhwani</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>235&#8211;240</pages>
    <url>http://www.aclweb.org/anthology/W17-2628</url>
    <abstract>We investigate the pertinence of methods from algebraic topology for text data
	analysis. These methods enable the development of mathematically-principled
	isometric-invariant mappings from a set of vectors to a document embedding,
	which is stable with respect to the geometry of the document in the selected
	metric space. 
	In this work, we evaluate the utility of these topology-based document
	representations in traditional NLP tasks, specifically document clustering and
	sentiment classification.
	We find that the embeddings do not benefit text analysis. In fact, performance
	is worse than simple techniques like tf-idf, indicating that the geometry of
	the document does not provide enough variability for classification on the
	basis of topic or sentiment in the chosen datasets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>michel-ravichander-rijhwani:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2629">
    <title>Adversarial Generation of Natural Language</title>
    <author><first>Sandeep</first><last>Subramanian</last></author>
    <author><first>Sai</first><last>Rajeswar</last></author>
    <author><first>Francis</first><last>Dutil</last></author>
    <author><first>Chris</first><last>Pal</last></author>
    <author><first>Aaron</first><last>Courville</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>241&#8211;251</pages>
    <url>http://www.aclweb.org/anthology/W17-2629</url>
    <abstract>Generative Adversarial Networks (GANs) have gathered a lot of attention from
	the computer vision community, yielding impressive results for image
	generation. Advances in the adversarial generation of natural language from
	noise however are not commensurate with the progress made in generating images,
	and still lag far behind likelihood based methods. In this paper, we take a
	step towards generating natural language  with a GAN objective alone. We
	introduce a simple baseline that addresses the discrete output space problem
	without relying on gradient estimators and show that it is able to achieve
	state-of-the-art results on a Chinese poem generation dataset. We present
	quantitative results on generating sentences from context-free and
	probabilistic context-free grammars, and qualitative language modeling results.
	A conditional version is also described that can generate sequences conditioned
	on sentence characteristics.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>subramanian-EtAl:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2630">
    <title>Deep Active Learning for Named Entity Recognition</title>
    <author><first>Yanyao</first><last>Shen</last></author>
    <author><first>Hyokun</first><last>Yun</last></author>
    <author><first>Zachary</first><last>Lipton</last></author>
    <author><first>Yakov</first><last>Kronrod</last></author>
    <author><first>Animashree</first><last>Anandkumar</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>252&#8211;256</pages>
    <url>http://www.aclweb.org/anthology/W17-2630</url>
    <abstract>Deep neural networks have advanced the state of the art in named
	entity recognition. However, under typical training procedures, advantages
	over classical methods emerge only with large datasets. As a result, deep
	learning is employed only when large public datasets or a large budget for
	manually labeling data is available. In this work, we show otherwise: by
	combining deep learning with active learning, we can outperform classical
	methods even with a significantly smaller amount of training data.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>shen-EtAl:2017:RepL4NLP2</bibkey>
  </paper>

  <paper id="2631">
    <title>Learning when to skim and when to read</title>
    <author><first>Alexander</first><last>Johansen</last></author>
    <author><first>Richard</first><last>Socher</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>257&#8211;264</pages>
    <url>http://www.aclweb.org/anthology/W17-2631</url>
    <abstract>Many recent advances in deep learning for natural language processing have come
	at increasing computational cost, but the power of these state-of-the-art
	models is not needed for every example in a dataset. We demonstrate two
	approaches to reducing unnecessary computation in cases where a fast but weak
	baseline classifier and a stronger, slower model are both available. Applying an
	AUC-based metric to the task of sentiment classification, we find significant
	efficiency gains with both a probability-threshold method for reducing
	computational cost and one that uses a secondary decision network.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>johansen-socher:2017:RepL4NLP</bibkey>
  </paper>

  <paper id="2632">
    <title>Learning to Embed Words in Context for Syntactic Tasks</title>
    <author><first>Lifu</first><last>Tu</last></author>
    <author><first>Kevin</first><last>Gimpel</last></author>
    <author><first>Karen</first><last>Livescu</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Representation Learning for NLP</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver, Canada</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>265&#8211;275</pages>
    <url>http://www.aclweb.org/anthology/W17-2632</url>
    <abstract>We present models for embedding words in the context of surrounding words. 
	Such models, which we refer to as token embeddings, represent the
	characteristics of a word that are specific to a given context, such as word
	sense, syntactic category, and semantic role. We explore simple, efficient
	token embedding models based on standard neural network architectures. We learn
	token embeddings on a large amount of unannotated text and evaluate them as
	features for part-of-speech taggers and dependency parsers trained on much
	smaller amounts of annotated data.  We find that predictors endowed with token
	embeddings consistently outperform baseline predictors across a range of
	context window and training set sizes.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tu-gimpel-livescu:2017:RepL4NLP</bibkey>
  </paper>

</volume>