<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="4300">
    <title>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</title>
    <editor>Kai-Wei Chang</editor>
    <editor>Ming-Wei Chang</editor>
    <editor>Vivek Srikumar</editor>
    <editor>Alexander M. Rush</editor>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/W17-43</url>
    <bibtype>book</bibtype>
    <bibkey>StructPred:2017</bibkey>
  </paper>

  <paper id="4301">
    <title>Dependency Parsing with Dilated Iterated Graph CNNs</title>
    <author><first>Emma</first><last>Strubell</last></author>
    <author><first>Andrew</first><last>McCallum</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;6</pages>
    <url>http://www.aclweb.org/anthology/W17-4301</url>
    <abstract>Dependency parses are an effective way to inject linguistic knowledge into many
	downstream tasks, and many practitioners wish to efficiently parse sentences at
	scale. While recent advances in GPU hardware have enabled neural networks to
	achieve significant gains over the previous best models, these models still
	fail to leverage GPUs' capability for massive parallelism due to their
	requirement of sequential processing of the sentence. In response, we propose Dilated Iterated
	Graph Convolutional Neural Networks (DIG-CNNs) for
	graph-based dependency parsing, a graph convolutional architecture that allows
	for efficient end-to-end GPU parsing. In experiments on the English Penn
	TreeBank benchmark, we show that DIG-CNNs perform on par with some of the best
	neural network parsers.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>strubell-mccallum:2017:StructPred</bibkey>
  </paper>

  <paper id="4302">
    <title>Entity Identification as Multitasking</title>
    <author><first>Karl</first><last>Stratos</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>7&#8211;11</pages>
    <url>http://www.aclweb.org/anthology/W17-4302</url>
    <abstract>Standard approaches in entity identification hard-code boundary detection and
	type prediction into labels and perform Viterbi decoding. This has two disadvantages:
	(1) the runtime complexity grows quadratically in the number of types, and (2)
	there is no natural segment-level representation. In this paper, we propose a neural
	architecture that addresses these disadvantages. We frame the problem as
	multitasking, separating boundary detection and type prediction but optimizing
	them jointly. Despite its simplicity, this architecture performs competitively
	with fully structured models such as BiLSTM-CRFs while scaling linearly in the
	number of types. Furthermore, by construction, the model induces
	type-disambiguating embeddings of predicted mentions.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>stratos:2017:StructPred</bibkey>
  </paper>

  <paper id="4303">
    <title>Towards Neural Machine Translation with Latent Tree Attention</title>
    <author><first>James</first><last>Bradbury</last></author>
    <author><first>Richard</first><last>Socher</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>12&#8211;16</pages>
    <url>http://www.aclweb.org/anthology/W17-4303</url>
    <attachment type="attachment">W17-4303.Attachment.zip</attachment>
    <abstract>Building models that take advantage of the hierarchical structure of language
	without a priori annotation is a longstanding goal in natural language
	processing. We introduce such a model for the task of machine translation,
	pairing a recurrent neural network grammar encoder with a novel attentional
	RNNG decoder and applying policy gradient reinforcement learning to induce
	unsupervised tree structures on both the source and target. When trained on
	character-level datasets with no explicit segmentation or parse annotation, the
	model learns a plausible segmentation and shallow parse, obtaining performance
	close to an attentional baseline.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bradbury-socher:2017:StructPred</bibkey>
  </paper>

  <paper id="4304">
    <title>Structured Prediction via Learning to Search under Bandit Feedback</title>
    <author><first>Amr</first><last>Sharaf</last></author>
    <author><first>Hal</first><last>Daum&#233; III</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>17&#8211;26</pages>
    <url>http://www.aclweb.org/anthology/W17-4304</url>
    <abstract>We present an algorithm for structured prediction under online bandit feedback.
	The learner repeatedly predicts a sequence of actions, generating a structured
	output. It then observes feedback for that output and no others. We consider
	two cases: a pure bandit setting in which it only observes a loss, and more
	fine-grained feedback in which it observes a loss for every action. We find
	that the fine-grained feedback is necessary for strong empirical performance,
	because it allows for a robust variance-reduction strategy. We empirically
	compare a number of different algorithms and exploration methods and show the
	efficacy of BLS, our bandit learning-to-search algorithm, on sequence labeling
	and dependency parsing tasks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sharaf-daumeiii:2017:StructPred</bibkey>
  </paper>

  <paper id="4305">
    <title>Syntax Aware LSTM model for Semantic Role Labeling</title>
    <author><first>Feng</first><last>Qian</last></author>
    <author><first>Lei</first><last>Sha</last></author>
    <author><first>Baobao</first><last>Chang</last></author>
    <author><first>LuChen</first><last>Liu</last></author>
    <author><first>Ming</first><last>Zhang</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>27&#8211;32</pages>
    <url>http://www.aclweb.org/anthology/W17-4305</url>
    <abstract>In the Semantic Role Labeling (SRL) task, the tree-structured dependency relation
	is rich in syntactic information, but it is not well handled by existing models.
	In this paper, we propose the Syntax Aware Long Short-Term Memory (SA-LSTM) model. The
	structure of SA-LSTM changes according to the dependency structure of each
	sentence, so that SA-LSTM can model the whole dependency tree directly in its
	architecture. Experiments demonstrate that on the
	Chinese Proposition Bank (CPB) 1.0, SA-LSTM improves F1 by 2.06% over an ordinary
	bi-LSTM with feature-engineered dependency relation information, and achieves a
	state-of-the-art F1 of 79.92%. On the English CoNLL 2005 dataset, SA-LSTM brings a
	2.1% improvement over the bi-LSTM model and a slight improvement (0.3%)
	when added to the state-of-the-art model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>qian-EtAl:2017:StructPred</bibkey>
  </paper>

  <paper id="4306">
    <title>Spatial Language Understanding with Multimodal Graphs using Declarative Learning based Programming</title>
    <author><first>Parisa</first><last>Kordjamshidi</last></author>
    <author><first>Taher</first><last>Rahgooy</last></author>
    <author><first>Umar</first><last>Manzoor</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>33&#8211;43</pages>
    <url>http://www.aclweb.org/anthology/W17-4306</url>
    <abstract>This work addresses a previously formalized semantic evaluation task, spatial
	role labeling (SpRL), which aims at extracting formal spatial meaning from
	text. Here, we report the results of initial efforts towards exploiting visual
	information in the form of images to help spatial language understanding. We
	discuss the design of new models in the framework of declarative
	learning-based programming (DeLBP). The DeLBP framework facilitates combining
	modalities and representing various data in a unified graph.
	  The learning and inference models exploit the structure of the unified graph
	as well as global first-order domain constraints beyond the data to predict
	the semantics, which form a structured meaning representation of the spatial
	context. Continuous representations are used to relate the various elements of
	the graph originating from different modalities. We improve over the
	state-of-the-art results on SpRL.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kordjamshidi-rahgooy-manzoor:2017:StructPred</bibkey>
  </paper>

  <paper id="4307">
    <title>Boosting Information Extraction Systems with Character-level Neural Networks and Free Noisy Supervision</title>
    <author><first>Philipp</first><last>Meerkamp</last></author>
    <author><first>Zhengyi</first><last>Zhou</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>44&#8211;51</pages>
    <url>http://www.aclweb.org/anthology/W17-4307</url>
    <abstract>We present an architecture to boost the precision of existing information
	extraction systems. This is achieved by augmenting the existing parser, which
	may be constraint-based or hybrid-statistical, with a character-level neural
	network. Our architecture combines the ability of constraint-based or hybrid
	extraction systems to easily incorporate domain knowledge with the ability of
	deep neural networks to leverage large amounts of data to learn complex
	features. The network is trained using a measure of consistency between
	extracted data and existing databases as a form of cheap, noisy supervision.
	Our architecture does not require large-scale manual annotation or a system
	rewrite. It has led to large precision improvements over an existing,
	highly tuned production information extraction system used at Bloomberg LP for
	financial text.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>meerkamp-zhou:2017:StructPred</bibkey>
  </paper>

  <paper id="4308">
    <title>Piecewise Latent Variables for Neural Variational Text Processing</title>
    <author><first>Iulian Vlad</first><last>Serban</last></author>
    <author><first>Alexander</first><last>Ororbia II</last></author>
    <author><first>Joelle</first><last>Pineau</last></author>
    <author><first>Aaron</first><last>Courville</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>52&#8211;62</pages>
    <url>http://www.aclweb.org/anthology/W17-4308</url>
    <abstract>Advances in neural variational inference have facilitated the learning of
	powerful directed graphical models with continuous latent variables, such as
	variational autoencoders. The hope is that such models will learn to represent
	rich, multi-modal latent factors in real-world data, such as natural language
	text. However, current models often assume simplistic priors on the latent
	variables, such as the uni-modal Gaussian distribution, which are incapable
	of representing complex latent factors efficiently. To overcome this
	restriction, we propose the simple, but highly flexible, piecewise constant
	distribution. This distribution has the capacity to represent an exponential
	number of modes of a latent target distribution, while remaining mathematically
	tractable. Our results demonstrate that incorporating this new latent
	distribution into different models yields substantial improvements in natural
	language processing tasks such as document modeling and natural language
	generation for dialogue.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>serban-EtAl:2017:StructPred</bibkey>
  </paper>

</volume>