<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="4300">
    <title>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</title>
    <editor>Kai-Wei Chang</editor>
    <editor>Ming-Wei Chang</editor>
    <editor>Vivek Srikumar</editor>
    <editor>Alexander M. Rush</editor>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/W17-43</url>
    <bibtype>book</bibtype>
    <bibkey>StructPred:2017</bibkey>
  </paper>

  <paper id="4301">
    <title>Dependency Parsing with Dilated Iterated Graph CNNs</title>
    <author><first>Emma</first><last>Strubell</last></author>
    <author><first>Andrew</first><last>McCallum</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;6</pages>
    <url>http://www.aclweb.org/anthology/W17-4301</url>
    <abstract>Dependency parses are an effective way to inject linguistic knowledge into many
	downstream tasks, and many practitioners wish to efficiently parse sentences at
	scale. While recent advances in GPU hardware have enabled neural networks to
	achieve significant gains over the previous best models, these models still
	fail to leverage GPUs' capability for massive parallelism due to their
	requirement of sequential processing of the sentence. In response, we propose Dilated Iterated
	Graph Convolutional Neural Networks (DIG-CNNs) for
	graph-based dependency parsing, a graph convolutional architecture that allows
	for efficient end-to-end GPU parsing. In experiments on the English Penn
	TreeBank benchmark, we show that DIG-CNNs perform on par with some of the best
	neural network parsers.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>strubell-mccallum:2017:StructPred</bibkey>
  </paper>

  <paper id="4302">
    <title>Entity Identification as Multitasking</title>
    <author><first>Karl</first><last>Stratos</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>7&#8211;11</pages>
    <url>http://www.aclweb.org/anthology/W17-4302</url>
    <abstract>Standard approaches in entity identification hard-code boundary detection and
	type prediction into labels and perform Viterbi decoding. This has two disadvantages:
	(1) the runtime complexity grows quadratically in the number of types, and (2)
	there is no natural segment-level representation. In this paper, we propose a neural
	architecture that addresses these disadvantages. We frame the problem as
	multitasking, separating boundary detection and type prediction but optimizing
	them jointly. Despite its simplicity, this architecture performs competitively
	with fully structured models such as BiLSTM-CRFs while scaling linearly in the
	number of types. Furthermore, by construction, the model induces
	type-disambiguating embeddings of predicted mentions.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>stratos:2017:StructPred</bibkey>
  </paper>

  <paper id="4303">
    <title>Towards Neural Machine Translation with Latent Tree Attention</title>
    <author><first>James</first><last>Bradbury</last></author>
    <author><first>Richard</first><last>Socher</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>12&#8211;16</pages>
    <url>http://www.aclweb.org/anthology/W17-4303</url>
    <attachment type="attachment">W17-4303.Attachment.zip</attachment>
    <abstract>Building models that take advantage of the hierarchical structure of language
	without a priori annotation is a longstanding goal in natural language
	processing. We introduce such a model for the task of machine translation,
	pairing a recurrent neural network grammar encoder with a novel attentional
	RNNG decoder and applying policy gradient reinforcement learning to induce
	unsupervised tree structures on both the source and target. When trained on
	character-level datasets with no explicit segmentation or parse annotation, the
	model learns a plausible segmentation and shallow parse, obtaining performance
	close to an attentional baseline.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bradbury-socher:2017:StructPred</bibkey>
  </paper>

  <paper id="4304">
    <title>Structured Prediction via Learning to Search under Bandit Feedback</title>
    <author><first>Amr</first><last>Sharaf</last></author>
    <author><first>Hal</first><last>Daum&#233; III</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>17&#8211;26</pages>
    <url>http://www.aclweb.org/anthology/W17-4304</url>
    <abstract>We present an algorithm for structured prediction under online bandit feedback.
	The learner repeatedly predicts a sequence of actions, generating a structured
	output. It then observes feedback for that output and no others. We consider
	two cases: a pure bandit setting in which it only observes a loss, and more
	fine-grained feedback in which it observes a loss for every action. We find
	that the fine-grained feedback is necessary for strong empirical performance,
	because it allows for a robust variance-reduction strategy. We empirically
	compare a number of different algorithms and exploration methods and show the
	efficacy of BLS, our bandit learning-to-search algorithm, on sequence labeling
	and dependency parsing tasks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sharaf-daumeiii:2017:StructPred</bibkey>
  </paper>

  <paper id="4305">
    <title>Syntax Aware LSTM model for Semantic Role Labeling</title>
    <author><first>Feng</first><last>Qian</last></author>
    <author><first>Lei</first><last>Sha</last></author>
    <author><first>Baobao</first><last>Chang</last></author>
    <author><first>LuChen</first><last>Liu</last></author>
    <author><first>Ming</first><last>Zhang</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>27&#8211;32</pages>
    <url>http://www.aclweb.org/anthology/W17-4305</url>
    <abstract>In the Semantic Role Labeling (SRL) task, the tree-structured dependency relation
	is rich in syntactic information, but it is not well handled by existing models.
	In this paper, we propose the Syntax Aware Long Short-Term Memory (SA-LSTM) model. The
	structure of SA-LSTM changes according to the dependency structure of each
	sentence, so that SA-LSTM can model the whole dependency tree directly in its
	architecture. Experiments demonstrate that on the
	Chinese Proposition Bank (CPB) 1.0, SA-LSTM improves F1 by 2.06% over an ordinary
	bi-LSTM with feature-engineered dependency relation information, and achieves a
	state-of-the-art F1 of 79.92%. On the English CoNLL 2005 dataset, SA-LSTM brings a
	2.1% improvement over the bi-LSTM model and a slight improvement (0.3%)
	when added to the state-of-the-art model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>qian-EtAl:2017:StructPred</bibkey>
  </paper>

  <paper id="4306">
    <title>Spatial Language Understanding with Multimodal Graphs using Declarative Learning based Programming</title>
    <author><first>Parisa</first><last>Kordjamshidi</last></author>
    <author><first>Taher</first><last>Rahgooy</last></author>
    <author><first>Umar</first><last>Manzoor</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>33&#8211;43</pages>
    <url>http://www.aclweb.org/anthology/W17-4306</url>
    <abstract>This work addresses a previously formalized semantic evaluation task, spatial
	role labeling (SpRL), which aims at extracting formal spatial meaning from
	text. Here, we report the results of initial efforts towards exploiting visual
	information in the form of images to help spatial language understanding. We
	discuss the design of new models in the framework of declarative
	learning-based programming (DeLBP). The DeLBP framework facilitates combining
	modalities and representing various data in a unified graph.
	  The learning and inference models exploit the structure of the unified graph
	as well as global first-order domain constraints beyond the data to predict
	the semantics, which form a structured meaning representation of the spatial
	context. Continuous representations are used to relate the various elements of
	the graph originating from different modalities. We improve over the
	state-of-the-art results on SpRL.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kordjamshidi-rahgooy-manzoor:2017:StructPred</bibkey>
  </paper>

  <paper id="4307">
    <title>Boosting Information Extraction Systems with Character-level Neural Networks and Free Noisy Supervision</title>
    <author><first>Philipp</first><last>Meerkamp</last></author>
    <author><first>Zhengyi</first><last>Zhou</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>44&#8211;51</pages>
    <url>http://www.aclweb.org/anthology/W17-4307</url>
    <abstract>We present an architecture to boost the precision of existing information
	extraction systems. This is achieved by augmenting the existing parser, which
	may be constraint-based or hybrid-statistical, with a character-level neural
	network. Our architecture combines the ability of constraint-based or hybrid
	extraction systems to easily incorporate domain knowledge with the ability of
	deep neural networks to leverage large amounts of data to learn complex
	features. The network is trained using a measure of consistency between
	extracted data and existing databases as a form of cheap, noisy supervision.
	Our architecture does not require large-scale manual annotation or a system
	rewrite. It has led to large precision improvements over an existing,
	highly tuned production information extraction system used at Bloomberg LP for
	financial text.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>meerkamp-zhou:2017:StructPred</bibkey>
  </paper>

  <paper id="4308">
    <title>Piecewise Latent Variables for Neural Variational Text Processing</title>
    <author><first>Iulian Vlad</first><last>Serban</last></author>
    <author><first>Alexander</first><last>Ororbia II</last></author>
    <author><first>Joelle</first><last>Pineau</last></author>
    <author><first>Aaron</first><last>Courville</last></author>
    <booktitle>Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing</booktitle>
    <month>September</month>
    <year>2017</year>
    <address>Copenhagen, Denmark</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>52&#8211;62</pages>
    <url>http://www.aclweb.org/anthology/W17-4308</url>
    <abstract>Advances in neural variational inference have facilitated the learning of
	powerful directed graphical models with continuous latent variables, such as
	variational autoencoders. The hope is that such models will learn to represent
	rich, multi-modal latent factors in real-world data, such as natural language
	text. However, current models often assume simplistic priors on the latent
	variables, such as the uni-modal Gaussian distribution, which are incapable
	of representing complex latent factors efficiently. To overcome this
	restriction, we propose the simple, but highly flexible, piecewise constant
	distribution. This distribution has the capacity to represent an exponential
	number of modes of a latent target distribution, while remaining mathematically
	tractable. Our results demonstrate that incorporating this new latent
	distribution into different models yields substantial improvements in natural
	language processing tasks such as document modeling and natural language
	generation for dialogue.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>serban-EtAl:2017:StructPred</bibkey>
  </paper>

</volume>