<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W18">
  <paper id="5400">
    <title>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</title>
    <editor><first>Tal</first><last>Linzen</last></editor>
    <editor><first>Grzegorz</first><last>Chrupała</last></editor>
    <editor><first>Afra</first><last>Alishahi</last></editor>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/W18-54</url>
    <bibtype>book</bibtype>
    <bibkey>BlackboxNLP:2018</bibkey>
  </paper>

  <paper id="5401">
    <title>When does deep multi-task learning work for loosely related document classification tasks?</title>
    <author><first>Emma</first><last>Kerinec</last></author>
    <author><first>Chlo&#233;</first><last>Braud</last></author>
    <author><first>Anders</first><last>Søgaard</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;8</pages>
    <url>http://www.aclweb.org/anthology/W18-5401</url>
    <abstract>This work aims to contribute to our understanding of when multi-task learning through parameter sharing in deep neural networks leads to improvements over single-task learning. We focus on the setting of learning from loosely related tasks, for which no theoretical guarantees exist. We therefore approach the question empirically, studying which properties of datasets and single-task learning characteristics correlate with improvements from multi-task learning. We are the first to study this in a text classification setting and across more than 500 different task pairs.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kerinec-braud-sgaard:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5402">
    <title>Analyzing Learned Representations of a Deep ASR Performance Prediction Model</title>
    <author><first>Zied</first><last>Elloumi</last></author>
    <author><first>Laurent</first><last>Besacier</last></author>
    <author><first>Olivier</first><last>Galibert</last></author>
    <author><first>Benjamin</first><last>Lecouteux</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>9&#8211;15</pages>
    <url>http://www.aclweb.org/anthology/W18-5402</url>
    <abstract>This paper addresses a relatively new task: prediction of ASR performance on unseen broadcast programs. </abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>elloumi-EtAl:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5403">
    <title>Explaining non-linear Classifier Decisions within Kernel-based Deep Architectures</title>
    <author><first>Danilo</first><last>Croce</last></author>
    <author><first>Daniele</first><last>Rossini</last></author>
    <author><first>Roberto</first><last>Basili</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>16&#8211;24</pages>
    <url>http://www.aclweb.org/anthology/W18-5403</url>
    <abstract>Nonlinear methods such as deep neural networks achieve state-of-the-art performances in several semantic NLP tasks. </abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>croce-rossini-basili:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5404">
    <title>Nightmare at test time: How punctuation prevents parsers from generalizing</title>
    <author><first>Anders</first><last>Søgaard</last></author>
    <author><first>Miryam</first><last>de Lhoneux</last></author>
    <author><first>Isabelle</first><last>Augenstein</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>25&#8211;29</pages>
    <url>http://www.aclweb.org/anthology/W18-5404</url>
    <abstract>Punctuation is a strong indicator of syntactic structure, and parsers trained on text with punctuation often rely heavily on this signal. Punctuation is a diversion, however, since human language processing does not rely on punctuation to the same extent, and in informal texts, we therefore often leave out punctuation. We also use punctuation ungrammatically for emphatic or creative purposes, or simply by mistake. We show that (a) dependency parsers are sensitive to both absence of punctuation and to alternative uses; (b) neural parsers tend to be more sensitive than vintage parsers; (c) training neural parsers without punctuation outperforms all out-of-the-box parsers across all scenarios where punctuation departs from standard punctuation. Our main experiments are on synthetically corrupted data to study the effect of punctuation in isolation and avoid potential confounds, but we also show effects on out-of-domain data.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sgaard-delhoneux-augenstein:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5405">
    <title>Evaluating Textual Representations through Image Generation</title>
    <author><first>Graham</first><last>Spinks</last></author>
    <author><first>Marie-Francine</first><last>Moens</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>30&#8211;39</pages>
    <url>http://www.aclweb.org/anthology/W18-5405</url>
    <abstract>We present a methodology for determining the quality of textual representations through the ability to generate images from them. Continuous representations of textual input are ubiquitous in modern Natural Language Processing techniques, either at the core of machine learning algorithms or as a by-product at any given layer of a neural network. While current techniques to evaluate such representations focus on their performance on particular tasks, they don't provide a clear understanding of the level of informational detail that is stored within them, especially their ability to represent spatial information. The central premise of this paper is that visual inspection or analysis is the most convenient method to quickly and accurately determine information content. Through the use of text-to-image neural networks, we propose a new technique to compare the quality of textual representations by visualizing their information content. The method is illustrated on a medical dataset where the correct representation of spatial information and shorthands are of particular importance. For four different well-known textual representations, we show with a quantitative analysis that some representations are consistently able to deliver higher quality visualizations of the information content. Additionally, we show that the quantitative analysis technique correlates with the judgment of a human expert evaluator in terms of alignment.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>spinks-moens:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5406">
    <title>On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis</title>
    <author><first>Jose</first><last>Camacho-Collados</last></author>
    <author><first>Mohammad Taher</first><last>Pilehvar</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>40&#8211;46</pages>
    <url>http://www.aclweb.org/anthology/W18-5406</url>
    <abstract>Text preprocessing is often the first step in the pipeline of a Natural Language Processing (NLP) system, with potential impact in its final performance. Despite its importance, text preprocessing has not received much attention in the deep learning literature. In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier. We perform an extensive evaluation on standard benchmarks from text categorization and sentiment analysis. While our experiments show that a simple tokenization of input text is generally adequate, they also highlight significant degrees of variability across preprocessing techniques. This reveals the importance of paying attention to this usually-overlooked step in the pipeline, particularly when comparing different models. Finally, our evaluation provides insights into the best preprocessing practices for training word embeddings.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>camachocollados-pilehvar:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5407">
    <title>Jump to better conclusions: SCAN both left and right</title>
    <author><first>Joost</first><last>Bastings</last></author>
    <author><first>Marco</first><last>Baroni</last></author>
    <author><first>Jason</first><last>Weston</last></author>
    <author><first>Kyunghyun</first><last>Cho</last></author>
    <author><first>Douwe</first><last>Kiela</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>47&#8211;55</pages>
    <url>http://www.aclweb.org/anthology/W18-5407</url>
    <abstract>Lake &#38; Baroni (2018) recently introduced the SCAN data set, which consists of simple commands paired with action sequences and is intended to test the </abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bastings-EtAl:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5408">
    <title>Understanding Convolutional Neural Networks for Text Classification</title>
    <author><first>Alon</first><last>Jacovi</last></author>
    <author><first>Oren</first><last>Sar Shalom</last></author>
    <author><first>Yoav</first><last>Goldberg</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>56&#8211;65</pages>
    <url>http://www.aclweb.org/anthology/W18-5408</url>
    <abstract>We present an analysis into the inner workings of Convolutional Neural Networks (CNNs) for processing text. CNNs used for computer vision can be interpreted by projecting filters into image space, but for discrete sequence inputs CNNs remain a mystery. We aim to understand the method by which the networks process and classify text. We examine a common hypothesis about this problem: that filters, accompanied by global max-pooling, serve as ngram detectors. We show that filters may capture several different semantic classes of ngrams by using different activation patterns, and that global max-pooling induces behavior which separates important ngrams from the rest. Finally, we show practical use cases derived from our findings in the form of model interpretability (explaining a trained model by deriving a concrete identity for each filter, bridging the gap between visualization tools in vision tasks and NLP) and prediction interpretability (explaining predictions).</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jacovi-sarshalom-goldberg:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5409">
    <title>Linguistic representations in multi-task neural networks for ellipsis resolution</title>
    <author><first>Ola</first><last>Rønning</last></author>
    <author><first>Daniel</first><last>Hardt</last></author>
    <author><first>Anders</first><last>Søgaard</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>66&#8211;73</pages>
    <url>http://www.aclweb.org/anthology/W18-5409</url>
    <abstract>Sluicing resolution is the task of identifying the antecedent to a question ellipsis. Antecedents are often sentential constituents, and previous work has therefore relied on syntactic parsing, together with complex linguistic features. A recent model instead used partial parsing as an auxiliary task in sequential neural network architectures to inject syntactic information. We explore the linguistic information being brought to bear by such networks, both by defining subsets of the data exhibiting relevant linguistic characteristics, and by examining the internal representations of the network. Both perspectives provide evidence for substantial linguistic knowledge being deployed by the neural networks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rnning-hardt-sgaard:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5410">
    <title>Unsupervised Token-wise Alignment to Improve Interpretation of Encoder-Decoder Models</title>
    <author><first>Shun</first><last>Kiyono</last></author>
    <author><first>Sho</first><last>Takase</last></author>
    <author><first>Jun</first><last>Suzuki</last></author>
    <author><first>Naoaki</first><last>Okazaki</last></author>
    <author><first>Kentaro</first><last>Inui</last></author>
    <author><first>Masaaki</first><last>Nagata</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>74&#8211;81</pages>
    <url>http://www.aclweb.org/anthology/W18-5410</url>
    <abstract>Developing a method for understanding the inner workings of black-box neural methods is an important research endeavor.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kiyono-EtAl:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5411">
    <title>Rule induction for global explanation of trained models</title>
    <author><first>Madhumita</first><last>Sushil</last></author>
    <author><first>Simon</first><last>Suster</last></author>
    <author><first>Walter</first><last>Daelemans</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>82&#8211;97</pages>
    <url>http://www.aclweb.org/anthology/W18-5411</url>
    <abstract>Understanding the behavior of a trained network and finding explanations for its outputs is important for improving the network's performance and generalization ability, and for ensuring trust in automated systems. Several approaches have previously been proposed to identify and visualize the most important features by analyzing a trained network. However, the relations between different features and classes are lost in most cases. We propose a technique to induce sets of if-then-else rules that capture these relations to globally explain the predictions of a network. We first calculate the importance of the features in the trained network. We then weigh the original inputs with these feature importance scores, simplify the transformed input space, and finally fit a rule induction model to explain the model predictions. We find that the output rule-sets can explain the predictions of a neural network trained for 4-class text classification from the 20 newsgroups dataset to a macro-averaged F-score of 0.80. We make the code available at https://github.com/clips/interpret_with_rules.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sushil-suster-daelemans:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5412">
    <title>Can LSTM Learn to Capture Agreement? The Case of Basque</title>
    <author><first>Shauli</first><last>Ravfogel</last></author>
    <author><first>Yoav</first><last>Goldberg</last></author>
    <author><first>Francis</first><last>Tyers</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>98&#8211;107</pages>
    <url>http://www.aclweb.org/anthology/W18-5412</url>
    <abstract>We focus on the task of agreement prediction in Basque, as a case study for a task that requires implicit understanding of sentence structure and the acquisition of a complex but consistent morphological system. In a series of controlled experiments, we probe the ability of sequential models to learn agreement patterns and assess different aspects of the problem. Analyzing experimental results from two syntactic prediction tasks &#8211; verb number prediction and suffix recovery &#8211; we find that sequential models perform worse on agreement prediction in Basque than one might expect on the basis of previous agreement prediction work in English. Tentative findings based on diagnostic classifiers suggest the network makes use of local heuristics as a proxy for the hierarchical structure of the sentence. We propose the Basque agreement prediction task as a challenging benchmark for models that attempt to learn regularities in human language.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ravfogel-goldberg-tyers:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5413">
    <title>Rearranging the Familiar: Testing Compositional Generalization in Recurrent Networks</title>
    <author><first>Joao</first><last>Loula</last></author>
    <author><first>Marco</first><last>Baroni</last></author>
    <author><first>Brenden</first><last>Lake</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>108&#8211;114</pages>
    <url>http://www.aclweb.org/anthology/W18-5413</url>
    <abstract>Systematic compositionality is the ability to recombine meaningful units with regular and predictable outcomes, and it's seen as key to humans' capacity for generalization in language. Recent work (Lake and Baroni, 2018) has studied systematic compositionality in modern seq2seq models using generalization to novel navigation instructions in a grounded environment as a probing tool. Lake and Baroni's main experiment required the models to quickly bootstrap the meaning of new words. We extend this framework here to settings where the model needs only to recombine well-trained functional words (such as "around" and "right") in novel contexts. Our findings confirm and strengthen the earlier ones: seq2seq models can be impressively good at generalizing to novel combinations of previously-seen input, but only when they receive extensive training on the specific pattern to be generalized (e.g., generalizing from many examples of "X around right" to "jump around right"), while failing when generalization requires novel application of compositional rules (e.g., inferring the meaning of "around right" from those of "right" and "around").</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>loula-baroni-lake:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5414">
    <title>Evaluating the Ability of LSTMs to Learn Context-Free Grammars</title>
    <author><first>Luzi</first><last>Sennhauser</last></author>
    <author><first>Robert</first><last>Berwick</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>115&#8211;124</pages>
    <url>http://www.aclweb.org/anthology/W18-5414</url>
    <abstract>While long short-term memory (LSTM) neural net architectures are designed to capture sequence information, human language is generally composed of hierarchical structures. This raises the question as to whether LSTMs can learn hierarchical structures. We explore this question with a well-formed bracket prediction task using two types of brackets modeled by an LSTM.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sennhauser-berwick:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5415">
    <title>Interpretable Neural Architectures for Attributing an Ad’s Performance to its Writing Style</title>
    <author><first>Reid</first><last>Pryzant</last></author>
    <author><first>Sugato</first><last>Basu</last></author>
    <author><first>Kazoo</first><last>Sone</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>125&#8211;135</pages>
    <url>http://www.aclweb.org/anthology/W18-5415</url>
    <abstract>How much does "free shipping!" help an advertisement's ability to persuade?</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>pryzant-basu-sone:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5416">
    <title>Interpreting Neural Networks with Nearest Neighbors</title>
    <author><first>Eric</first><last>Wallace</last></author>
    <author><first>Shi</first><last>Feng</last></author>
    <author><first>Jordan</first><last>Boyd-Graber</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>136&#8211;144</pages>
    <url>http://www.aclweb.org/anthology/W18-5416</url>
    <abstract>Local model interpretation methods explain individual predictions by</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wallace-feng-boydgraber:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5417">
    <title>'Indicatements' that character language models learn English morpho-syntactic units and regularities</title>
    <author><first>Yova</first><last>Kementchedjhieva</last></author>
    <author><first>Adam</first><last>Lopez</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>145&#8211;153</pages>
    <url>http://www.aclweb.org/anthology/W18-5417</url>
    <abstract>Character language models have access to surface morphological patterns, but it is not clear whether or how they learn abstract morphological regularities. We instrument a character language model with several probes, finding that it can develop a specific unit to identify word boundaries and, by extension, morpheme boundaries, which allows it to capture linguistic properties and regularities of these units. Our language model proves surprisingly good at identifying the selectional restrictions of English derivational morphemes, a task that requires both morphological and syntactic awareness. Thus we conclude that, when morphemes overlap extensively with the words of a language, a character language model can perform morphological abstraction.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kementchedjhieva-lopez:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5418">
    <title>LISA: Explaining Recurrent Neural Network Judgments via Layer-wIse Semantic Accumulation and Example to Pattern Transformation</title>
    <author><first>Pankaj</first><last>Gupta</last></author>
    <author><first>Hinrich</first><last>Sch&#252;tze</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>154&#8211;164</pages>
    <url>http://www.aclweb.org/anthology/W18-5418</url>
    <abstract>Recurrent neural networks (RNNs) are temporal</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>gupta-schtze:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5419">
    <title>Analysing the potential of seq-to-seq models for incremental interpretation in task-oriented dialogue</title>
    <author><first>Dieuwke</first><last>Hupkes</last></author>
    <author><first>Sanne</first><last>Bouwmeester</last></author>
    <author><first>Raquel</first><last>Fern&#225;ndez</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>165&#8211;174</pages>
    <url>http://www.aclweb.org/anthology/W18-5419</url>
    <abstract>We investigate how encoder-decoder models trained on a synthetic dataset of task-oriented dialogues process disfluencies, such as hesitations and self-corrections. We find that, contrary to earlier results, disfluencies have very little impact on the task success of seq-to-seq models with attention. Using visualisations and diagnostic classifiers, we analyse the representations that are incrementally built by the model, and discover that models develop little to no awareness of the structure of disfluencies. However, adding disfluencies to the data appears to help the model create clearer representations overall, as evidenced by the attention patterns the different models exhibit.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hupkes-bouwmeester-fernndez:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5420">
    <title>An Operation Sequence Model for Explainable Neural Machine Translation</title>
    <author><first>Felix</first><last>Stahlberg</last></author>
    <author><first>Danielle</first><last>Saunders</last></author>
    <author><first>Bill</first><last>Byrne</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>175&#8211;186</pages>
    <url>http://www.aclweb.org/anthology/W18-5420</url>
    <abstract>We propose to achieve explainable neural machine translation (NMT) by changing the output representation to explain itself. We present a novel approach to NMT which generates the target sentence by monotonically walking through the source sentence. Word reordering is modeled by operations which allow setting markers in the target sentence and moving a target-side write head between those markers. In contrast to many modern neural models, our system emits explicit word alignment information, which is often crucial to practical machine translation as it improves explainability. Our technique can outperform a plain text system in terms of BLEU score under the recent Transformer architecture on Japanese-English and Portuguese-English, and is within 0.5 BLEU difference on Spanish-English.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>stahlberg-saunders-byrne:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5421">
    <title>Introspection for convolutional automatic speech recognition</title>
    <author><first>Andreas</first><last>Krug</last></author>
    <author><first>Sebastian</first><last>Stober</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>187&#8211;199</pages>
    <url>http://www.aclweb.org/anthology/W18-5421</url>
    <abstract>Artificial Neural Networks (ANNs) have experienced great success in the past few years. The increasing complexity of these models leads to less understanding about their decision processes. Therefore, introspection techniques have been proposed, mostly for images as input data. Patterns or relevant regions in images can be intuitively interpreted by a human observer. This is not the case for more complex data like speech recordings. In this work, we investigate the application of common introspection techniques from computer vision to an Automatic Speech Recognition (ASR) task. To this end, we use a model similar to image classification, which predicts letters from spectrograms. We show difficulties in applying image introspection to ASR. To tackle these problems, we propose normalized averaging of aligned inputs (NAvAI): a data-driven method to reveal learned patterns for prediction of specific classes. Our method integrates information from many data examples through local introspection techniques for Convolutional Neural Networks (CNNs). We demonstrate that our method provides better interpretability of letter-specific patterns than existing methods.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>krug-stober:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5422">
    <title>Learning and Evaluating Sparse Interpretable Sentence Embeddings</title>
    <author><first>Valentin</first><last>Trifonov</last></author>
    <author><first>Octavian-Eugen</first><last>Ganea</last></author>
    <author><first>Anna</first><last>Potapenko</last></author>
    <author><first>Thomas</first><last>Hofmann</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>200&#8211;210</pages>
    <url>http://www.aclweb.org/anthology/W18-5422</url>
    <abstract>Previous research on word embeddings has shown that sparse representations, which can be either learned on top of existing dense embeddings or obtained through model constraints during training time, have the benefit of increased interpretability properties: to some degree, each dimension can be understood by a human and associated with a recognizable feature in the data. In this paper, we transfer this idea to sentence embeddings and explore several approaches to obtain a sparse representation. We further introduce a novel, quantitative and automated evaluation metric for sentence embedding interpretability, based on topic coherence methods. We observe an increase in interpretability compared to dense models, on a dataset of movie dialogs and on the scene descriptions from the MS COCO dataset.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>trifonov-EtAl:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5423">
    <title>What do RNN Language Models Learn about Filler&#8211;Gap Dependencies?</title>
    <author><first>Ethan</first><last>Wilcox</last></author>
    <author><first>Roger</first><last>Levy</last></author>
    <author><first>Takashi</first><last>Morita</last></author>
    <author><first>Richard</first><last>Futrell</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>211&#8211;221</pages>
    <url>http://www.aclweb.org/anthology/W18-5423</url>
    <abstract>RNN language models have achieved state-of-the-art perplexity results and have proven useful in a suite of NLP tasks, but it is as yet unclear what syntactic generalizations they learn. Here we investigate whether state-of-the-art RNN language models represent long-distance filler&#8211;gap dependencies and constraints on them. Examining RNN behavior on experimentally controlled sentences designed to expose filler&#8211;gap dependencies, we show that RNNs can represent the relationship in multiple syntactic positions and over large spans of text. Furthermore, we show that RNNs learn a subset of the known restrictions on filler&#8211;gap dependencies, known as island constraints: RNNs show evidence for wh-islands, adjunct islands, and complex NP islands. These studies demonstrate that state-of-the-art RNN models are able to learn and generalize about empty syntactic positions.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wilcox-EtAl:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5424">
    <title>Do Language Models Understand Anything? On the Ability of LSTMs to Understand Negative Polarity Items</title>
    <author><first>Jaap</first><last>Jumelet</last></author>
    <author><first>Dieuwke</first><last>Hupkes</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>222&#8211;231</pages>
    <url>http://www.aclweb.org/anthology/W18-5424</url>
    <abstract>In this paper, we attempt to link the inner workings of a neural language model to linguistic theory, focusing on a complex phenomenon well discussed in formal linguistics: (negative) polarity items.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jumelet-hupkes:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5425">
    <title>Closing Brackets with Recurrent Neural Networks</title>
    <author><first>Natalia</first><last>Skachkova</last></author>
    <author><first>Thomas</first><last>Trost</last></author>
    <author><first>Dietrich</first><last>Klakow</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>232&#8211;239</pages>
    <url>http://www.aclweb.org/anthology/W18-5425</url>
    <abstract>Many natural and formal languages contain words or symbols that require a matching counterpart for making an expression well-formed. The combination of opening and closing brackets is a typical example of such a construction. Due to their commonness, the ability to follow such rules is important for language modeling. Currently, recurrent neural networks (RNNs) are extensively used for this task. We investigate whether they are capable of learning the rules of opening and closing brackets by applying them to synthetic Dyck languages that consist of different types of brackets. We provide an analysis of the statistical properties of these languages as a baseline and show strengths and limits of Elman-RNNs, GRUs and LSTMs in experiments on random samples of these languages. In terms of perplexity and prediction accuracy, the RNNs get close to the theoretical baseline in most cases.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>skachkova-trost-klakow:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5426">
    <title>Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information</title>
    <author><first>Mario</first><last>Giulianelli</last></author>
    <author><first>Jack</first><last>Harding</last></author>
    <author><first>Florian</first><last>Mohnert</last></author>
    <author><first>Dieuwke</first><last>Hupkes</last></author>
    <author><first>Willem</first><last>Zuidema</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>240&#8211;248</pages>
    <url>http://www.aclweb.org/anthology/W18-5426</url>
    <abstract>How do neural language models keep track of number agreement between subject and verb? We show that ‘diagnostic classifiers’, trained to predict number from the internal states of the language model, provide a detailed understanding of how, when, and where this information is represented. Moreover, they give us insight into when and where this information is corrupted in cases where the language model ends up making agreement errors. To demonstrate the causal role that the representations we find play, we then use this information to influence the course of the LSTM during the processing of difficult sentences. Results from such an intervention show a large increase in the language model’s accuracy. Together, these results show that diagnostic classifiers give us an unrivalled, detailed look into the representation of linguistic information in neural models, and moreover demonstrate that this knowledge can be used to improve their performance.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>giulianelli-EtAl:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5427">
    <title>Iterative Recursive Attention Model for Interpretable Sequence Classification</title>
    <author><first>Martin</first><last>Tutek</last></author>
    <author><first>Jan</first><last>Šnajder</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>249&#8211;257</pages>
    <url>http://www.aclweb.org/anthology/W18-5427</url>
    <abstract>Natural language processing has greatly benefited from the introduction of the attention mechanism. However, standard attention models are of limited interpretability for tasks that involve a series of inference steps. We describe an iterative recursive attention model, which constructs incremental representations of input data through reusing results of previously computed queries. We train our model on sentiment classification datasets and demonstrate its capacity to identify and combine different aspects of the input in an easily interpretable manner, while obtaining performance close to the state of the art.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tutek-najder:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5428">
    <title>Interpreting Word-Level Hidden State Behaviour of Character-Level LSTM Language Models</title>
    <author><first>Avery</first><last>Hiebert</last></author>
    <author><first>Cole</first><last>Peterson</last></author>
    <author><first>Alona</first><last>Fyshe</last></author>
    <author><first>Nishant</first><last>Mehta</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>258&#8211;266</pages>
    <url>http://www.aclweb.org/anthology/W18-5428</url>
    <abstract>While Long Short-Term Memory networks (LSTMs) and other forms of recurrent neural network have been successfully applied to language modeling on a character level, the hidden state dynamics of these models can be difficult to interpret. We investigate the hidden states of such a model by using the HDBSCAN clustering algorithm to identify points in the text at which the hidden state is similar. Focusing on whitespace characters prior to the beginning of a word reveals interpretable clusters that offer insight into how the LSTM may combine contextual and character-level information to identify parts of speech. We also introduce a method for deriving word vectors from the hidden state representation in order to investigate the word-level knowledge of the model. These word vectors encode meaningful semantic information even for words that appear only once in the training text.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hiebert-EtAl:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5429">
    <title>Importance of Self-Attention for Sentiment Analysis</title>
    <author><first>Gaël</first><last>Letarte</last></author>
    <author><first>Fr&#233;d&#233;rik</first><last>Paradis</last></author>
    <author><first>Philippe</first><last>Giguère</last></author>
    <author><first>François</first><last>Laviolette</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>267&#8211;275</pages>
    <url>http://www.aclweb.org/anthology/W18-5429</url>
    <abstract>Despite their superior performance, deep learning models often lack interpretability. In this paper, we explore the modeling of insightful relations between words, in order to understand and enhance predictions. To this effect, we propose the Self-Attention Network (SANet), a flexible and interpretable architecture for text classification. Experiments indicate that the gains obtained by self-attention are task-dependent. For instance, experiments on sentiment analysis tasks showed an improvement of around 2% when using self-attention compared to a baseline without attention, while topic classification showed no gain. The interpretability brought forward by our architecture highlighted the importance of neighboring word interactions to extract sentiment.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>letarte-EtAl:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5430">
    <title>Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell</title>
    <author><first>Pia</first><last>Sommerauer</last></author>
    <author><first>Antske</first><last>Fokkens</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>276&#8211;286</pages>
    <url>http://www.aclweb.org/anthology/W18-5430</url>
    <abstract>This paper presents an approach for investigating the nature of semantic information captured by word embeddings. We propose a method that extends an existing human-elicited semantic property dataset with gold negative examples using crowd judgments. Our experimental approach tests the ability of supervised classifiers to identify semantic features in word embedding vectors and compares this to a feature-identification method based on full vector cosine similarity. The idea behind this method is that properties identified by classifiers, but not through full vector comparison, are captured by embeddings. Properties that cannot be identified by either method are not. Our results provide an initial indication that semantic properties relevant for the way entities interact (e.g. dangerous) are captured, while perceptual information (e.g. colors) is not represented. We conclude that, though preliminary, these results show that our method is suitable for identifying which properties are captured by embeddings.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sommerauer-fokkens:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5431">
    <title>An Analysis of Encoder Representations in Transformer-Based Machine Translation</title>
    <author><first>Alessandro</first><last>Raganato</last></author>
    <author><first>Jörg</first><last>Tiedemann</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>287&#8211;297</pages>
    <url>http://www.aclweb.org/anthology/W18-5431</url>
    <abstract>The attention mechanism is a successful technique in modern NLP, especially in tasks like machine translation. The recently proposed network architecture of the Transformer is based entirely on attention mechanisms and achieves new state of the art results in neural machine translation, outperforming other sequence-to-sequence models.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>raganato-tiedemann:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5432">
    <title>Evaluating Grammaticality in Seq2seq Models with a Broad Coverage HPSG Grammar: A Case Study on Machine Translation</title>
    <author><first>Johnny</first><last>Wei</last></author>
    <author><first>Khiem</first><last>Pham</last></author>
    <author><first>Brendan</first><last>O'Connor</last></author>
    <author><first>Brian</first><last>Dillon</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>298&#8211;305</pages>
    <url>http://www.aclweb.org/anthology/W18-5432</url>
    <abstract>Sequence to sequence (seq2seq) models are often employed in settings where the target output is natural language. However, the syntactic properties of the language generated from these models are not well understood. We explore whether such output belongs to a formal and realistic grammar, by employing the English Resource Grammar (ERG), a broad coverage, linguistically precise HPSG-based grammar of English. From a French to English parallel corpus, we analyze the parseability and grammatical constructions occurring in output from a seq2seq translation model. Over 93% of the model translations are parseable, suggesting that it learns to generate conforming to a grammar. The model has trouble learning the distribution of rarer syntactic rules, and we pinpoint several constructions that differentiate translations between the references and our model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wei-EtAl:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5433">
    <title>Context-Free Transductions with Neural Stacks</title>
    <author><first>Yiding</first><last>Hao</last></author>
    <author><first>William</first><last>Merrill</last></author>
    <author><first>Dana</first><last>Angluin</last></author>
    <author><first>Robert</first><last>Frank</last></author>
    <author><first>Noah</first><last>Amsel</last></author>
    <author><first>Andrew</first><last>Benz</last></author>
    <author><first>Simon</first><last>Mendelsohn</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>306&#8211;315</pages>
    <url>http://www.aclweb.org/anthology/W18-5433</url>
    <abstract>This paper analyzes the behavior of stack-augmented recurrent neural network (RNN) models. Due to the architectural similarity between stack RNNs and pushdown transducers, we train stack RNN models on a number of tasks, including string reversal, context-free language modelling, and cumulative XOR evaluation. Examining the behavior of our networks, we show that stack-augmented RNNs can discover intuitive stack-based strategies for solving our tasks. However, stack RNNs are more difficult to train than classical architectures such as LSTMs. Rather than employ stack-based strategies, more complex stack-augmented networks often find approximate solutions by using the stack as unstructured memory.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hao-EtAl:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5434">
    <title>Learning Explanations from Language Data</title>
    <author><first>David</first><last>Harbecke</last></author>
    <author><first>Robert</first><last>Schwarzenberg</last></author>
    <author><first>Christoph</first><last>Alt</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>316&#8211;318</pages>
    <url>http://www.aclweb.org/anthology/W18-5434</url>
    <abstract>PatternAttribution is a recent method, introduced in the vision domain, that explains classifications of deep neural networks. We demonstrate that it also generates meaningful interpretations in the language domain.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>harbecke-schwarzenberg-alt:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5435">
    <title>How much should you ask? On the question structure in QA systems.</title>
    <author><first>Barbara</first><last>Rychalska</last></author>
    <author><first>Dominika</first><last>Basaj</last></author>
    <author><first>Anna</first><last>Wr&#243;blewska</last></author>
    <author><first>Przemyslaw</first><last>Biecek</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>319&#8211;321</pages>
    <url>http://www.aclweb.org/anthology/W18-5435</url>
    <abstract>Datasets that boosted state-of-the-art solutions for Question Answering (QA) systems prove that it is possible to ask questions in a natural language manner. However, users are still used to query-like systems, where they type in keywords to search for an answer. In this study we validate which parts of questions are essential for obtaining a valid answer. To this end, we take advantage of LIME, a framework that explains predictions by local approximation. We find that grammar and natural language are disregarded by QA systems: a state-of-the-art model can answer properly even if ’asked’ with only a few words with high coefficients calculated with LIME. To our knowledge, this is the first time that a QA model has been explained with LIME.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rychalska-EtAl:2018:BlackboxNLP1</bibkey>
  </paper>

  <paper id="5436">
    <title>Does it care what you asked? Understanding Importance of Verbs in Deep Learning QA System</title>
    <author><first>Barbara</first><last>Rychalska</last></author>
    <author><first>Dominika</first><last>Basaj</last></author>
    <author><first>Anna</first><last>Wr&#243;blewska</last></author>
    <author><first>Przemyslaw</first><last>Biecek</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>322&#8211;324</pages>
    <url>http://www.aclweb.org/anthology/W18-5436</url>
    <abstract>In this paper we present the results of an investigation of the importance of verbs in a deep learning QA system trained on the SQuAD dataset. We show that main verbs in questions carry little influence on the decisions made by the system: in over 90% of the studied cases, swapping verbs for their antonyms did not change the system’s decision. We track this phenomenon down into the internals of the net, analyzing the mechanism of self-attention and the values contained in the hidden layers of the RNN. Finally, we identify the characteristics of the SQuAD dataset as the source of the problem. Our work relates to the recently popular topic of adversarial examples in NLP, combined with an investigation of deep net structure.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rychalska-EtAl:2018:BlackboxNLP2</bibkey>
  </paper>

  <paper id="5437">
    <title>Interpretable Textual Neuron Representations for NLP</title>
    <author><first>Nina</first><last>Poerner</last></author>
    <author><first>Benjamin</first><last>Roth</last></author>
    <author><first>Hinrich</first><last>Sch&#252;tze</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>325&#8211;327</pages>
    <url>http://www.aclweb.org/anthology/W18-5437</url>
    <abstract>Input optimization methods, such as Google Deep Dream, create interpretable representations of neurons for computer vision DNNs. We propose and evaluate ways of transferring this technology to NLP. Our results suggest that gradient ascent with a Gumbel softmax layer produces n-gram representations that outperform naive corpus search in terms of target neuron activation. The representations highlight differences in syntax awareness between the language and visual models of the Imaginet architecture.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>poerner-roth-schtze:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5438">
    <title>Language Models Learn POS First</title>
    <author><first>Naomi</first><last>Saphra</last></author>
    <author><first>Adam</first><last>Lopez</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>328&#8211;330</pages>
    <url>http://www.aclweb.org/anthology/W18-5438</url>
    <abstract>A glut of recent research shows that language models capture linguistic structure. Such work answers the question of whether a model represents linguistic structure. But how and when are these structures acquired? Rather than treating the training process itself as a black box, we investigate how representations of linguistic structure are learned over time. In particular, we demonstrate that different aspects of linguistic structure are learned at different rates, with part of speech tagging acquired early and global topic information learned continuously.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>saphra-lopez:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5439">
    <title>Predicting and interpreting embeddings for out of vocabulary words in downstream tasks</title>
    <author><first>Nicolas</first><last>Garneau</last></author>
    <author><first>Jean-Samuel</first><last>Leboeuf</last></author>
    <author><first>Luc</first><last>Lamontagne</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>331&#8211;333</pages>
    <url>http://www.aclweb.org/anthology/W18-5439</url>
    <abstract>We propose a novel way to handle out of vocabulary (OOV) words in downstream natural language processing (NLP) tasks. We implement a network that predicts useful embeddings for OOV words based on their morphology and on the context in which they appear. Our model also incorporates an attention mechanism indicating the focus allocated to the left context words, the right context words or the word’s characters, hence making the prediction more interpretable. The model is a</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>garneau-leboeuf-lamontagne:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5440">
    <title>Probing sentence embeddings for structure-dependent tense</title>
    <author><first>Geoff</first><last>Bacon</last></author>
    <author><first>Terry</first><last>Regier</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>334&#8211;336</pages>
    <url>http://www.aclweb.org/anthology/W18-5440</url>
    <abstract>Learning universal sentence representations which accurately model sentential semantic content is a current goal of natural language processing research. A prominent and successful approach is to train recurrent neural networks (RNNs) to encode sentences into fixed length vectors. Many core linguistic phenomena that one would like to model in universal sentence representations depend on syntactic structure. Despite the fact that RNNs do not have explicit syntactic structural representations, there is some evidence that RNNs can approximate such structure-dependent phenomena under certain conditions, in addition to their widespread success in practical tasks. In this work, we assess RNNs' ability to learn the structure-dependent phenomenon of main clause tense.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bacon-regier:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5441">
    <title>Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation</title>
    <author><first>Adam</first><last>Poliak</last></author>
    <author><first>Aparajita</first><last>Haldar</last></author>
    <author><first>Rachel</first><last>Rudinger</last></author>
    <author><first>J. Edward</first><last>Hu</last></author>
    <author><first>Ellie</first><last>Pavlick</last></author>
    <author><first>Aaron Steven</first><last>White</last></author>
    <author><first>Benjamin</first><last>Van Durme</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>337&#8211;340</pages>
    <url>http://www.aclweb.org/anthology/W18-5441</url>
    <abstract>We present a large-scale collection of diverse natural language inference (NLI) datasets that help provide insight into how well a sentence representation encoded by a neural network captures distinct types of reasoning. The collection results from recasting 13 existing datasets from 7 semantic phenomena into a common NLI structure, resulting in over half a million labeled context-hypothesis pairs in total. Our collection of diverse datasets is available at http://www.decomp.net/ and will grow over time as additional resources are recast and added from novel sources.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>poliak-EtAl:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5442">
    <title>Interpretable Word Embedding Contextualization</title>
    <author><first>Kyoung-Rok</first><last>Jang</last></author>
    <author><first>Sung-Hyon</first><last>Myaeng</last></author>
    <author><first>Sang-Bum</first><last>Kim</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>341&#8211;343</pages>
    <url>http://www.aclweb.org/anthology/W18-5442</url>
    <abstract>In this paper, we propose a method of calibrating a word embedding so that the semantics it conveys become more relevant to the context. Our method is novel in that its output shows clearly which senses originally present in a target word embedding become stronger or weaker. This is made possible by utilizing a previously introduced technique that uses sparse coding to recover the senses that comprise a word embedding.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jang-myaeng-kim:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5443">
    <title>State Gradients for RNN Memory Analysis</title>
    <author><first>Lyan</first><last>Verwimp</last></author>
    <author><first>Hugo</first><last>Van hamme</last></author>
    <author><first>Vincent</first><last>Renkens</last></author>
    <author><first>Patrick</first><last>Wambacq</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>344&#8211;346</pages>
    <url>http://www.aclweb.org/anthology/W18-5443</url>
    <abstract>We present a framework for analyzing what the state in RNNs remembers from its input embeddings. We compute the gradients of the states with respect to the input embeddings and decompose the gradient matrix with Singular Value Decomposition to analyze which directions in the embedding space are best transferred to the hidden state space, characterized by the largest singular values. We apply our approach to LSTM language models and investigate to what extent and for how long certain classes of words are remembered on average for a certain corpus. Additionally, the extent to which a specific property or relationship is remembered by the RNN can be tracked by comparing a vector characterizing that property with the direction(s) in embedding space that are best preserved in hidden state space.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>verwimp-EtAl:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5444">
    <title>Extracting Syntactic Trees from Transformer Encoder Self-Attentions</title>
    <author><first>David</first><last>Mareček</last></author>
    <author><first>Rudolf</first><last>Rosa</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>347&#8211;349</pages>
    <url>http://www.aclweb.org/anthology/W18-5444</url>
    <abstract>This is a work in progress about extracting the sentence tree structures from the encoder's self-attention weights, when translating into another language using the Transformer neural network architecture. We visualize the structures and discuss their characteristics with respect to the existing syntactic theories and annotations.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mareek-rosa:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5445">
    <title>Portable, layer-wise task performance monitoring for NLP models</title>
    <author><first>Tom</first><last>Lippincott</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>350&#8211;352</pages>
    <url>http://www.aclweb.org/anthology/W18-5445</url>
    <abstract>There is a long-standing interest in understanding the internal behavior of neural networks. Deep neural architectures for natural language processing (NLP) are often accompanied by explanations for their effectiveness, from general observations (e.g. RNNs can represent unbounded dependencies in a sequence) to specific arguments about linguistic phenomena (early layers encode lexical information, deeper layers syntactic). The recent ascendancy of DNNs is fueling efforts in the NLP community to explore these claims. Previous work has tended to focus on easily-accessible representations like word or sentence embeddings, with deeper structure requiring more ad hoc methods to extract and examine. In this work, we introduce Vivisect, a toolkit that aims at a general solution for broad and fine-grained monitoring in the major DNN frameworks, with minimal change to research patterns.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lippincott:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5446">
    <title>GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding</title>
    <author><first>Alex</first><last>Wang</last></author>
    <author><first>Amanpreet</first><last>Singh</last></author>
    <author><first>Julian</first><last>Michael</last></author>
    <author><first>Felix</first><last>Hill</last></author>
    <author><first>Omer</first><last>Levy</last></author>
    <author><first>Samuel</first><last>Bowman</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>353&#8211;355</pages>
    <url>http://www.aclweb.org/anthology/W18-5446</url>
    <abstract>For natural language understanding (NLU) technology to be maximally useful, it must be able to process language in a way that is not exclusively tailored to a specific task, genre, or dataset.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wang-EtAl:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5447">
    <title>Explicitly modeling case improves neural dependency parsing</title>
    <author><first>Clara</first><last>Vania</last></author>
    <author><first>Adam</first><last>Lopez</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>356&#8211;358</pages>
    <url>http://www.aclweb.org/anthology/W18-5447</url>
    <abstract>Neural dependency parsing models that compose word representations from characters can presumably exploit morphosyntax when making attachment decisions. How much do they know about morphology? We investigate how well they handle morphological case, which is important for parsing. Our experiments on Czech, German and Russian suggest that adding explicit morphological case&#8211;either oracle or predicted&#8211;improves neural dependency parsing, indicating that the learned representations in these models do not fully encode the morphological knowledge that they need, and can still benefit from targeted forms of explicit linguistic modeling.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>vania-lopez:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5448">
    <title>Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis</title>
    <author><first>Kelly</first><last>Zhang</last></author>
    <author><first>Samuel</first><last>Bowman</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>359&#8211;361</pages>
    <url>http://www.aclweb.org/anthology/W18-5448</url>
    <abstract>Recently, researchers have found that deep LSTMs trained on tasks like machine translation learn substantial syntactic and semantic information about their input sentences, including part-of-speech. These findings begin to shed light on why pretrained representations, like ELMo and CoVe, are so beneficial for neural language understanding models. We still, though, do not yet have a clear understanding of how the choice of pretraining objective affects the type of linguistic information that models learn. With this in mind, we compare four objectives&#8211;language modeling, translation, skip-thought, and autoencoding&#8211;on their ability to induce syntactic and part-of-speech information, holding constant the quantity and genre of the training data, as well as the LSTM architecture.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhang-bowman:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5449">
    <title>Representation of Word Meaning in the Intermediate Projection Layer of a Neural Language Model</title>
    <author><first>Steven</first><last>Derby</last></author>
    <author><first>Paul</first><last>Miller</last></author>
    <author><first>Brian</first><last>Murphy</last></author>
    <author><first>Barry</first><last>Devereux</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>362&#8211;364</pages>
    <url>http://www.aclweb.org/anthology/W18-5449</url>
    <abstract>In this work, we evaluate latent semantic knowledge present in the LSTM activation patterns produced before and after the word of interest. We evaluate whether these activations predict human similarity ratings, human-derived property knowledge, and brain imaging data. In this way, we test the model's ability to encode important semantic information relevant to word prediction, and its relationship with human cognitive semantic representations.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>derby-EtAl:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5450">
    <title>Interpretable Structure Induction via Sparse Attention</title>
    <author><first>Ben</first><last>Peters</last></author>
    <author><first>Vlad</first><last>Niculae</last></author>
    <author><first>Andr&#233; F. T.</first><last>Martins</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>365&#8211;367</pages>
    <url>http://www.aclweb.org/anthology/W18-5450</url>
    <abstract>Neural network methods are experiencing wide adoption in NLP, thanks to their empirical performance on many tasks. Modern neural architectures go way beyond simple feedforward and recurrent models: they are complex pipelines that perform soft, differentiable computation instead of discrete logic. The price of such soft computing is the introduction of dense dependencies, which make it hard to disentangle the patterns that trigger a prediction. Our recent work on sparse and structured latent computation presents a promising avenue for enhancing interpretability of such neural pipelines. Through this extended abstract, we aim to discuss and explore the potential and impact of our methods.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>peters-niculae-martins:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5451">
    <title>Debugging Sequence-to-Sequence Models with Seq2Seq-Vis</title>
    <author><first>Hendrik</first><last>Strobelt</last></author>
    <author><first>Sebastian</first><last>Gehrmann</last></author>
    <author><first>Michael</first><last>Behrisch</last></author>
    <author><first>Adam</first><last>Perer</last></author>
    <author><first>Hanspeter</first><last>Pfister</last></author>
    <author><first>Alexander</first><last>Rush</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>368&#8211;370</pages>
    <url>http://www.aclweb.org/anthology/W18-5451</url>
    <abstract>Neural sequence-to-sequence models have proven to be</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>strobelt-EtAl:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5452">
    <title>Grammar Induction with Neural Language Models: An Unusual Replication</title>
    <author><first>Phu Mon</first><last>Htut</last></author>
    <author><first>Kyunghyun</first><last>Cho</last></author>
    <author><first>Samuel</first><last>Bowman</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>371&#8211;373</pages>
    <url>http://www.aclweb.org/anthology/W18-5452</url>
    <abstract>Grammar induction is the task of learning syntactic structure without the expert-labeled treebanks (Charniak and Carroll, 1992; Klein and Manning, 2002). Recent work on latent tree learning offers a new family of approaches to this problem by inducing syntactic structure using the supervision from a downstream NLP task (Yogatama et al., 2017; Maillard et al., 2017; Choi et al., 2018). In a recent paper published at ICLR, Shen et al. (2018) introduce such a model and report near state-of-the-art results on the target task of language modeling, and the first strong latent tree learning result on constituency parsing. During the analysis of this model, we discover issues that make the original results hard to trust, including tuning and even training on what is effectively the test set. Here, we analyze the model under different configurations to understand what it learns and to identify the conditions under which it succeeds. We find that this model represents the first empirical success for neural network latent tree learning, and that neural language modeling warrants further study as a setting for grammar induction.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>htut-cho-bowman:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5453">
    <title>Does Syntactic Knowledge in Multilingual Language Models Transfer Across Languages?</title>
    <author><first>Prajit</first><last>Dhar</last></author>
    <author><first>Arianna</first><last>Bisazza</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>374&#8211;377</pages>
    <url>http://www.aclweb.org/anthology/W18-5453</url>
    <abstract>Recent work has shown that neural models can be successfully trained on multiple languages simultaneously.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>dhar-bisazza:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5454">
    <title>Exploiting Attention to Reveal Shortcomings in Memory Models</title>
    <author><first>Kaylee</first><last>Burns</last></author>
    <author><first>Aida</first><last>Nematzadeh</last></author>
    <author><first>Erin</first><last>Grant</last></author>
    <author><first>Alison</first><last>Gopnik</last></author>
    <author><first>Tom</first><last>Griffiths</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>378&#8211;380</pages>
    <url>http://www.aclweb.org/anthology/W18-5454</url>
    <abstract>The decision making processes of deep networks are difficult to understand and while their accuracy often improves with increased architectural complexity, so too does their opacity. Practical use of machine learning models, especially for question-answering applications, demands a system that is interpretable. We analyze the attention of a memory network model to reconcile contradictory performance on a challenging question-answering dataset that is inspired by theory-of-mind experiments. We equate success on questions to task classification, which explains not only test-time failures but also how well the model generalizes to new training conditions.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>burns-EtAl:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5455">
    <title>End-to-end Image Captioning Exploits Distributional Similarity in Multimodal Space</title>
    <author><first>Pranava Swaroop</first><last>Madhyastha</last></author>
    <author><first>Josiah</first><last>Wang</last></author>
    <author><first>Lucia</first><last>Specia</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>381&#8211;383</pages>
    <url>http://www.aclweb.org/anthology/W18-5455</url>
    <abstract>We hypothesize that end-to-end neural image captioning systems work seemingly well because they exploit and learn `distributional similarity' in a multimodal feature space, by mapping a test image to similar training images in this space and generating a caption from the same space. To validate our hypothesis, we focus on the `image' side of image captioning, and vary the input image representation but keep the RNN text generation model of a CNN-RNN constant. Our analysis indicates that image captioning models (i) are capable of separating structure from noisy input representations; (ii) experience virtually no significant performance loss when a high dimensional representation is compressed to a lower dimensional space; (iii) cluster images with similar visual and linguistic information together. Our experiments all point to one fact: that our distributional similarity hypothesis holds. We conclude that, regardless of the image representation, image captioning systems seem to match images and generate captions in a learned joint image-text semantic subspace.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>madhyastha-wang-specia:2018:BlackboxNLP</bibkey>
  </paper>

  <paper id="5456">
    <title>Limitations in learning an interpreted language with recurrent models</title>
    <author><first>Denis</first><last>Paperno</last></author>
    <booktitle>Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP</booktitle>
    <month>November</month>
    <year>2018</year>
    <address>Brussels, Belgium</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>384&#8211;386</pages>
    <url>http://www.aclweb.org/anthology/W18-5456</url>
    <abstract>In this submission I report work in progress on learning simplified interpreted languages by means of recurrent models. The data is constructed to reflect core properties of natural language as modeled in formal syntax and semantics. Preliminary results suggest that LSTM networks do generalise to compositional interpretation, albeit only in the most favorable learning setting.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>paperno:2018:BlackboxNLP</bibkey>
  </paper>

</volume>

