Shay B. Cohen

Also published as: Shay Cohen


2021

pdf bib
A Differentiable Relaxation of Graph Segmentation and Alignment for AMR Parsing
Chunchuan Lyu | Shay B. Cohen | Ivan Titov
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Abstract Meaning Representations (AMR) are a broad-coverage semantic formalism which represents sentence meaning as a directed acyclic graph. To train most AMR parsers, one needs to segment the graph into subgraphs and align each such subgraph to a word in a sentence; this is normally done at preprocessing, relying on hand-crafted rules. In contrast, we treat both alignment and segmentation as latent variables in our model and induce them as part of end-to-end training. As marginalizing over the structured latent variables is infeasible, we use the variational autoencoding framework. To ensure end-to-end differentiable optimization, we introduce a differentiable relaxation of the segmentation and alignment problems. We observe that inducing segmentation yields substantial gains over using a ‘greedy’ segmentation heuristic. The performance of our method also approaches that of a model that relies on the segmentation rules of Lyu and Titov (2018), which were hand-crafted to handle individual AMR constructions.

pdf bib
A Root of a Problem: Optimizing Single-Root Dependency Parsing
Miloš Stanojević | Shay B. Cohen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We describe two approaches to single-root dependency parsing that yield significant speed ups in such parsing. One approach has been previously used in dependency parsers in practice, but remains undocumented in the parsing literature, and is considered a heuristic. We show that this approach actually finds the optimal dependency tree. The second approach relies on simple reweighting of the inference graph being input to the dependency parser and has an optimal running time. Here, we again show that this approach is fully correct and identifies the highest-scoring parse tree. Our experiments demonstrate a manyfold speed up compared to a previous graph-based state-of-the-art parser without any loss in accuracy or optimality.

pdf bib
A Closer Look into the Robustness of Neural Dependency Parsers Using Better Adversarial Examples
Yuxuan Wang | Wanxiang Che | Ivan Titov | Shay B. Cohen | Zhilin Lei | Ting Liu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Open-Domain Contextual Link Prediction and its Complementarity with Entailment Graphs
Mohammad Javad Hosseini | Shay B. Cohen | Mark Johnson | Mark Steedman
Findings of the Association for Computational Linguistics: EMNLP 2021

An open-domain knowledge graph (KG) has entities as nodes and natural language relations as edges, and is constructed by extracting (subject, relation, object) triples from text. The task of open-domain link prediction is to infer missing relations in the KG. Previous work has used standard link prediction for the task. Since triples are extracted from text, we can ground them in the larger textual context in which they were originally found. However, standard link prediction methods only rely on the KG structure and ignore the textual context that each triple was extracted from. In this paper, we introduce the new task of open-domain contextual link prediction which has access to both the textual context and the KG structure to perform link prediction. We build a dataset for the task and propose a model for it. Our experiments show that context is crucial in predicting missing relations. We also demonstrate the utility of contextual link prediction in discovering context-independent entailments between relations, in the form of entailment graphs (EG), in which the nodes are the relations. The reverse holds too: context-independent EGs assist in predicting relations in context.

pdf bib
Text Generation from Discourse Representation Structures
Jiangming Liu | Shay B. Cohen | Mirella Lapata
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We propose neural models to generate text from formal meaning representations based on Discourse Representation Structures (DRSs). DRSs are document-level representations which encode rich semantic detail pertaining to rhetorical relations, presupposition, and co-reference within and across sentences. We formalize the task of neural DRS-to-text generation and provide modeling solutions for the problems of condition ordering and variable naming which render generation from DRSs non-trivial. Our generator relies on a novel sibling treeLSTM model which is able to accurately represent DRS structures and is more generally suited to trees with wide branches. We achieve competitive performance (59.48 BLEU) on the GMB benchmark against several strong baselines.

pdf bib
Proceedings of the Second Workshop on Domain Adaptation for NLP
Eyal Ben-David | Shay Cohen | Ryan McDonald | Barbara Plank | Roi Reichart | Guy Rotman | Yftah Ziser
Proceedings of the Second Workshop on Domain Adaptation for NLP

2020

pdf bib
Learning Dialog Policies from Weak Demonstrations
Gabriel Gordon-Hall | Philip John Gorinski | Shay B. Cohen
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Deep reinforcement learning is a promising approach to training a dialog manager, but current methods struggle with the large state and action spaces of multi-domain dialog systems. Building upon Deep Q-learning from Demonstrations (DQfD), an algorithm that scores highly in difficult Atari games, we leverage dialog data to guide the agent to successfully respond to a user’s requests. We make progressively fewer assumptions about the data needed, using labeled, reduced-labeled, and even unlabeled data to train expert demonstrators. We introduce Reinforced Fine-tune Learning, an extension to DQfD, enabling us to overcome the domain gap between the datasets and the environment. Experiments in a challenging multi-domain dialog system framework validate our approaches, and get high success rates even when trained on out-of-domain data.

pdf bib
Dscorer: A Fast Evaluation Metric for Discourse Representation Structure Parsing
Jiangming Liu | Shay B. Cohen | Mirella Lapata
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Discourse representation structures (DRSs) are scoped semantic representations for texts of arbitrary length. Evaluating the accuracy of predicted DRSs plays a key role in developing semantic parsers and improving their performance. DRSs are typically visualized as boxes which are not straightforward to process automatically. Counter transforms DRSs to clauses and measures clause overlap by searching for variable mappings between two DRSs. However, this metric is computationally costly (with respect to memory and CPU time) and does not scale with longer texts. We introduce Dscorer, an efficient new metric which converts box-style DRSs to graphs and then measures the overlap of n-grams. Experiments show that Dscorer computes accuracy scores that are correlated with Counter at a fraction of the time.

pdf bib
Machine Reading of Historical Events
Or Honovich | Lucas Torroba Hennigen | Omri Abend | Shay B. Cohen
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Machine reading is an ambitious goal in NLP that subsumes a wide range of text understanding capabilities. Within this broad framework, we address the task of machine reading the time of historical events, compile datasets for the task, and develop a model for tackling it. Given a brief textual description of an event, we show that good performance can be achieved by extracting relevant sentences from Wikipedia, and applying a combination of task-specific and general-purpose feature embeddings for the classification. Furthermore, we establish a link between the historical event ordering task and the event focus time task from the information retrieval literature, showing they also provide a challenging test case for machine reading algorithms.

pdf bib
English-to-Chinese Transliteration with Phonetic Auxiliary Task
Yuan He | Shay B. Cohen
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Approaching named entities transliteration as a Neural Machine Translation (NMT) problem is common practice. While many have applied various NMT techniques to enhance machine transliteration models, few focus on the linguistic features particular to the relevant languages. In this paper, we investigate the effect of incorporating phonetic features for English-to-Chinese transliteration under the multi-task learning (MTL) setting—where we define a phonetic auxiliary task aimed to improve the generalization performance of the main transliteration task. In addition to our system, we also release a new English-to-Chinese dataset and propose a novel evaluation metric which considers multiple possible transliterations given a source name. Our results show that the multi-task model achieves similar performance as the previous state of the art with a model of a much smaller size.

pdf bib
Obfuscation for Privacy-preserving Syntactic Parsing
Zhifeng Hu | Serhii Havrylov | Ivan Titov | Shay B. Cohen
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

The goal of homomorphic encryption is to encrypt data such that another party can operate on it without being explicitly exposed to the content of the original data. We introduce an idea for a privacy-preserving transformation on natural language data, inspired by homomorphic encryption. Our primary tool is obfuscation, relying on the properties of natural language. Specifically, a given English text is obfuscated using a neural model that aims to preserve the syntactic relationships of the original sentence so that the obfuscated sentence can be parsed instead of the original one. The model works at the word level, and learns to obfuscate each word separately by changing it into a new word that has a similar syntactic role. The text obfuscated by our model leads to better performance on three syntactic parsers (two dependency and one constituency parsers) in comparison to an upper-bound random substitution baseline. More specifically, the results demonstrate that as more terms are obfuscated (by their part of speech), the substitution upper bound significantly degrades, while the neural model maintains a relatively high performing parser. All of this is done without much sacrifice of privacy compared to the random substitution upper bound. We also further analyze the results, and discover that the substituted words have similar syntactic properties, but different semantic content, compared to the original words.

pdf bib
Tensors over Semirings for Latent-Variable Weighted Logic Programs
Esma Balkir | Daniel Gildea | Shay B. Cohen
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

Semiring parsing is an elegant framework for describing parsers by using semiring weighted logic programs. In this paper we present a generalization of this concept: latent-variable semiring parsing. With our framework, any semiring weighted logic program can be latentified by transforming weights from scalar values of a semiring to rank-n arrays, or tensors, of semiring values, allowing the modelling of latent-variable models within the semiring parsing framework. Semiring is too strong a notion when dealing with tensors, and we have to resort to a weaker structure: a partial semiring. We prove that this generalization preserves all the desired properties of the original semiring framework while strictly increasing its expressiveness.

pdf bib
Lightweight, Dynamic Graph Convolutional Networks for AMR-to-Text Generation
Yan Zhang | Zhijiang Guo | Zhiyang Teng | Wei Lu | Shay B. Cohen | Zuozhu Liu | Lidong Bing
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

AMR-to-text generation is used to transduce Abstract Meaning Representation structures (AMR) into text. A key challenge in this task is to efficiently learn effective graph representations. Previously, Graph Convolution Networks (GCNs) were used to encode input AMRs, however, vanilla GCNs are not able to capture non-local information and additionally, they follow a local (first-order) information aggregation scheme. To account for these issues, larger and deeper GCN models are required to capture more complex interactions. In this paper, we introduce a dynamic fusion mechanism, proposing Lightweight Dynamic Graph Convolutional Networks (LDGCNs) that capture richer non-local interactions by synthesizing higher order information from the input graphs. We further develop two novel parameter saving strategies based on the group graph convolutions and weight tied convolutions to reduce memory usage and model complexity. With the help of these strategies, we are able to train a model with fewer parameters while maintaining the model capacity. Experiments demonstrate that LDGCNs outperform state-of-the-art models on two benchmark datasets for AMR-to-text generation with significantly fewer parameters.

pdf bib
Multi-Step Inference for Reasoning Over Paragraphs
Jiangming Liu | Matt Gardner | Shay B. Cohen | Mirella Lapata
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Complex reasoning over text requires understanding and chaining together free-form predicates and logical connectives. Prior work has largely tried to do this either symbolically or with black-box transformers. We present a middle ground between these two extremes: a compositional model reminiscent of neural module networks that can perform chained logical reasoning. This model first finds relevant sentences in the context and then chains them together using neural modules. Our model gives significant performance improvements (up to 29% relative error reduction when combined with a reranker) on ROPES, a recently-introduced complex reasoning dataset.

pdf bib
The Role of Reentrancies in Abstract Meaning Representation Parsing
Ida Szubert | Marco Damonte | Shay B. Cohen | Mark Steedman
Findings of the Association for Computational Linguistics: EMNLP 2020

Abstract Meaning Representation (AMR) parsing aims at converting sentences into AMR representations. These are graphs and not trees because AMR supports reentrancies (nodes with more than one parent). Following previous findings on the importance of reen- trancies for AMR, we empirically find and discuss several linguistic phenomena respon- sible for reentrancies in AMR, some of which have not received attention before. We cate- gorize the types of errors AMR parsers make with respect to reentrancies. Furthermore, we find that correcting these errors provides an in- crease of up to 5% Smatch in parsing perfor- mance and 20% in reentrancy prediction

pdf bib
Reducing Quantity Hallucinations in Abstractive Summarization
Zheng Zhao | Shay B. Cohen | Bonnie Webber
Findings of the Association for Computational Linguistics: EMNLP 2020

It is well-known that abstractive summaries are subject to hallucination—including material that is not supported by the original text. While summaries can be made hallucination-free by limiting them to general phrases, such summaries would fail to be very informative. Alternatively, one can try to avoid hallucinations by verifying that any specific entities in the summary appear in the original text in a similar context. This is the approach taken by our system, Herman. The system learns to recognize and verify quantity entities (dates, numbers, sums of money, etc.) in a beam-worth of abstractive summaries produced by state-of-the-art models, in order to up-rank those summaries whose quantity terms are supported by the original text. Experimental results demonstrate that the ROUGE scores of such up-ranked summaries have a higher Precision than summaries that have not been up-ranked, without a comparable loss in Recall, resulting in higher F1. Preliminary human evaluation of up-ranked vs. original summaries shows people’s preference for the former.

2019

pdf bib
Unlexicalized Transition-based Discontinuous Constituency Parsing
Maximin Coavoux | Benoît Crabbé | Shay B. Cohen
Transactions of the Association for Computational Linguistics, Volume 7

Lexicalized parsing models are based on the assumptions that (i) constituents are organized around a lexical head and (ii) bilexical statistics are crucial to solve ambiguities. In this paper, we introduce an unlexicalized transition-based parser for discontinuous constituency structures, based on a structure-label transition system and a bi-LSTM scoring system. We compare it with lexicalized parsing models in order to address the question of lexicalization in the context of discontinuous constituency parsing. Our experiments show that unlexicalized models systematically achieve higher results than lexicalized models, and provide additional empirical evidence that lexicalization is not necessary to achieve strong parsing results. Our best unlexicalized model sets a new state of the art on English and German discontinuous constituency treebanks. We further provide a per-phenomenon analysis of its errors on discontinuous constituents.

pdf bib
Semantic Role Labeling with Iterative Structure Refinement
Chunchuan Lyu | Shay B. Cohen | Ivan Titov
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Modern state-of-the-art Semantic Role Labeling (SRL) methods rely on expressive sentence encoders (e.g., multi-layer LSTMs) but tend to model only local (if any) interactions between individual argument labeling decisions. This contrasts with earlier work and also with the intuition that the labels of individual arguments are strongly interdependent. We model interactions between argument labeling decisions through iterative refinement. Starting with an output produced by a factorized model, we iteratively refine it using a refinement network. Instead of modeling arbitrary interactions among roles and words, we encode prior knowledge about the SRL problem by designing a restricted network architecture capturing non-local interactions. This modeling choice prevents overfitting and results in an effective model, outperforming strong factorized baseline models on all 7 CoNLL-2009 languages, and achieving state-of-the-art results on 5 of them, including English.

pdf bib
Partners in Crime: Multi-view Sequential Inference for Movie Understanding
Nikos Papasarantopoulos | Lea Frermann | Mirella Lapata | Shay B. Cohen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Multi-view learning algorithms are powerful representation learning tools, often exploited in the context of multimodal problems. However, for problems requiring inference at the token-level of a sequence (that is, a separate prediction must be made for every time step), it is often the case that single-view systems are used, or that more than one views are fused in a simple manner. We describe an incremental neural architecture paired with a novel training objective for incremental inference. The network operates on multi-view data. We demonstrate the effectiveness of our approach on the problem of predicting perpetrators in crime drama series, for which our model significantly outperforms previous work and strong baselines. Moreover, we introduce two tasks, crime case and speaker type tagging, that contribute to movie understanding and demonstrate the effectiveness of our model on them.

pdf bib
Experimenting with Power Divergences for Language Modeling
Matthieu Labeau | Shay B. Cohen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Neural language models are usually trained using Maximum-Likelihood Estimation (MLE). The corresponding objective function for MLE is derived from the Kullback-Leibler (KL) divergence between the empirical probability distribution representing the data and the parametric probability distribution output by the model. However, the word frequency discrepancies in natural language make performance extremely uneven: while the perplexity is usually very low for frequent words, it is especially difficult to predict rare words. In this paper, we experiment with several families (alpha, beta and gamma) of power divergences, generalized from the KL divergence, for learning language models with an objective different than standard MLE. Intuitively, these divergences should affect the way the probability mass is spread during learning, notably by prioritizing performances on high or low-frequency words. In addition, we implement and experiment with various sampling-based objectives, where the computation of the output layer is only done on a small subset of the vocabulary. They are derived as power generalizations of a softmax approximated via Importance Sampling, and Noise Contrastive Estimation, for accelerated learning. Our experiments on the Penn Treebank and Wikitext-2 show that these power divergences can indeed be used to prioritize learning on the frequent or rare words, and lead to general performance improvements in the case of sampling-based learning.

pdf bib
Wide-Coverage Neural A* Parsing for Minimalist Grammars
John Torr | Miloš Stanojević | Mark Steedman | Shay B. Cohen
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Minimalist Grammars (Stabler, 1997) are a computationally oriented, and rigorous formalisation of many aspects of Chomsky’s (1995) Minimalist Program. This paper presents the first ever application of this formalism to the task of realistic wide-coverage parsing. The parser uses a linguistically expressive yet highly constrained grammar, together with an adaptation of the A* search algorithm currently used in CCG parsing (Lewis and Steedman, 2014; Lewis et al., 2016), with supertag probabilities provided by a bi-LSTM neural network supertagger trained on MGbank, a corpus of MG derivation trees. We report on some promising initial experimental results for overall dependency recovery as well as on the recovery of certain unbounded long distance dependencies. Finally, although like other MG parsers, ours has a high order polynomial worst case time complexity, we show that in practice its expected time complexity is cubic in the length of the sentence. The parser is publicly available.

pdf bib
Duality of Link Prediction and Entailment Graph Induction
Mohammad Javad Hosseini | Shay B. Cohen | Mark Johnson | Mark Steedman
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Link prediction and entailment graph induction are often treated as different problems. In this paper, we show that these two problems are actually complementary. We train a link prediction model on a knowledge graph of assertions extracted from raw text. We propose an entailment score that exploits the new facts discovered by the link prediction model, and then form entailment graphs between relations. We further use the learned entailments to predict improved link prediction scores. Our results show that the two tasks can benefit from each other. The new entailment score outperforms prior state-of-the-art results on a standard entialment dataset and the new link prediction scores show improvements over the raw link prediction scores.

pdf bib
Discourse Representation Parsing for Sentences and Documents
Jiangming Liu | Shay B. Cohen | Mirella Lapata
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We introduce a novel semantic parsing task based on Discourse Representation Theory (DRT; Kamp and Reyle 1993). Our model operates over Discourse Representation Tree Structures which we formally define for sentences and documents. We present a general framework for parsing discourse structures of arbitrary length and granularity. We achieve this with a neural model equipped with a supervised hierarchical attention mechanism and a linguistically-motivated copy strategy. Experimental results on sentence- and document-level benchmarks show that our model outperforms competitive baselines by a wide margin.

pdf bib
Discourse Representation Structure Parsing with Recurrent Neural Networks and the Transformer Model
Jiangming Liu | Shay B. Cohen | Mirella Lapata
Proceedings of the IWCS Shared Task on Semantic Parsing

We describe the systems we developed for Discourse Representation Structure (DRS) parsing as part of the IWCS-2019 Shared Task of DRS Parsing.1 Our systems are based on sequence-to-sequence modeling. To implement our model, we use the open-source neural machine translation system implemented in PyTorch, OpenNMT-py. We experimented with a variety of encoder-decoder models based on recurrent neural networks and the Transformer model. We conduct experiments on the standard benchmark of the Parallel Meaning Bank (PMB 2.2). Our best system achieves a score of 84.8% F1 in the DRS parsing shared task.

pdf bib
Bottom-Up Unranked Tree-to-Graph Transducers for Translation into Semantic Graphs
Johanna Björklund | Shay B. Cohen | Frank Drewes | Giorgio Satta
Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing

We propose a formal model for translating unranked syntactic trees, such as dependency trees, into semantic graphs. These tree-to-graph transducers can serve as a formal basis of transition systems for semantic parsing which recently have been shown to perform very well, yet hitherto lack formalization. Our model features “extended” rules and an arc-factored normal form, comes with an efficient translation algorithm, and can be equipped with weights in a straightforward manner.

pdf bib
Discontinuous Constituency Parsing with a Stack-Free Transition System and a Dynamic Oracle
Maximin Coavoux | Shay B. Cohen
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We introduce a novel transition system for discontinuous constituency parsing. Instead of storing subtrees in a stack –i.e. a data structure with linear-time sequential access– the proposed system uses a set of parsing items, with constant-time random access. This change makes it possible to construct any discontinuous constituency tree in exactly 4n–2 transitions for a sentence of length n. At each parsing step, the parser considers every item in the set to be combined with a focus item and to construct a new constituent in a bottom-up fashion. The parsing strategy is based on the assumption that most syntactic structures can be parsed incrementally and that the set –the memory of the parser– remains reasonably small on average. Moreover, we introduce a provably correct dynamic oracle for the new transition system, and present the first experiments in discontinuous constituency parsing using a dynamic oracle. Our parser obtains state-of-the-art results on three English and German discontinuous treebanks.

pdf bib
Structural Neural Encoders for AMR-to-text Generation
Marco Damonte | Shay B. Cohen
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

AMR-to-text generation is a problem recently introduced to the NLP community, in which the goal is to generate sentences from Abstract Meaning Representation (AMR) graphs. Sequence-to-sequence models can be used to this end by converting the AMR graphs to strings. Approaching the problem while working directly with graphs requires the use of graph-to-sequence models that encode the AMR graph into a vector representation. Such encoding has been shown to be beneficial in the past, and unlike sequential encoding, it allows us to explicitly capture reentrant structures in the AMR graphs. We investigate the extent to which reentrancies (nodes with multiple parents) have an impact on AMR-to-text generation by comparing graph encoders to tree encoders, where reentrancies are not preserved. We show that improvements in the treatment of reentrancies and long-range dependencies contribute to higher overall scores for graph encoders. Our best model achieves 24.40 BLEU on LDC2015E86, outperforming the state of the art by 1.1 points and 24.54 BLEU on LDC2017T10, outperforming the state of the art by 1.24 points.

pdf bib
Jointly Extracting and Compressing Documents with Summary State Representations
Afonso Mendes | Shashi Narayan | Sebastião Miranda | Zita Marinho | André F. T. Martins | Shay B. Cohen
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We present a new neural model for text summarization that first extracts sentences from a document and then compresses them. The pro-posed model offers a balance that sidesteps thedifficulties in abstractive methods while gener-ating more concise summaries than extractivemethods. In addition, our model dynamically determines the length of the output summary based on the gold summaries it observes during training and does not require length constraints typical to extractive summarization. The model achieves state-of-the-art results on the CNN/DailyMail and Newsroom datasets, improving over current extractive and abstractive methods. Human evaluations demonstratethat our model generates concise and informa-tive summaries. We also make available a new dataset of oracle compressive summaries derived automatically from the CNN/DailyMailreference summaries.

2018

pdf bib
Privacy-preserving Neural Representations of Text
Maximin Coavoux | Shashi Narayan | Shay B. Cohen
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This article deals with adversarial attacks towards deep learning systems for Natural Language Processing (NLP), in the context of privacy protection. We study a specific type of attack: an attacker eavesdrops on the hidden representations of a neural text classifier and tries to recover information about the input text. Such scenario may arise in situations when the computation of a neural network is shared across multiple devices, e.g. some hidden representation is computed by a user’s device and sent to a cloud-based model. We measure the privacy of a hidden representation by the ability of an attacker to predict accurately specific private information from it and characterize the tradeoff between the privacy and the utility of neural representations. Finally, we propose several defense methods based on modified training objectives and show that they improve the privacy of neural representations.

pdf bib
Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
Shashi Narayan | Shay B. Cohen | Mirella Lapata
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We introduce “extreme summarization”, a new single-document summarization task which does not favor extractive strategies and calls for an abstractive modeling approach. The idea is to create a short, one-sentence news summary answering the question “What is the article about?”. We collect a real-world, large-scale dataset for this task by harvesting online articles from the British Broadcasting Corporation (BBC). We propose a novel abstractive model which is conditioned on the article’s topics and based entirely on convolutional neural networks. We demonstrate experimentally that this architecture captures long-range dependencies in a document and recognizes pertinent content, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans.

pdf bib
Multilingual Clustering of Streaming News
Sebastião Miranda | Artūrs Znotiņš | Shay B. Cohen | Guntis Barzdins
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Clustering news across languages enables efficient media monitoring by aggregating articles from multilingual sources into coherent stories. Doing so in an online setting allows scalable processing of massive news streams. To this end, we describe a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual clusters. Unlike typical clustering approaches that report results on datasets with a small and known number of labels, we tackle the problem of discovering an ever growing number of cluster labels in an online fashion, using real news datasets in multiple languages. In our formulation, the monolingual clusters group together documents while the crosslingual clusters group together monolingual clusters, one per language that appears in the stream. Our method is simple to implement, computationally efficient and produces state-of-the-art results on datasets in German, English and Spanish.

pdf bib
Discourse Representation Structure Parsing
Jiangming Liu | Shay B. Cohen | Mirella Lapata
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We introduce an open-domain neural semantic parser which generates formal meaning representations in the style of Discourse Representation Theory (DRT; Kamp and Reyle 1993). We propose a method which transforms Discourse Representation Structures (DRSs) to trees and develop a structure-aware model which decomposes the decoding process into three stages: basic DRS structure prediction, condition prediction (i.e., predicates and relations), and referent prediction (i.e., variables). Experimental results on the Groningen Meaning Bank (GMB) show that our model outperforms competitive baselines by a wide margin.

pdf bib
Stock Movement Prediction from Tweets and Historical Prices
Yumo Xu | Shay B. Cohen
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Stock movement prediction is a challenging problem: the market is highly stochastic, and we make temporally-dependent predictions from chaotic data. We treat these three complexities and present a novel deep generative model jointly exploiting text and price signals for this task. Unlike the case with discriminative or topic modeling, our model introduces recurrent, continuous latent variables for a better treatment of stochasticity, and uses neural variational inference to address the intractable posterior inference. We also provide a hybrid objective with temporal auxiliary to flexibly capture predictive dependencies. We demonstrate the state-of-the-art performance of our proposed model on a new stock movement prediction dataset which we collected.

pdf bib
Document Modeling with External Attention for Sentence Extraction
Shashi Narayan | Ronald Cardenas | Nikos Papasarantopoulos | Shay B. Cohen | Mirella Lapata | Jiangsheng Yu | Yi Chang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Document modeling is essential to a variety of natural language understanding tasks. We propose to use external information to improve document modeling for problems that can be framed as sentence extraction. We develop a framework composed of a hierarchical document encoder and an attention-based extractor with attention over external information. We evaluate our model on extractive document summarization (where the external information is image captions and the title of the document) and answer selection (where the external information is a question). We show that our model consistently outperforms strong baselines, in terms of both informativeness and fluency (for CNN document summarization) and achieves state-of-the-art results for answer selection on WikiQA and NewsQA.

pdf bib
Whodunnit? Crime Drama as a Case for Natural Language Understanding
Lea Frermann | Shay B. Cohen | Mirella Lapata
Transactions of the Association for Computational Linguistics, Volume 6

In this paper we argue that crime drama exemplified in television programs such as CSI: Crime Scene Investigation is an ideal testbed for approximating real-world natural language understanding and the complex inferences associated with it. We propose to treat crime drama as a new inference task, capitalizing on the fact that each episode poses the same basic question (i.e., who committed the crime) and naturally provides the answer when the perpetrator is revealed. We develop a new dataset based on CSI episodes, formalize perpetrator identification as a sequence labeling problem, and develop an LSTM-based model which learns from multi-modal data. Experimental results show that an incremental inference strategy is key to making accurate guesses as well as learning from representations fusing textual, visual, and acoustic input.

pdf bib
Learning Typed Entailment Graphs with Global Soft Constraints
Mohammad Javad Hosseini | Nathanael Chambers | Siva Reddy | Xavier R. Holt | Shay B. Cohen | Mark Johnson | Mark Steedman
Transactions of the Association for Computational Linguistics, Volume 6

This paper presents a new method for learning typed entailment graphs from text. We extract predicate-argument structures from multiple-source news corpora, and compute local distributional similarity scores to learn entailments between predicates with typed arguments (e.g., person contracted disease). Previous work has used transitivity constraints to improve local decisions, but these constraints are intractable on large graphs. We instead propose a scalable method that learns globally consistent similarity scores based on new soft constraints that consider both the structures across typed entailment graphs and inside each graph. Learning takes only a few hours to run over 100K predicates and our results show large improvements over local similarity scores on two entailment data sets. We further show improvements over paraphrases and entailments from the Paraphrase Database, and prior state-of-the-art entailment graphs. We show that the entailment graphs improve performance in a downstream task.

pdf bib
Local String Transduction as Sequence Labeling
Joana Ribeiro | Shashi Narayan | Shay B. Cohen | Xavier Carreras
Proceedings of the 27th International Conference on Computational Linguistics

We show that the general problem of string transduction can be reduced to the problem of sequence labeling. While character deletion and insertions are allowed in string transduction, they do not exist in sequence labeling. We show how to overcome this difference. Our approach can be used with any sequence labeling algorithm and it works best for problems in which string transduction imposes a strong notion of locality (no long range dependencies). We experiment with spelling correction for social media, OCR correction, and morphological inflection, and we see that it behaves better than seq2seq models and yields state-of-the-art results in several cases.

pdf bib
Abstract Meaning Representation for Paraphrase Detection
Fuad Issa | Marco Damonte | Shay B. Cohen | Xiaohui Yan | Yi Chang
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Abstract Meaning Representation (AMR) parsing aims at abstracting away from the syntactic realization of a sentence, and denote only its meaning in a canonical form. As such, it is ideal for paraphrase detection, a problem in which one is required to specify whether two sentences have the same meaning. We show that naïve use of AMR in paraphrase detection is not necessarily useful, and turn to describe a technique based on latent semantic analysis in combination with AMR parsing that significantly advances state-of-the-art results in paraphrase detection for the Microsoft Research Paraphrase Corpus. Our best results in the transductive setting are 86.6% for accuracy and 90.0% for F1 measure.

pdf bib
Cross-Lingual Abstract Meaning Representation Parsing
Marco Damonte | Shay B. Cohen
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Abstract Meaning Representation (AMR) research has mostly focused on English. We show that it is possible to use AMR annotations for English as a semantic representation for sentences written in other languages. We exploit an AMR parser for English and parallel corpora to learn AMR parsers for Italian, Spanish, German and Chinese. Qualitative analysis show that the new parsers overcome structural differences between the languages. We further propose a method to evaluate the parsers that does not require gold standard data in the target languages. This method highly correlates with the gold standard evaluation, obtaining a Pearson correlation coefficient of 0.95.

pdf bib
Ranking Sentences for Extractive Summarization with Reinforcement Learning
Shashi Narayan | Shay B. Cohen | Mirella Lapata
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Single document summarization is the task of producing a shorter version of a document while preserving its principal information content. In this paper we conceptualize extractive summarization as a sentence ranking task and propose a novel training algorithm which globally optimizes the ROUGE evaluation metric through a reinforcement learning objective. We use our algorithm to train a neural summarization model on the CNN and DailyMail datasets and demonstrate experimentally that it outperforms state-of-the-art extractive and abstractive systems when evaluated automatically and by humans.

2017

pdf bib
Split and Rephrase
Shashi Narayan | Claire Gardent | Shay B. Cohen | Anastasia Shimorina
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose a new sentence simplification task (Split-and-Rephrase) where the aim is to split a complex sentence into a meaning preserving sequence of shorter sentences. Like sentence simplification, splitting-and-rephrasing has the potential of benefiting both natural language processing and societal applications. Because shorter sentences are generally better processed by NLP systems, it could be used as a preprocessing step which facilitates and improves the performance of parsers, semantic role labellers and machine translation systems. It should also be of use for people with reading disabilities because it allows the conversion of longer sentences into shorter ones. This paper makes two contributions towards this new task. First, we create and make available a benchmark consisting of 1,066,115 tuples mapping a single complex sentence to a sequence of sentences expressing the same meaning. Second, we propose five models (vanilla sequence-to-sequence to semantically-motivated models) to understand the difficulty of the proposed task.

pdf bib
Proceedings of the 2nd Workshop on Representation Learning for NLP
Phil Blunsom | Antoine Bordes | Kyunghyun Cho | Shay Cohen | Chris Dyer | Edward Grefenstette | Karl Moritz Hermann | Laura Rimell | Jason Weston | Scott Yih
Proceedings of the 2nd Workshop on Representation Learning for NLP

pdf bib
Latent-Variable PCFGs: Background and Applications
Shay Cohen
Proceedings of the 15th Meeting on the Mathematics of Language

pdf bib
An Incremental Parser for Abstract Meaning Representation
Marco Damonte | Shay B. Cohen | Giorgio Satta
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Abstract Meaning Representation (AMR) is a semantic representation for natural language that embeds annotations related to traditional tasks such as named entity recognition, semantic role labeling, word sense disambiguation and co-reference resolution. We describe a transition-based parser for AMR that parses sentences left-to-right, in linear time. We further propose a test-suite that assesses specific subtasks that are helpful in comparing AMR parsers, and show that our parser is competitive with the state of the art on the LDC2015E86 dataset and that it outperforms state-of-the-art parsers for recovering named entities and handling polarity.

pdf bib
The SUMMA Platform Prototype
Renars Liepins | Ulrich Germann | Guntis Barzdins | Alexandra Birch | Steve Renals | Susanne Weber | Peggy van der Kreeft | Hervé Bourlard | João Prieto | Ondřej Klejch | Peter Bell | Alexandros Lazaridis | Alfonso Mendes | Sebastian Riedel | Mariana S. C. Almeida | Pedro Balage | Shay B. Cohen | Tomasz Dwojak | Philip N. Garner | Andreas Giefer | Marcin Junczys-Dowmunt | Hina Imran | David Nogueira | Ahmed Ali | Sebastião Miranda | Andrei Popescu-Belis | Lesly Miculicich Werlen | Nikos Papasarantopoulos | Abiola Obamuyide | Clive Jones | Fahim Dalvi | Andreas Vlachos | Yang Wang | Sibo Tong | Rico Sennrich | Nikolaos Pappas | Shashi Narayan | Marco Damonte | Nadir Durrani | Sameer Khurana | Ahmed Abdelali | Hassan Sajjad | Stephan Vogel | David Sheppey | Chris Hernon | Jeff Mitchell
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring. The platform contains a rich suite of low-level and high-level natural language processing technologies: automatic speech recognition of broadcast media, machine translation, automated tagging and classification of named entities, semantic parsing to detect relationships between entities, and automatic construction / augmentation of factual knowledge bases. Implemented on the Docker platform, it can easily be deployed, customised, and scaled to large volumes of incoming media streams.

2016

pdf bib
Proceedings of the 1st Workshop on Representation Learning for NLP
Phil Blunsom | Kyunghyun Cho | Shay Cohen | Edward Grefenstette | Karl Moritz Hermann | Laura Rimell | Jason Weston | Scott Wen-tau Yih
Proceedings of the 1st Workshop on Representation Learning for NLP

pdf bib
Paraphrase Generation from Latent-Variable PCFGs for Semantic Parsing
Shashi Narayan | Siva Reddy | Shay B. Cohen
Proceedings of the 9th International Natural Language Generation conference

pdf bib
Semi-Supervised Learning of Sequence Models with Method of Moments
Zita Marinho | André F. T. Martins | Shay B. Cohen | Noah A. Smith
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Encoding Prior Knowledge with Eigenword Embeddings
Dominique Osborne | Shashi Narayan | Shay B. Cohen
Transactions of the Association for Computational Linguistics, Volume 4

Canonical correlation analysis (CCA) is a method for reducing the dimension of data represented using two views. It has been previously used to derive word embeddings, where one view indicates a word, and the other view indicates its context. We describe a way to incorporate prior knowledge into CCA, give a theoretical justification for it, and test it by deriving word embeddings and evaluating them on a myriad of datasets.

pdf bib
Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication
Shay B. Cohen | Daniel Gildea
Computational Linguistics, Volume 42, Issue 3 - September 2016

pdf bib
Optimizing Spectral Learning for Parsing
Shashi Narayan | Shay B. Cohen
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
A Coactive Learning View of Online Structured Prediction in Statistical Machine Translation
Artem Sokolov | Stefan Riezler | Shay B. Cohen
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

pdf bib
Conversation Trees: A Grammar Model for Topic Structure in Forums
Annie Louis | Shay B. Cohen
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Diversity in Spectral Learning for Natural Language Parsing
Shashi Narayan | Shay B. Cohen
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Lexical Event Ordering with an Edge-Factored Model
Omri Abend | Shay B. Cohen | Mark Steedman
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing
Phil Blunsom | Shay Cohen | Paramveer Dhillon | Percy Liang
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing

2014

pdf bib
Latent-Variable Synchronous CFGs for Hierarchical Translation
Avneesh Saluja | Chris Dyer | Shay B. Cohen
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Online Adaptor Grammars with Hybrid Inference
Ke Zhai | Jordan Boyd-Graber | Shay B. Cohen
Transactions of the Association for Computational Linguistics, Volume 2

Adaptor grammars are a flexible, powerful formalism for defining nonparametric, unsupervised models of grammar productions. This flexibility comes at the cost of expensive inference. We address the difficulty of inference through an online algorithm which uses a hybrid of Markov chain Monte Carlo and variational inference. We show that this inference strategy improves scalability without sacrificing performance on unsupervised word segmentation and topic modeling tasks.

pdf bib
Lexical Inference over Multi-Word Predicates: A Distributional Approach
Omri Abend | Shay B. Cohen | Mark Steedman
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
A Provably Correct Learning Algorithm for Latent-Variable PCFGs
Shay B. Cohen | Michael Collins
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Spectral Unsupervised Parsing with Additive Tree Metrics
Ankur P. Parikh | Shay B. Cohen | Eric P. Xing
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

pdf bib
The effect of non-tightness on Bayesian estimation of PCFGs
Shay B. Cohen | Mark Johnson
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Experiments with Spectral Learning of Latent-Variable PCFGs
Shay B. Cohen | Karl Stratos | Michael Collins | Dean P. Foster | Lyle Ungar
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Approximate PCFG Parsing Using Tensor Decomposition
Shay B. Cohen | Giorgio Satta | Michael Collins
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Spectral Learning Algorithms for Natural Language Processing
Shay Cohen | Michael Collins | Dean Foster | Karl Stratos | Lyle Ungar
NAACL HLT 2013 Tutorial Abstracts

pdf bib
Spectral Learning of Refinement HMMs
Karl Stratos | Alexander Rush | Shay B. Cohen | Michael Collins
Proceedings of the Seventeenth Conference on Computational Natural Language Learning

2012

pdf bib
Spectral Learning of Latent-Variable PCFGs
Shay B. Cohen | Karl Stratos | Michael Collins | Dean P. Foster | Lyle Ungar
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Empirical Risk Minimization for Probabilistic Grammars: Sample Complexity and Hardness of Learning
Shay B. Cohen | Noah A. Smith
Computational Linguistics, Volume 38, Issue 3 - September 2012

2011

pdf bib
Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
Shay B. Cohen | Dipanjan Das | Noah A. Smith
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing
Shay B. Cohen | Carlos Gómez-Rodríguez | Giorgio Satta
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Unsupervised Bilingual POS Tagging with Markov Random Fields
Desai Chen | Chris Dyer | Shay Cohen | Noah Smith
Proceedings of the First workshop on Unsupervised Learning in NLP

2010

pdf bib
Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
Shay Cohen | Noah A. Smith
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
Variational Inference for Adaptor Grammars
Shay B. Cohen | David M. Blei | Noah A. Smith
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Social Links from Latent Topics in Microblogs
Kriti Puniyani | Jacob Eisenstein | Shay B. Cohen | Eric Xing
Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media

2009

pdf bib
Shared Logistic Normal Distributions for Soft Parameter Tying in Unsupervised Grammar Induction
Shay Cohen | Noah A. Smith
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Variational Inference for Grammar Induction with Prior Knowledge
Shay Cohen | Noah A. Smith
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2007

pdf bib
Joint Morphological and Syntactic Disambiguation
Shay B. Cohen | Noah A. Smith
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Search
Co-authors