Omer Levy


2021

Cryptonite: A Cryptic Crossword Benchmark for Extreme Ambiguity in Language
Avia Efrat | Uri Shaham | Dan Kilman | Omer Levy
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Current NLP datasets targeting ambiguity can be solved by a native speaker with relative ease. We present Cryptonite, a large-scale dataset based on cryptic crosswords, which is both linguistically complex and naturally sourced. Each example in Cryptonite is a cryptic clue, a short phrase or sentence with a misleading surface reading, whose solving requires disambiguating semantic, syntactic, and phonetic wordplays, as well as world knowledge. Cryptic clues pose a challenge even for experienced solvers, though top-tier experts can solve them with almost 100% accuracy. Cryptonite is a challenging task for current models; fine-tuning T5-Large on 470k cryptic clues achieves only 7.6% accuracy, on par with the accuracy of a rule-based clue solver (8.6%).

Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva | Roei Schuster | Jonathan Berant | Omer Levy
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Feed-forward layers constitute two-thirds of a transformer model’s parameters, yet their role in the network remains under-explored. We show that feed-forward layers in transformer-based language models operate as key-value memories, where each key correlates with textual patterns in the training examples, and each value induces a distribution over the output vocabulary. Our experiments show that the learned patterns are human-interpretable, and that lower layers tend to capture shallow patterns, while upper layers learn more semantic ones. The values complement the keys’ input patterns by inducing output distributions that concentrate probability mass on tokens likely to appear immediately after each pattern, particularly in the upper layers. Finally, we demonstrate that the output of a feed-forward layer is a composition of its memories, which is subsequently refined throughout the model’s layers via residual connections to produce the final output distribution.
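
To make the key-value reading concrete, here is a minimal sketch (toy dimensions, randomly initialized weights, and illustrative variable names, not the paper's code) of a feed-forward sublayer viewed as a memory lookup: the hidden state is matched against learned keys, and the resulting coefficients mix value vectors, each of which can be projected to a distribution over the output vocabulary.

```python
import torch
import torch.nn.functional as F

# Toy dimensions so the sketch runs instantly; real models are much larger.
d_model, d_ff, vocab_size = 64, 256, 1000

W_K = torch.randn(d_ff, d_model)      # each row is a "key" that correlates with input patterns
W_V = torch.randn(d_ff, d_model)      # each row is a "value" mixed into the output
E = torch.randn(vocab_size, d_model)  # output embedding matrix used to read values as token distributions

def ffn_as_memory(x):
    """x: (d_model,) hidden state at one position; returns the FFN output and memory coefficients."""
    memory_coeffs = F.relu(x @ W_K.T)          # how strongly each key matches the input
    return memory_coeffs @ W_V, memory_coeffs  # output = weighted sum of value vectors

x = torch.randn(d_model)
out, coeffs = ffn_as_memory(x)

# Each value can be inspected as a distribution over the vocabulary; the paper finds that,
# especially in upper layers, it concentrates on tokens likely to follow the key's pattern.
value_distributions = F.softmax(W_V @ E.T, dim=-1)   # (d_ff, vocab_size)
strongest_memory = coeffs.argmax()
print(out.shape, value_distributions[strongest_memory].topk(5).indices)
```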

How to Train BERT with an Academic Budget
Peter Izsak | Moshe Berchansky | Omer Levy
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

While large language models à la BERT are used ubiquitously in NLP, pretraining them is considered a luxury that only a few well-funded industry labs can afford. How can one train such models with a more modest budget? We present a recipe for pretraining a masked language model in 24 hours using a single low-end deep learning server. We demonstrate that through a combination of software optimizations, design choices, and hyperparameter tuning, it is possible to produce models that are competitive with BERT-base on GLUE tasks at a fraction of the original pretraining cost.

Neural Machine Translation without Embeddings
Uri Shaham | Omer Levy
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Many NLP models operate over sequences of subword tokens produced by hand-crafted tokenization rules and heuristic subword induction algorithms. A simple universal alternative is to represent every computerized text as a sequence of bytes via UTF-8, obviating the need for an embedding layer since there are fewer token types (256) than dimensions. Surprisingly, replacing the ubiquitous embedding layer with one-hot representations of each byte does not hurt performance; experiments on byte-to-byte machine translation from English to 10 different languages show a consistent improvement in BLEU, rivaling character-level and even standard subword-level models. A deeper investigation reveals that the combination of embeddingless models with decoder-input dropout amounts to token dropout, which benefits byte-to-byte models in particular.
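
As a rough illustration of the input side (not the paper's code), the sketch below turns raw text into UTF-8 bytes and feeds one-hot byte vectors in place of a learned embedding lookup; because there are only 256 byte types and the model dimension is larger, no projection is needed.

```python
import torch

def bytes_to_one_hot(text, d_model=512):
    """Tokenize text as UTF-8 bytes and map each byte to a one-hot vector padded to d_model."""
    byte_ids = torch.tensor(list(text.encode("utf-8")))   # token ids in [0, 255], no vocabulary needed
    one_hot = torch.zeros(len(byte_ids), d_model)
    one_hot[torch.arange(len(byte_ids)), byte_ids] = 1.0  # the "embedding layer" is just an identity
    return one_hot

src = bytes_to_one_hot("¿Dónde está la estación?")  # accented characters simply become multi-byte sequences
print(src.shape)  # (number_of_bytes, 512), ready to feed into a standard transformer encoder
```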

Can Latent Alignments Improve Autoregressive Machine Translation?
Adi Haviv | Lior Vassertail | Omer Levy
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Latent alignment objectives such as CTC and AXE significantly improve non-autoregressive machine translation models. Can they improve autoregressive models as well? We explore the possibility of training autoregressive machine translation models with latent alignment objectives, and observe that, in practice, this approach results in degenerate models. We provide a theoretical explanation for these empirical results, and prove that latent alignment objectives are incompatible with teacher forcing.

Few-Shot Question Answering by Pretraining Span Selection
Ori Ram | Yuval Kirstain | Jonathan Berant | Amir Globerson | Omer Levy
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In several question answering benchmarks, pretrained models have reached human parity through fine-tuning on the order of 100,000 annotated questions and answers. We explore the more realistic few-shot setting, where only a few hundred training examples are available, and observe that standard models perform poorly, highlighting the discrepancy between current pretraining objectives and question answering. We propose a new pretraining scheme tailored for question answering: recurring span selection. Given a passage with multiple sets of recurring spans, we mask in each set all recurring spans but one, and ask the model to select the correct span in the passage for each masked span. Masked spans are replaced with a special token, viewed as a question representation, that is later used during fine-tuning to select the answer span. The resulting model obtains surprisingly good results on multiple benchmarks (e.g., 72.7 F1 on SQuAD with only 128 training examples), while maintaining competitive performance in the high-resource setting.
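
For intuition, here is a toy sketch of the pretraining signal (simplified; the helper and the `[QUESTION]` token name are illustrative, not the released implementation): find spans that recur in a passage, keep one occurrence as the answer, and collapse every other occurrence into a special token the model must resolve by pointing back at the kept span.

```python
from collections import defaultdict

MASK = "[QUESTION]"  # stand-in for the special question token (name is illustrative)

def recurring_span_masking(tokens, span_len=2):
    """Toy sketch: for each recurring span, keep one occurrence and mask the others."""
    starts = defaultdict(list)
    for i in range(len(tokens) - span_len + 1):
        starts[tuple(tokens[i:i + span_len])].append(i)

    to_mask = set()   # start positions whose span gets collapsed into a single MASK
    answers = {}      # masked start -> (start, end) of the occurrence kept as the answer
    for span, positions in starts.items():
        if len(positions) > 1:
            keep = positions[0]
            for p in positions[1:]:
                to_mask.add(p)
                answers[p] = (keep, keep + span_len)

    output, i = [], 0
    while i < len(tokens):
        if i in to_mask:
            output.append(MASK)
            i += span_len
        else:
            output.append(tokens[i])
            i += 1
    return output, answers

tokens = "the cat sat on the mat because the cat was tired".split()
masked, answers = recurring_span_masking(tokens)
print(" ".join(masked))   # the second "the cat" becomes a single [QUESTION] token
print(answers)            # the model should select the first "the cat" span as the answer
```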

Coreference Resolution without Span Representations
Yuval Kirstain | Ori Ram | Omer Levy
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

The introduction of pretrained language models has reduced many complex task-specific NLP models to simple lightweight layers. An exception to this trend is coreference resolution, where a sophisticated task-specific model is appended to a pretrained transformer encoder. While highly effective, the model has a very large memory footprint – primarily due to dynamically-constructed span and span-pair representations – which hinders the processing of complete documents and the ability to train on multiple instances in a single batch. We introduce a lightweight end-to-end coreference model that removes the dependency on span representations, handcrafted features, and heuristics. Our model performs competitively with the current standard model, while being simpler and more efficient.

ParaShoot: A Hebrew Question Answering Dataset
Omri Keren | Omer Levy
Proceedings of the 3rd Workshop on Machine Reading for Question Answering

NLP research in Hebrew has largely focused on morphology and syntax, where rich annotated datasets in the spirit of Universal Dependencies are available. Semantic datasets, however, are in short supply, hindering crucial advances in the development of NLP technology in Hebrew. In this work, we present ParaShoot, the first question answering dataset in modern Hebrew. The dataset follows the format and crowdsourcing methodology of SQuAD, and contains approximately 3000 annotated examples, similar to other question-answering datasets in low-resource languages. We provide the first baseline results using recently-released BERT-style models for Hebrew, showing that there is significant room for improvement on this task.

2020

SpanBERT: Improving Pre-training by Representing and Predicting Spans
Mandar Joshi | Danqi Chen | Yinhan Liu | Daniel S. Weld | Luke Zettlemoyer | Omer Levy
Transactions of the Association for Computational Linguistics, Volume 8

We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. SpanBERT consistently outperforms BERT and our better-tuned baselines, with substantial gains on span selection tasks such as question answering and coreference resolution. In particular, with the same training data and model size as BERT-large, our single model obtains 94.6% and 88.7% F1 on SQuAD 1.1 and 2.0 respectively. We also achieve a new state of the art on the OntoNotes coreference resolution task (79.6% F1), strong performance on the TACRED relation extraction benchmark, and even gains on GLUE.
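
A rough sketch of the first modification (span lengths are drawn uniformly here for simplicity; the paper's sampling scheme and rates are not reproduced exactly): contiguous spans, rather than individual tokens, are masked until a masking budget is spent.

```python
import random

MASK = "[MASK]"

def mask_contiguous_spans(tokens, mask_ratio=0.15, max_span_len=10, seed=0):
    """Toy sketch: mask random contiguous spans until roughly mask_ratio of the tokens are masked."""
    rng = random.Random(seed)
    tokens = list(tokens)
    budget = max(1, int(len(tokens) * mask_ratio))
    masked_positions = set()
    while len(masked_positions) < budget:
        span_len = rng.randint(1, max_span_len)    # uniform span lengths, for simplicity
        start = rng.randrange(len(tokens))
        for i in range(start, min(start + span_len, len(tokens))):
            tokens[i] = MASK
            masked_positions.add(i)
    return tokens, sorted(masked_positions)

text = "the span boundary representations are trained to predict the entire content of the masked span".split()
masked, positions = mask_contiguous_spans(text)
print(" ".join(masked))
print(positions)   # the tokens just outside these positions serve as the span boundary representations
```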

Improving Transformer Models by Reordering their Sublayers
Ofir Press | Noah A. Smith | Omer Levy
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Multilayer transformer networks consist of interleaved self-attention and feedforward sublayers. Could ordering the sublayers in a different pattern lead to better performance? We generate randomly ordered transformers and train them with the language modeling objective. We observe that some of these models are able to achieve better performance than the interleaved baseline, and that those successful variants tend to have more self-attention at the bottom and more feedforward sublayers at the top. We propose a new transformer pattern that adheres to this property, the sandwich transformer, and show that it improves perplexity on multiple word-level and character-level language modeling benchmarks, at no cost in parameters, memory, or training time. However, the sandwich reordering pattern does not guarantee performance gains across every task, as we demonstrate on machine translation models. Instead, we suggest that further exploration of task-specific sublayer reorderings is needed in order to unlock additional gains.
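
To illustrate the pattern (a schematic, with 's' for a self-attention sublayer and 'f' for a feed-forward sublayer; the specific coefficient below is illustrative), a sandwich ordering simply moves some self-attention sublayers to the bottom and the same number of feed-forward sublayers to the top, keeping the total sublayer count fixed.

```python
def sandwich_ordering(n_pairs=16, k=6):
    """Schematic sublayer ordering: 's' = self-attention, 'f' = feed-forward.
    A sandwich of coefficient k puts k extra self-attention sublayers at the bottom and
    k extra feed-forward sublayers at the top, keeping the total at 2 * n_pairs sublayers."""
    return "s" * k + "sf" * (n_pairs - k) + "f" * k

print(sandwich_ordering(k=0))   # the standard interleaved baseline: sfsf...sf
print(sandwich_ordering(k=6))   # a sandwich variant: more attention at the bottom, more feed-forward at the top
```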

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Mike Lewis | Yinhan Liu | Naman Goyal | Marjan Ghazvininejad | Abdelrahman Mohamed | Omer Levy | Veselin Stoyanov | Luke Zettlemoyer
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and other recent pretraining schemes. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token. BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 3.5 ROUGE. BART also provides a 1.1 BLEU increase over a back-translation system for machine translation, with only target language pretraining. We also replicate other pretraining schemes within the BART framework, to understand their effect on end-task performance.

Blockwise Self-Attention for Long Document Understanding
Jiezhong Qiu | Hao Ma | Omer Levy | Wen-tau Yih | Sinong Wang | Jie Tang
Findings of the Association for Computational Linguistics: EMNLP 2020

We present BlockBERT, a lightweight and efficient BERT model for better modeling long-distance dependencies. Our model extends BERT by introducing sparse block structures into the attention matrix to reduce both memory consumption and training/inference time, which also enables attention heads to capture either short- or long-range contextual information. We conduct experiments on language model pre-training and several benchmark question answering datasets with various paragraph lengths. BlockBERT uses 18.7-36.1% less memory and 12.0-25.1% less time to learn the model. During testing, BlockBERT saves 27.8% inference time, while having comparable and sometimes better prediction accuracy, compared to an advanced BERT-based model, RoBERTa.
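
A minimal sketch of the blockwise idea (illustrative, not the released code): the sequence is split into blocks and each attention head is restricted to a block-to-block pattern, so the n×n attention matrix becomes block-sparse.

```python
import torch

def blockwise_attention_mask(seq_len, num_blocks, permutation):
    """Build a block-sparse attention mask in which query block i may attend only to key block
    permutation[i]. (Illustrative; how block patterns are assigned per head is a design choice.)"""
    assert seq_len % num_blocks == 0
    block = seq_len // num_blocks
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    for i, j in enumerate(permutation):
        mask[i * block:(i + 1) * block, j * block:(j + 1) * block] = True
    return mask

# One head keeps local (diagonal) blocks for short-range context; another shifts blocks for longer range.
local_head = blockwise_attention_mask(seq_len=12, num_blocks=3, permutation=[0, 1, 2])
shifted_head = blockwise_attention_mask(seq_len=12, num_blocks=3, permutation=[1, 2, 0])
print(local_head.int())
print(f"fraction of attention entries kept: {local_head.float().mean():.2f}")  # 1 / num_blocks
```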

2019

BERT for Coreference Resolution: Baselines and Analysis
Mandar Joshi | Omer Levy | Luke Zettlemoyer | Daniel Weld
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We apply BERT to coreference resolution, achieving a new state of the art on the GAP (+11.5 F1) and OntoNotes (+3.9 F1) benchmarks. A qualitative analysis of model predictions indicates that, compared to ELMo and BERT-base, BERT-large is particularly better at distinguishing between related but distinct entities (e.g., President and CEO), but that there is still room for improvement in modeling document-level context, conversations, and mention paraphrasing. We will release all code and trained models upon publication.

Mask-Predict: Parallel Decoding of Conditional Masked Language Models
Marjan Ghazvininejad | Omer Levy | Yinhan Liu | Luke Zettlemoyer
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. This approach allows for efficient iterative decoding, where we first predict all of the target words non-autoregressively, and then repeatedly mask out and regenerate the subset of words that the model is least confident about. By applying this strategy for a constant number of iterations, our model improves state-of-the-art performance levels for non-autoregressive and parallel decoding translation models by over 4 BLEU on average. It is also able to reach within about 1 BLEU point of a typical left-to-right transformer model, while decoding significantly faster.
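
The decoding loop is short enough to sketch directly (the `model` interface and the toy stand-in below are hypothetical; the real system also predicts the target length): predict every position in parallel, then repeatedly re-mask and re-predict the tokens the model is least confident about, with the number of re-masked tokens decaying linearly over iterations.

```python
import random

MASK = "<mask>"

def mask_predict(model, src_tokens, target_len, iterations=10):
    """Schematic mask-predict decoding. `model(src, tgt)` is a hypothetical interface that
    returns, for every target position, the most likely token and its probability."""
    tokens = [MASK] * target_len
    probs = [0.0] * target_len
    for t in range(iterations):
        # the number of re-masked tokens decays linearly over the iterations
        n_mask = int(target_len * (iterations - t) / iterations)
        for i in sorted(range(target_len), key=lambda i: probs[i])[:n_mask]:
            tokens[i] = MASK                      # re-mask the least confident positions
        new_tokens, new_probs = model(src_tokens, tokens)
        for i in range(target_len):
            if tokens[i] == MASK:                 # only masked positions are re-predicted
                tokens[i], probs[i] = new_tokens[i], new_probs[i]
    return tokens

# Toy stand-in so the sketch runs: it "translates" by echoing source tokens with random confidence.
def toy_model(src, tgt):
    return list(src), [random.random() for _ in tgt]

print(mask_predict(toy_model, "vier kurze tokens hier".split(), target_len=4, iterations=3))
```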

Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation
Vladimir Karpukhin | Omer Levy | Jacob Eisenstein | Marjan Ghazvininejad
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

Contemporary machine translation systems achieve greater coverage by applying subword models such as BPE and character-level CNNs, but these methods are highly sensitive to orthographical variations such as spelling mistakes. We show how training on a mild amount of random synthetic noise can dramatically improve robustness to these variations, without diminishing performance on clean text. We focus on translation performance on natural typos, and show that robustness to such noise can be achieved using a balanced diet of simple synthetic noises at training time, without access to the natural noise data or distribution.
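
For illustration only (the specific noise operations and rates here are assumptions, not the paper's exact recipe), a synthetic-noise function of the kind described, applied to source sentences at training time:

```python
import random

def add_synthetic_noise(sentence, p=0.1, seed=None):
    """Randomly delete, swap, or substitute a character in a word with probability p (toy version)."""
    rng = random.Random(seed)
    noisy_words = []
    for word in sentence.split():
        if len(word) > 3 and rng.random() < p:
            i = rng.randrange(len(word) - 1)
            op = rng.choice(["delete", "swap", "substitute"])
            if op == "delete":
                word = word[:i] + word[i + 1:]
            elif op == "swap":
                word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
            else:
                word = word[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + word[i + 1:]
        noisy_words.append(word)
    return " ".join(noisy_words)

print(add_synthetic_noise("training on synthetic noise improves robustness", p=0.5, seed=1))
```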

What Does BERT Look at? An Analysis of BERT’s Attention
Kevin Clark | Urvashi Khandelwal | Omer Levy | Christopher D. Manning
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Large pre-trained neural networks such as BERT have had great recent success in NLP, motivating a growing body of research investigating what aspects of language they are able to learn from unlabeled data. Most recent analysis has focused on model outputs (e.g., language model surprisal) or internal vector representations (e.g., probing classifiers). Complementary to these works, we propose methods for analyzing the attention mechanisms of pre-trained models and apply them to BERT. BERT’s attention heads exhibit patterns such as attending to delimiter tokens, specific positional offsets, or broadly attending over the whole sentence, with heads in the same layer often exhibiting similar behaviors. We further show that certain attention heads correspond well to linguistic notions of syntax and coreference. For example, we find heads that attend to the direct objects of verbs, determiners of nouns, objects of prepositions, and coreferent mentions with remarkably high accuracy. Lastly, we propose an attention-based probing classifier and use it to further demonstrate that substantial syntactic information is captured in BERT’s attention.

pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference
Mandar Joshi | Eunsol Choi | Omer Levy | Daniel Weld | Luke Zettlemoyer
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Reasoning about implied relationships (e.g. paraphrastic, common sense, encyclopedic) between pairs of words is crucial for many cross-sentence inference problems. This paper proposes new methods for learning and using embeddings of word pairs that implicitly represent background knowledge about such relationships. Our pairwise embeddings are computed as a compositional function of each word’s representation, which is learned by maximizing the pointwise mutual information (PMI) with the contexts in which the two words co-occur. We add these representations to the cross-sentence attention layer of existing inference models (e.g. BiDAF for QA, ESIM for NLI), instead of extending or replacing existing word embeddings. Experiments show a gain of 2.7% on the recently released SQuAD 2.0 and 1.3% on MultiNLI. Our representations also aid in better generalization with gains of around 6-7% on adversarial SQuAD datasets, and 8.8% on the adversarial entailment test set by Glockner et al. (2018).

2018

Annotation Artifacts in Natural Language Inference Data
Suchin Gururangan | Swabha Swayamdipta | Omer Levy | Roy Schwartz | Samuel Bowman | Noah A. Smith
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Large-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them to generate three new sentences (hypotheses) that it entails, contradicts, or is logically neutral with respect to. We show that, in a significant portion of such data, this protocol leaves clues that make it possible to identify the label by looking only at the hypothesis, without observing the premise. Specifically, we show that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI (Bowman et al., 2015) and 53% of MultiNLI (Williams et al., 2017). Our analysis reveals that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes. Our findings suggest that the success of natural language inference models to date has been overestimated, and that the task remains a hard open problem.
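
The diagnostic is straightforward to reproduce in spirit: train any off-the-shelf text classifier on hypotheses alone and compare it to the majority-class baseline. A minimal scikit-learn sketch with toy data standing in for SNLI:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for (hypothesis, label) pairs; in the paper this is SNLI/MultiNLI.
# The key point is that the premise is never shown to the classifier.
hypotheses = [
    "the man is sleeping",          # negation/absence wording often signals contradiction
    "a person is outdoors",         # vague generalizations often signal entailment
    "the woman is not eating",
    "someone is doing something",
]
labels = ["contradiction", "entailment", "contradiction", "entailment"]

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(hypotheses, labels)
print(clf.predict(["the dog is not barking"]))   # hypothesis-only prediction, no premise used
```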

Proceedings of the Workshop on Generalization in the Age of Deep Learning
Yonatan Bisk | Omer Levy | Mark Yatskar
Proceedings of the Workshop on Generalization in the Age of Deep Learning

LSTMs Exploit Linguistic Attributes of Data
Nelson F. Liu | Omer Levy | Roy Schwartz | Chenhao Tan | Noah A. Smith
Proceedings of The Third Workshop on Representation Learning for NLP

While recurrent neural networks have found success in a variety of natural language processing applications, they are general models of sequential data. We investigate how the properties of natural language data affect an LSTM’s ability to learn a nonlinguistic task: recalling elements from its input. We find that models trained on natural language data are able to recall tokens from much longer sequences than models trained on non-language sequential data. Furthermore, we show that the LSTM learns to solve the memorization task by explicitly using a subset of its neurons to count timesteps in the input. We hypothesize that the patterns and structure in natural language data enable LSTMs to learn by providing approximate ways of reducing loss, but understanding the effect of different training data on the learnability of LSTMs remains an open question.

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang | Amanpreet Singh | Julian Michael | Felix Hill | Omer Levy | Samuel Bowman
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Human ability to understand language is general, flexible, and robust. In contrast, most NLU models above the word level are designed for a specific task and struggle with out-of-domain data. If we aspire to develop models with understanding beyond the detection of superficial correspondences between inputs and outputs, then it is critical to develop a unified model that can execute a range of linguistic tasks across different domains. To facilitate research in this direction, we present the General Language Understanding Evaluation (GLUE, gluebenchmark.com): a benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models. For some benchmark tasks, training data is plentiful, but for others it is limited or does not match the genre of the test set. GLUE thus favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks. While none of the datasets in GLUE were created from scratch for the benchmark, four of them feature privately-held test data, which is used to ensure that the benchmark is used fairly. We evaluate baselines that use ELMo (Peters et al., 2018), a powerful transfer learning technique, as well as state-of-the-art sentence representation models. The best models still achieve fairly low absolute scores. Analysis with our diagnostic dataset yields similarly weak performance over all phenomena tested, with some exceptions.

Ultra-Fine Entity Typing
Eunsol Choi | Omer Levy | Yejin Choi | Luke Zettlemoyer
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We introduce a new entity typing task: given a sentence with an entity mention, the goal is to predict a set of free-form phrases (e.g. skyscraper, songwriter, or criminal) that describe appropriate types for the target entity. This formulation allows us to use a new type of distant supervision at large scale: head words, which indicate the type of the noun phrases they appear in. We show that these ultra-fine types can be crowd-sourced, and introduce new evaluation sets that are much more diverse and fine-grained than existing benchmarks. We present a model that can predict ultra-fine types, and is trained using a multitask objective that pools our new head-word supervision with prior supervision from entity linking. Experimental results demonstrate that our model is effective in predicting entity types at varying granularity; it achieves state of the art performance on an existing fine-grained entity typing benchmark, and sets baselines for our newly-introduced datasets.

Deep RNNs Encode Soft Hierarchical Syntax
Terra Blevins | Omer Levy | Luke Zettlemoyer
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We present a set of experiments to demonstrate that deep recurrent neural networks (RNNs) learn internal representations that capture soft hierarchical notions of syntax from highly varied supervision. We consider four syntax tasks at different depths of the parse tree; for each word, we predict its part of speech as well as the first (parent), second (grandparent) and third level (great-grandparent) constituent labels that appear above it. These predictions are made from representations produced at different depths in networks that are pretrained with one of four objectives: dependency parsing, semantic role labeling, machine translation, or language modeling. In every case, we find a correspondence between network depth and syntactic depth, suggesting that a soft syntactic hierarchy emerges. This effect is robust across all conditions, indicating that the models encode significant amounts of syntax even in the absence of explicit syntactic supervision.

Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling
Luheng He | Kenton Lee | Omer Levy | Luke Zettlemoyer
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Recent BIO-tagging-based neural semantic role labeling models are very high performing, but assume gold predicates as part of the input and cannot incorporate span-level features. We propose an end-to-end approach for jointly predicting all predicates, argument spans, and the relations between them. The model makes independent decisions about what relationship, if any, holds between every possible word-span pair, and learns contextualized span representations that provide rich, shared input features for each decision. Experiments demonstrate that this approach sets a new state of the art on PropBank SRL without gold predicates.

Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum
Omer Levy | Kenton Lee | Nicholas FitzGerald | Luke Zettlemoyer
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

LSTMs were introduced to combat vanishing gradients in simple RNNs by augmenting them with gated additive recurrent connections. We present an alternative view to explain the success of LSTMs: the gates themselves are versatile recurrent models that provide more representational power than previously appreciated. We do this by decoupling the LSTM’s gates from the embedded simple RNN, producing a new class of RNNs where the recurrence computes an element-wise weighted sum of context-independent functions of the input. Ablations on a range of problems demonstrate that the gating mechanism alone performs as well as an LSTM in most settings, strongly suggesting that the gates are doing much more in practice than just alleviating vanishing gradients.
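
The decoupled view can be written compactly (a sketch of the formulation the abstract describes, where $\tilde{x}_t$ is a context-independent function of the input and $i_t, f_t$ are the input and forget gates): unrolling the memory-cell recurrence exposes the cell state as an element-wise weighted sum of the transformed inputs, with weights given by products of gates.

```latex
% Memory cell with gates and a context-independent input function \tilde{x}_t:
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{x}_t
% Unrolling the recurrence (taking c_0 = 0) yields an element-wise weighted sum:
c_t = \sum_{j=1}^{t} \Big( i_j \odot \prod_{k=j+1}^{t} f_k \Big) \odot \tilde{x}_j
```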

2017

A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments
Omer Levy | Anders Søgaard | Yoav Goldberg
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

While cross-lingual word embeddings have been studied extensively in recent years, the qualitative differences between the different algorithms remain vague. We observe that whether or not an algorithm uses a particular feature set (sentence IDs) accounts for a significant performance gap among these algorithms. This feature set is also used by traditional alignment algorithms, such as IBM Model-1, which demonstrate similar performance to state-of-the-art embedding algorithms on a variety of benchmarks. Overall, we observe that different algorithmic approaches for utilizing the sentence ID feature space result in similar performance. This paper draws both empirical and theoretical parallels between the embedding and alignment literature, and suggests that incorporating additional sources of information, which go beyond the traditional signal of bilingual sentence-aligned corpora, may substantially improve cross-lingual word embeddings, and that future baselines should at least take such features into account.

Named Entity Disambiguation for Noisy Text
Yotam Eshel | Noam Cohen | Kira Radinsky | Shaul Markovitch | Ikuya Yamada | Omer Levy
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

We address the task of Named Entity Disambiguation (NED) for noisy text. We present WikilinksNED, a large-scale NED dataset of text fragments from the web, which is significantly noisier and more challenging than existing news-based datasets. To capture the limited and noisy local context surrounding each mention, we design a neural model and train it with a novel method for sampling informative negative examples. We also describe a new way of initializing word and entity embeddings that significantly improves performance. Our model significantly outperforms existing state-of-the-art methods on WikilinksNED while achieving comparable performance on a smaller newswire dataset.

Zero-Shot Relation Extraction via Reading Comprehension
Omer Levy | Minjoon Seo | Eunsol Choi | Luke Zettlemoyer
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

We show that relation extraction can be reduced to answering simple reading comprehension questions, by associating one or more natural-language questions with each relation slot. This reduction has several advantages: we can (1) learn relation-extraction models by extending recent neural reading-comprehension techniques, (2) build very large training sets for those models by combining relation-specific crowd-sourced questions with distant supervision, and even (3) do zero-shot learning by extracting new relation types that are only specified at test-time, for which we have no labeled training examples. Experiments on a Wikipedia slot-filling task demonstrate that the approach can generalize to new questions for known relation types with high accuracy, and that zero-shot generalization to unseen relation types is possible, at lower accuracy levels, setting the bar for future work on this task.

Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP
Samuel Bowman | Yoav Goldberg | Felix Hill | Angeliki Lazaridou | Omer Levy | Roi Reichart | Anders Søgaard
Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP

2016

Modeling Extractive Sentence Intersection via Subtree Entailment
Omer Levy | Ido Dagan | Gabriel Stanovsky | Judith Eckle-Kohler | Iryna Gurevych
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Sentence intersection captures the semantic overlap of two texts, generalizing over paradigms such as textual entailment and semantic text similarity. Despite its modeling power, it has received little attention because it is difficult for non-experts to annotate. We analyze 200 pairs of similar sentences and identify several underlying properties of sentence intersection. We leverage these insights to design an algorithm that decomposes the sentence intersection task into several simpler annotation tasks, facilitating the construction of a high quality dataset via crowdsourcing. We implement this approach and provide an annotated dataset of 1,764 sentence intersections.

Annotating Relation Inference in Context via Question Answering
Omer Levy | Ido Dagan
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

Learning to Exploit Structured Resources for Lexical Inference
Vered Shwartz | Omer Levy | Ido Dagan | Jacob Goldberger
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

Improving Distributional Similarity with Lessons Learned from Word Embeddings
Omer Levy | Yoav Goldberg | Ido Dagan
Transactions of the Association for Computational Linguistics, Volume 3

Recent trends suggest that neural-network-inspired word embedding models outperform traditional count-based distributional models on word similarity and analogy detection tasks. We reveal that much of the performance gains of word embeddings are due to certain system design choices and hyperparameter optimizations, rather than the embedding algorithms themselves. Furthermore, we show that these modifications can be transferred to traditional distributional models, yielding similar gains. In contrast to prior reports, we observe mostly local or insignificant performance differences between the methods, with no global advantage to any single approach over the others.

Do Supervised Distributional Methods Really Learn Lexical Inference Relations?
Omer Levy | Steffen Remus | Chris Biemann | Ido Dagan
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

A Simple Word Embedding Model for Lexical Substitution
Oren Melamud | Omer Levy | Ido Dagan
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing

2014

Dependency-Based Word Embeddings
Omer Levy | Yoav Goldberg
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

The Excitement Open Platform for Textual Inferences
Bernardo Magnini | Roberto Zanoli | Ido Dagan | Kathrin Eichler | Guenter Neumann | Tae-Gil Noh | Sebastian Pado | Asher Stern | Omer Levy
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Focused Entailment Graphs for Open IE Propositions
Omer Levy | Ido Dagan | Jacob Goldberger
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

Linguistic Regularities in Sparse and Explicit Word Representations
Omer Levy | Yoav Goldberg
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

Proposition Knowledge Graphs
Gabriel Stanovsky | Omer Levy | Ido Dagan
Proceedings of the First AHA!-Workshop on Information Discovery in Text

2013

UKP-BIU: Similarity and Entailment Metrics for Student Response Analysis
Omer Levy | Torsten Zesch | Ido Dagan | Iryna Gurevych
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

Recognizing Partial Textual Entailment
Omer Levy | Torsten Zesch | Ido Dagan | Iryna Gurevych
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)