Idan Szpektor


2022

pdf bib
TRUE: Re-evaluating Factual Consistency Evaluation
Or Honovich | Roee Aharoni | Jonathan Herzig | Hagai Taitelbaum | Doron Kukliansy | Vered Cohen | Thomas Scialom | Idan Szpektor | Avinatan Hassidim | Yossi Matias
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering

Grounded text generation systems often generate text that contains factual inconsistencies, hindering their real-world applicability. Automatic factual consistency evaluation may help alleviate this limitation by accelerating evaluation cycles, filtering inconsistent outputs and augmenting training data. While attracting increasing attention, such evaluation metrics are usually developed and evaluated in silo for a single task or dataset, slowing their adoption. Moreover, previous meta-evaluation protocols focused on system-level correlations with human annotations, which leave the example-level accuracy of such metrics unclear.In this work, we introduce TRUE: a comprehensive study of factual consistency metrics on a standardized collection of existing texts from diverse tasks, manually annotated for factual consistency. Our standardization enables an example-level meta-evaluation protocol that is more actionable and interpretable than previously reported correlations, yielding clearer quality measures. Across diverse state-of-the-art metrics and 11 datasets we find that large-scale NLI and question generation-and-answering-based approaches achieve strong and complementary results. We recommend those methods as a starting point for model and metric developers, and hope TRUE will foster progress towards even better methods.

2021

pdf bib
Q2: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering
Or Honovich | Leshem Choshen | Roee Aharoni | Ella Neeman | Idan Szpektor | Omri Abend
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Neural knowledge-grounded generative models for dialogue often produce content that is factually inconsistent with the knowledge they rely on, making them unreliable and limiting their applicability. Inspired by recent work on evaluating factual consistency in abstractive summarization, we propose an automatic evaluation metric for factual consistency in knowledge-grounded dialogue using automatic question generation and question answering. Our metric, denoted Q2, compares answer spans using natural language inference (NLI), instead of token-based matching as done in previous work. To foster proper evaluation, we curate a novel dataset of dialogue system outputs for the Wizard-of-Wikipedia dataset, manually annotated for factual consistency. We perform a thorough meta-evaluation of Q2 against other metrics using this dataset and two others, where it consistently shows higher correlation with human judgements.

2020

pdf bib
Semantically Driven Sentence Fusion: Modeling and Evaluation
Eyal Ben-David | Orgad Keller | Eric Malmi | Idan Szpektor | Roi Reichart
Findings of the Association for Computational Linguistics: EMNLP 2020

Sentence fusion is the task of joining related sentences into coherent text. Current training and evaluation schemes for this task are based on single reference ground-truths and do not account for valid fusion variants. We show that this hinders models from robustly capturing the semantic relationship between input sentences. To alleviate this, we present an approach in which ground-truth solutions are automatically expanded into multiple references via curated equivalence classes of connective phrases. We apply this method to a large-scale dataset and use the augmented dataset for both model training and evaluation. To improve the learning of semantic representation using multiple references, we enrich the model with auxiliary discourse classification tasks under a multi-tasking framework. Our experiments highlight the improvements of our approach over state-of-the-art models.

2019

pdf bib
DiscoFuse: A Large-Scale Dataset for Discourse-Based Sentence Fusion
Mor Geva | Eric Malmi | Idan Szpektor | Jonathan Berant
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Sentence fusion is the task of joining several independent sentences into a single coherent text. Current datasets for sentence fusion are small and insufficient for training modern neural models. In this paper, we propose a method for automatically-generating fusion examples from raw text and present DiscoFuse, a large scale dataset for discourse-based sentence fusion. We author a set of rules for identifying a diverse set of discourse phenomena in raw text, and decomposing the text into two independent sentences. We apply our approach on two document collections: Wikipedia and Sports articles, yielding 60 million fusion examples annotated with discourse information required to reconstruct the fused text. We develop a sequence-to-sequence model on DiscoFuse and thoroughly analyze its strengths and weaknesses with respect to the various discourse phenomena, using both automatic as well as human evaluation. Finally, we conduct transfer learning experiments with WebSplit, a recent dataset for text simplification. We show that pretraining on DiscoFuse substantially improves performance on WebSplit when viewed as a sentence fusion task.

pdf bib
Audio De-identification - a New Entity Recognition Task
Ido Cohn | Itay Laish | Genady Beryozkin | Gang Li | Izhak Shafran | Idan Szpektor | Tzvika Hartman | Avinatan Hassidim | Yossi Matias
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)

Named Entity Recognition (NER) has been mostly studied in the context of written text. Specifically, NER is an important step in de-identification (de-ID) of medical records, many of which are recorded conversations between a patient and a doctor. In such recordings, audio spans with personal information should be redacted, similar to the redaction of sensitive character spans in de-ID for written text. The application of NER in the context of audio de-identification has yet to be fully investigated. To this end, we define the task of audio de-ID, in which audio spans with entity mentions should be detected. We then present our pipeline for this task, which involves Automatic Speech Recognition (ASR), NER on the transcript text, and text-to-audio alignment. Finally, we introduce a novel metric for audio de-ID and a new evaluation benchmark consisting of a large labeled segment of the Switchboard and Fisher audio datasets and detail our pipeline’s results on it.

pdf bib
A Joint Named-Entity Recognizer for Heterogeneous Tag-sets Using a Tag Hierarchy
Genady Beryozkin | Yoel Drori | Oren Gilon | Tzvika Hartman | Idan Szpektor
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We study a variant of domain adaptation for named-entity recognition where multiple, heterogeneously tagged training sets are available. Furthermore, the test tag-set is not identical to any individual training tag-set. Yet, the relations between all tags are provided in a tag hierarchy, covering the test tags as a combination of training tags. This setting occurs when various datasets are created using different annotation schemes. This is also the case of extending a tag-set with a new tag by annotating only the new tag in a new dataset. We propose to use the given tag hierarchy to jointly learn a neural network that shares its tagging layer among all tag-sets. We compare this model to combining independent models and to a model based on the multitasking approach. Our experiments show the benefit of the tag-hierarchy model, especially when facing non-trivial consolidation of tag-sets.

2016

pdf bib
Syntactic Parsing of Web Queries with Question Intent
Yuval Pinter | Roi Reichart | Idan Szpektor
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

pdf bib
Probabilistic Modeling of Joint-context in Distributional Similarity
Oren Melamud | Ido Dagan | Jacob Goldberger | Idan Szpektor | Deniz Yuret
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

2013

pdf bib
Generating Synthetic Comparable Questions for News Articles
Oleg Rokhlenko | Idan Szpektor
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
A Two Level Model for Context Sensitive Inference Rules
Oren Melamud | Jonathan Berant | Ido Dagan | Jacob Goldberger | Idan Szpektor
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Using Lexical Expansion to Learn Inference Rules from Sparse Data
Oren Melamud | Ido Dagan | Jacob Goldberger | Idan Szpektor
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf bib
Learning Verb Inference Rules from Linguistically-Motivated Evidence
Hila Weisman | Jonathan Berant | Idan Szpektor | Ido Dagan
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
Classification-based Contextual Preferences
Shachar Mirkin | Ido Dagan | Lili Kotlerman | Idan Szpektor
Proceedings of the TextInfer 2011 Workshop on Textual Entailment

2010

pdf bib
Textual Entailment
Mark Sammons | Idan Szpektor | V.G.Vinod Vydiswaran
NAACL HLT 2010 Tutorial Abstracts

pdf bib
Generating Entailment Rules from FrameNet
Roni Ben Aharon | Idan Szpektor | Ido Dagan
Proceedings of the ACL 2010 Conference Short Papers

2009

pdf bib
Source-Language Entailment Modeling for Translating Unknown Terms
Shachar Mirkin | Lucia Specia | Nicola Cancedda | Ido Dagan | Marc Dymetman | Idan Szpektor
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Directional Distributional Similarity for Lexical Expansion
Lili Kotlerman | Ido Dagan | Idan Szpektor | Maayan Zhitomirsky-Geffet
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

pdf bib
Augmenting WordNet-based Inference with Argument Mapping
Idan Szpektor | Ido Dagan
Proceedings of the 2009 Workshop on Applied Textual Inference (TextInfer)

2008

pdf bib
Contextual Preferences
Idan Szpektor | Ido Dagan | Roy Bar-Haim | Jacob Goldberger
Proceedings of ACL-08: HLT

pdf bib
Learning Entailment Rules for Unary Templates
Idan Szpektor | Ido Dagan
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

pdf bib
Instance-based Evaluation of Entailment Rule Acquisition
Idan Szpektor | Eyal Shnarch | Ido Dagan
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf bib
Cross Lingual and Semantic Retrieval for Cultural Heritage Appreciation
Idan Szpektor | Ido Dagan | Alon Lavie | Danny Shacham | Shuly Wintner
Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007).

pdf bib
Semantic Inference at the Lexical-Syntactic Level for Textual Entailment Recognition
Roy Bar-Haim | Ido Dagan | Iddo Greental | Idan Szpektor | Moshe Friedman
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

2006

pdf bib
Investigating a Generic Paraphrase-Based Approach for Relation Extraction
Lorenza Romano | Milen Kouylekov | Idan Szpektor | Ido Dagan | Alberto Lavelli
11th Conference of the European Chapter of the Association for Computational Linguistics

2005

pdf bib
Definition and Analysis of Intermediate Entailment Levels
Roy Bar-Haim | Idan Szpektor | Oren Glickman
Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment

2004

pdf bib
Scaling Web-based Acquisition of Entailment Relations
Idan Szpektor | Hristo Tanev | Ido Dagan | Bonaventura Coppola
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing