Ori Ernst


2022

Proposition-Level Clustering for Multi-Document Summarization
Ori Ernst | Avi Caciularu | Ori Shapira | Ramakanth Pasunuru | Mohit Bansal | Jacob Goldberger | Ido Dagan
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition. Particularly, clusters were leveraged to indicate information saliency as well as to avoid redundancy. Such prior methods focused on clustering sentences, even though closely related sentences usually also contain non-aligned parts. In this work, we revisit the clustering approach, grouping together sub-sentential propositions, aiming at more precise information alignment. Specifically, our method detects salient propositions, clusters them into paraphrastic clusters, and generates a representative sentence for each cluster via text fusion. Our summarization method improves over the previous state-of-the-art MDS method in the DUC 2004 and TAC 2011 datasets, both in automatic ROUGE scores and human preference.

Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations
Daniela Brook Weiss | Paul Roit | Ori Ernst | Ido Dagan
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

NLP models that process multiple texts often struggle in recognizing corresponding and salient information that is often differently phrased, and consolidating the redundancies across texts. To facilitate research of such challenges, the sentence fusion task was proposed, yet previous datasets for this task were very limited in their size and scope. In this paper, we revisit and substantially extend previous dataset creation efforts. With careful modifications, relabeling, and employing complementing data sources, we were able to more than triple the size of a notable earlier dataset. Moreover, we show that our extended version uses more representative texts for multi-document tasks and provides a more diverse training set, which substantially improves model performance.

2021

QA-Align: Representing Cross-Text Content Overlap by Aligning Question-Answer Propositions
Daniela Brook Weiss | Paul Roit | Ayal Klein | Ori Ernst | Ido Dagan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Multi-text applications, such as multi-document summarization, are typically required to model redundancies across related texts. Current methods confronting consolidation struggle to fuse overlapping information. In order to explicitly represent content overlap, we propose to align predicate-argument relations across texts, providing a potential scaffold for information consolidation. We go beyond clustering coreferring mentions, and instead model overlap with respect to redundancy at a propositional level, rather than merely detecting shared referents. Our setting exploits QA-SRL, utilizing question-answer pairs to capture predicate-argument relations, facilitating laymen annotation of cross-text alignments. We employ crowd-workers for constructing a dataset of QA-based alignments, and present a baseline QA alignment model trained over our dataset. Analyses show that our new task is semantically challenging, capturing content overlap beyond lexical similarity and complementing cross-document coreference with proposition-level links, offering potential use for downstream tasks.

iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration
Eran Hirsch | Alon Eirew | Ori Shapira | Avi Caciularu | Arie Cattan | Ori Ernst | Ramakanth Pasunuru | Hadar Ronen | Mohit Bansal | Ido Dagan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We introduce iFᴀᴄᴇᴛSᴜᴍ, a web application for exploring topical document collections. iFᴀᴄᴇᴛSᴜᴍ integrates interactive summarization together with faceted search, by providing a novel faceted navigation scheme that yields abstractive summaries for the user’s selections. This approach offers both a comprehensive overview as well as particular details regarding subtopics of choice. The facets are automatically produced based on cross-document coreference pipelines, rendering generic concepts, entities and statements surfacing in the source texts. We analyze the effectiveness of our application through small-scale user studies that suggest the usefulness of our tool.

Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline
Ori Ernst | Ori Shapira | Ramakanth Pasunuru | Michael Lepioshkin | Jacob Goldberger | Mohit Bansal | Ido Dagan
Proceedings of the 25th Conference on Computational Natural Language Learning

Aligning sentences in a reference summary with their counterparts in source documents has been shown to be a useful auxiliary summarization task, notably for generating training data for salience detection. Despite its assessed utility, the alignment step was mostly approached with heuristic unsupervised methods, typically ROUGE-based, and was never independently optimized or evaluated. In this paper, we propose establishing summary-source alignment as an explicit task, while introducing two major novelties: (1) applying it at the more accurate proposition span level, and (2) approaching it as a supervised classification task. To that end, we created a novel training dataset for proposition-level alignment, derived automatically from available summarization evaluation data. In addition, we crowdsourced dev and test datasets, enabling model development and proper evaluation. Utilizing these data, we present a supervised proposition alignment baseline model, showing improved alignment quality over the unsupervised approach.