David Gabay


2019

Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation
Ori Shapira | David Gabay | Yang Gao | Hadar Ronen | Ramakanth Pasunuru | Mohit Bansal | Yael Amsterdamer | Ido Dagan
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Conducting a manual evaluation is considered an essential part of summary evaluation methodology. Traditionally, the Pyramid protocol, which exhaustively compares system summaries to references, has been perceived as very reliable, providing objective scores. Yet, due to the high cost of the Pyramid method and the required expertise, researchers resorted to cheaper and less thorough manual evaluation methods, such as Responsiveness and pairwise comparison, attainable via crowdsourcing. We revisit the Pyramid approach, proposing a lightweight sampling-based version that is crowdsourcable. We analyze the performance of our method in comparison to original expert-based Pyramid evaluations, showing higher correlation relative to the common Responsiveness method. We release our crowdsourced Summary-Content-Units, along with all crowdsourcing scripts, for future evaluations.

2018

Evaluating Multiple System Summary Lengths: A Case Study
Ori Shapira | David Gabay | Hadar Ronen | Judit Bar-Ilan | Yael Amsterdamer | Ani Nenkova | Ido Dagan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Practical summarization systems are expected to produce summaries of varying lengths, per user needs. While a couple of early summarization benchmarks tested systems across multiple summary lengths, this practice was mostly abandoned due to the assumed cost of producing reference summaries of multiple lengths. In this paper, we raise the research question of whether reference summaries of a single length can be used to reliably evaluate system summaries of multiple lengths. For that, we have analyzed a couple of datasets as a case study, using several variants of the ROUGE metric that are standard in summarization evaluation. Our findings indicate that the evaluation protocol in question is indeed competitive. This result paves the way to practically evaluating varying-length summaries with simple, possibly existing, summarization benchmarks.

2009

Gaiku: Generating Haiku with Word Associations Norms
Yael Netzer | David Gabay | Yoav Goldberg | Michael Elhadad
Proceedings of the Workshop on Computational Approaches to Linguistic Creativity

2008

Unsupervised Lexicon-Based Resolution of Unknown Words for Full Morphological Analysis
Meni Adler | Yoav Goldberg | David Gabay | Michael Elhadad
Proceedings of ACL-08: HLT

Tagging a Hebrew Corpus: the Case of Participles
Meni Adler | Yael Netzer | Yoav Goldberg | David Gabay | Michael Elhadad
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We report on an effort to build a corpus of Modern Hebrew tagged with part-of-speech and morphology. We designed a tagset specific to Hebrew while focusing on four aspects: the tagset should be consistent with common linguistic knowledge; there should be maximal agreement among taggers as to the tags assigned to maintain consistency; the tagset should be useful for machine taggers and learning algorithms; and the tagset should be effective for applications relying on the tags as input features. In this paper, we illustrate these issues by explaining our decision to introduce a tag for beinoni forms in Hebrew. We explain how this tag is defined, and how it helped us improve manual tagging accuracy to a high level, while improving automatic tagging and helping in the task of syntactic chunking.

2007

Can You Tag the Modal? You Should.
Yael Netzer | Meni Adler | David Gabay | Michael Elhadad
Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources