2024
Plausibly Problematic Questions in Multiple-Choice Benchmarks for Commonsense Reasoning
Shramay Palta | Nishant Balepur | Peter Rankel | Sarah Wiegreffe | Marine Carpuat | Rachel Rudinger
Findings of the Association for Computational Linguistics: EMNLP 2024
Questions involving commonsense reasoning about everyday situations often admit many possible or plausible answers. In contrast, multiple-choice question (MCQ) benchmarks for commonsense reasoning require a hard selection of a single correct answer, which, in principle, should represent the most plausible answer choice. On 250 MCQ items sampled from two commonsense reasoning benchmarks, we collect 5,000 independent plausibility judgments on answer choices. We find that for over 20% of the sampled MCQs, the answer choice rated most plausible does not match the benchmark gold answers; upon manual inspection, we confirm that this subset exhibits higher rates of problems like ambiguity or semantic mismatch between question and answer choices. Experiments with LLMs reveal low accuracy and high variation in performance on the subset, suggesting our plausibility criterion may be helpful in identifying more reliable benchmark items for commonsense evaluation.
2019
RANLP 2019 Multilingual Headline Generation Task Overview
Marina Litvak | John M. Conroy | Peter A. Rankel
Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources
The objective of the 2019 RANLP Multilingual Headline Generation (HG) Task is to explore some of the challenges highlighted by current state-of-the-art approaches to creating informative headlines for news articles: non-descriptive headlines, out-of-domain training data, generating headlines from long documents which are not well represented by the head heuristic, and dealing with a multilingual domain. This task makes available a large set of training data for headline generation and provides evaluation methods for the task. Our data sets are drawn from Wikinews as well as Wikipedia. Participants were required to generate headlines for at least 3 languages, which were evaluated via automatic methods. A key aspect of the task is multilinguality: it measures the performance of multilingual headline generation systems on Wikipedia and Wikinews articles in multiple languages. The objective is to assess the performance of automatic headline generation techniques on text documents covering a diverse range of languages and topics outside the news domain.
2017
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres
George Giannakopoulos | Elena Lloret | John M. Conroy | Josef Steinberger | Marina Litvak | Peter Rankel | Benoit Favre
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres
MultiLing 2017 Overview
George Giannakopoulos | John Conroy | Jeff Kubina | Peter A. Rankel | Elena Lloret | Josef Steinberger | Marina Litvak | Benoit Favre
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres
In this brief report we present an overview of the MultiLing 2017 effort and workshop, as implemented within EACL 2017. MultiLing is a community-driven initiative that pushes the state of the art in Automatic Summarization by providing data sets and fostering further research and development of summarization systems. This year the scope of the workshop was widened, bringing together researchers who work on summarization across sources, languages and genres. We summarize the main tasks planned and implemented this year and the contributions received, and provide insights on next steps.
2013
A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art
Peter A. Rankel | John M. Conroy | Hoa Trang Dang | Ani Nenkova
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
2012
Assessing the Effect of Inconsistent Assessors on Summarization Evaluation
Karolina Owczarzak | Peter A. Rankel | Hoa Trang Dang | John M. Conroy
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
2011
Ranking Human and Machine Summarization Systems
Peter Rankel | John Conroy | Eric Slud | Dianne O’Leary
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing