NUTSHELL: A Dataset for Abstract Generation from Scientific Talks

Maike Züfle; Sara Papi; Beatrice Savoldi; Marco Gaido; Luisa Bentivogli; Jan Niehues

doi:10.18653/v1/2025.iwslt-1.2

Abstract

Scientific communication is receiving increasing attention in natural language processing, especially to help researchers access, summarize, and generate content. One emerging application in this area is Speech-to-Abstract Generation (SAG), which aims to automatically generate abstracts from recorded scientific presentations. SAG enables researchers to efficiently engage with conference talks, but progress has been limited by a lack of large-scale datasets. To address this gap, we introduce NUTSHELL, a novel multimodal dataset of *ACL conference talks paired with their corresponding abstracts. We establish strong baselines for SAG and evaluate the quality of generated abstracts using both automatic metrics and human judgments. Our results highlight the challenges of SAG and demonstrate the benefits of training on NUTSHELL. By releasing NUTSHELL under an open license (CC-BY 4.0), we aim to advance research in SAG and foster the development of improved models and evaluation methods.

Anthology ID: 2025.iwslt-1.2
Volume: Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)
Month: July
Year: 2025
Address: Vienna, Austria (in-person and online)
Editors: Elizabeth Salesky, Marcello Federico, Antonis Anastasopoulos
Venues: IWSLT | WS
Publisher: Association for Computational Linguistics
Pages: 19–32
URL: https://aclanthology.org/2025.iwslt-1.2/
DOI: 10.18653/v1/2025.iwslt-1.2
Cite (ACL): Maike Züfle, Sara Papi, Beatrice Savoldi, Marco Gaido, Luisa Bentivogli, and Jan Niehues. 2025. NUTSHELL: A Dataset for Abstract Generation from Scientific Talks. In Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025), pages 19–32, Vienna, Austria (in-person and online). Association for Computational Linguistics.
Cite (Informal): NUTSHELL: A Dataset for Abstract Generation from Scientific Talks (Züfle et al., IWSLT 2025)
PDF: https://aclanthology.org/2025.iwslt-1.2.pdf