ROUGE-SciQFS: A ROUGE-based Method to Automatically Create Datasets for Scientific Query-Focused Summarization

Juan Ramirez-Orta; Ana Maguitman; Axel J. Soto; Evangelos Milios

Abstract

So far, Scientific Query-Focused Summarization (Sci-QFS) has lagged behind other areas of Scientific Natural Language Processing because of a lack of data. In this work, we propose a methodology that takes advantage of existing collections of academic papers to automatically obtain large-scale datasets for this task. After applying it to the papers from our reading group, we introduce a novel Sci-QFS dataset of 8,695 examples, each consisting of a query, the sentences of a paper's full text, and a relevance label for each sentence. After testing several classical and state-of-the-art embedding models on this data, we found that Sci-QFS is far from solved, although it is relatively straightforward for humans. Surprisingly, we found that classical methods outperformed modern pre-trained Deep Language Models (sometimes by a large margin), which shows the need for large datasets to better fine-tune the latter. We share our experiments, data and models at https://github.com/jarobyte91/rouge_sciqfs.
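The abstract does not spell out how ROUGE is used to produce the relevance labels, so the following is only a rough illustration of the general idea, not the authors' procedure. It sketches one common way to turn ROUGE into sentence-level labels: the greedy ROUGE oracle from extractive summarization, which keeps adding the sentence that most improves ROUGE-1 F1 against a reference text. It assumes each query is paired with a hypothetical reference text, uses a simplified regex tokenizer instead of an official ROUGE implementation, and the names `SciQFSExample`, `greedy_oracle_labels` and `max_selected` are illustrative inventions, not taken from the paper or its repository.

```python
import re
from collections import Counter
from dataclasses import dataclass


def tokenize(text: str) -> list[str]:
    # Simplification: lowercase and keep alphanumeric runs (no stemming, no stopwords).
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate: list[str], reference: list[str]) -> float:
    """Unigram-overlap F1 (ROUGE-1 F-measure) between two token lists."""
    if not candidate or not reference:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


@dataclass
class SciQFSExample:
    # Hypothetical record layout mirroring the abstract:
    # a query, the paper's sentences, and one 0/1 relevance label per sentence.
    query: str
    sentences: list[str]
    labels: list[int]


def greedy_oracle_labels(sentences: list[str], reference: str,
                         max_selected: int = 3) -> list[int]:
    """Greedy ROUGE oracle (an assumed stand-in, not the paper's method):
    repeatedly add the sentence that most improves ROUGE-1 F1 of the
    selection against the reference; stop when no sentence improves it."""
    ref_tokens = tokenize(reference)
    sent_tokens = [tokenize(s) for s in sentences]
    selected: list[int] = []
    best_score = 0.0
    while len(selected) < max_selected:
        best_idx = None
        for i in range(len(sentences)):
            if i in selected:
                continue
            # Unigram counts do not depend on order, so concatenation is enough.
            candidate = [t for j in selected + [i] for t in sent_tokens[j]]
            score = rouge1_f1(candidate, ref_tokens)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is None:
            break
        selected.append(best_idx)
    return [1 if i in selected else 0 for i in range(len(sentences))]


# Usage with toy, made-up text (not data from the released dataset).
sentences = [
    "We propose a method to build datasets automatically.",
    "The weather was pleasant during the conference.",
    "Classical methods outperformed pre-trained language models.",
]
reference = "Classical methods outperformed pre-trained models on the new dataset."
labels = greedy_oracle_labels(sentences, reference)
example = SciQFSExample(query="How do classical methods compare?",
                        sentences=sentences, labels=labels)
print(example.labels)  # [0, 0, 1]
```

A greedy oracle is used here only because it is a well-known, easy-to-verify way of deriving extractive labels from ROUGE; a threshold on per-sentence ROUGE, or a different ROUGE variant, would be equally plausible readings of the title.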

Anthology ID: 2025.coling-main.149
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 2187–2197
URL: https://aclanthology.org/2025.coling-main.149/
Cite (ACL): Juan Ramirez-Orta, Ana Maguitman, Axel J. Soto, and Evangelos Milios. 2025. ROUGE-SciQFS: A ROUGE-based Method to Automatically Create Datasets for Scientific Query-Focused Summarization. In Proceedings of the 31st International Conference on Computational Linguistics, pages 2187–2197, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): ROUGE-SciQFS: A ROUGE-based Method to Automatically Create Datasets for Scientific Query-Focused Summarization (Ramirez-Orta et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.149.pdf