ROUGE-SciQFS: A ROUGE-based Method to Automatically Create Datasets for Scientific Query-Focused Summarization

Juan Ramirez-Orta, Ana Maguitman, Axel J. Soto, Evangelos Milios


Abstract
So far, the task of Scientific Query-Focused Summarization (Sci-QFS) has lagged behind other areas of Scientific Natural Language Processing because of the lack of data. In this work, we propose a methodology that takes advantage of existing collections of academic papers to automatically obtain large-scale datasets for this task. After applying it to the papers from our reading group, we introduce a novel dataset for Sci-QFS composed of 8,695 examples, each consisting of a query, the sentences of a paper's full text, and a relevance label for each sentence. After testing several classical and state-of-the-art embedding models on this data, we found that the task of Sci-QFS is far from being solved, although it is relatively straightforward for humans. Surprisingly, we found that classical methods outperformed modern pre-trained Deep Language Models (sometimes by a large margin), showing the need for large datasets to better fine-tune the latter. We share our experiments, data and models at https://github.com/jarobyte91/rouge_sciqfs.
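The abstract describes each example as a query, the full-text sentences of a paper, and per-sentence relevance labels, and reports that classical methods are strong baselines. The Python below is a minimal sketch of what such an example and a classical baseline could look like. It assumes a hypothetical `SciQFSExample` schema (not necessarily the released format) and uses TF-IDF cosine similarity as a stand-in classical ranker, scored with precision@k. The paper's actual baselines, metrics, and data format are not specified here.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class SciQFSExample:
    # Hypothetical schema: one query, the paper's sentences, and a 0/1
    # relevance label per sentence (the released data may differ).
    query: str
    sentences: list[str]
    labels: list[int]


def tokenize(text: str) -> list[str]:
    # Deliberately simple: lowercase and keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(query: str, sentences: list[str]) -> list[int]:
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))

    def idf(term: str) -> float:
        # Smoothed IDF over this paper's sentences; query terms that never
        # occur in the paper get the maximum IDF.
        return math.log((1 + n) / (1 + df.get(term, 0))) + 1.0

    def vec(tokens: list[str]) -> dict[str, float]:
        return {t: c * idf(t) for t, c in Counter(tokens).items()}

    q = vec(tokenize(query))
    scores = [cosine(q, vec(doc)) for doc in docs]
    # Highest score first; sorted() is stable, so ties keep sentence order.
    return sorted(range(n), key=lambda i: -scores[i])


def precision_at_k(ranking: list[int], labels: list[int], k: int) -> float:
    # Fraction of the top-k ranked sentences that are labeled relevant.
    return sum(labels[i] for i in ranking[:k]) / k


example = SciQFSExample(
    query="How are relevance labels created?",
    sentences=[
        "We thank the reviewers for their comments.",
        "Relevance labels are created automatically from existing papers.",
        "Results are reported in Table 2.",
    ],
    labels=[0, 1, 0],
)
ranking = rank_sentences(example.query, example.sentences)
print(ranking[0], precision_at_k(ranking, example.labels, k=1))  # → 1 1.0
```

A bag-of-words ranker like this needs no training data, which is one plausible reason classical methods can stay competitive when data for fine-tuning is scarce.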
Anthology ID:
2025.coling-main.149
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
2187–2197
URL:
https://aclanthology.org/2025.coling-main.149/
Cite (ACL):
Juan Ramirez-Orta, Ana Maguitman, Axel J. Soto, and Evangelos Milios. 2025. ROUGE-SciQFS: A ROUGE-based Method to Automatically Create Datasets for Scientific Query-Focused Summarization. In Proceedings of the 31st International Conference on Computational Linguistics, pages 2187–2197, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
ROUGE-SciQFS: A ROUGE-based Method to Automatically Create Datasets for Scientific Query-Focused Summarization (Ramirez-Orta et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.149.pdf