ACL-rlg: A Dataset for Reading List Generation

Julien Aubert-Béduchaud, Florian Boudin, Béatrice Daille, Richard Dufour


Abstract
Familiarizing oneself with a new scientific field and its existing literature can be daunting due to the large number of available articles. Curated lists of academic references, or reading lists, compiled by experts, offer a structured way to gain a comprehensive overview of a domain or a specific scientific challenge. In this work, we introduce ACL-rlg, the largest open expert-annotated reading list dataset. We also formally define reading list generation as a retrieval task and provide multiple baselines for evaluating it. Our qualitative study highlights that traditional scholarly search engines and indexing methods perform poorly on this task, and that GPT-4o, despite achieving better results, exhibits signs of potential data contamination.
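Framing reading list generation as retrieval means that a system returns a ranked list of papers for a given topic, and that ranked list can be scored against the expert-curated reading list. The sketch below is illustrative only. The metrics (precision@k and recall@k), the function name `precision_recall_at_k`, and the paper IDs are assumptions for this example, not the evaluation protocol or data format described in the paper.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> Tuple[float, float]:
    """Score a ranked list of paper IDs against a gold (expert) reading list.

    Hypothetical helper: precision@k and recall@k are assumed metrics here,
    not necessarily the ones used for ACL-rlg.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Drop repeated IDs while keeping the first occurrence, so a paper
    # retrieved twice is not counted as two hits.
    top_k = list(dict.fromkeys(ranked))[:k]
    hits = sum(1 for paper_id in top_k if paper_id in gold)
    precision = hits / k  # standard convention: divide by k even if fewer results were returned
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical system output and expert reading list.
    ranked = ["P3", "P7", "P1", "P9", "P2"]
    gold = {"P1", "P2", "P4"}
    print(precision_recall_at_k(ranked, gold, 3))
    print(precision_recall_at_k(ranked, gold, 5))
```

With these toy inputs, only `P1` appears in the top 3 (precision and recall of 1/3), while the top 5 also recovers `P2` (precision 0.4, recall 2/3). Recall is the more natural headline number when the goal is to cover an expert's list, but precision matters if users only read the first few suggestions.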
Anthology ID:
2025.coling-main.327
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
4910–4919
URL:
https://aclanthology.org/2025.coling-main.327/
Cite (ACL):
Julien Aubert-Béduchaud, Florian Boudin, Béatrice Daille, and Richard Dufour. 2025. ACL-rlg: A Dataset for Reading List Generation. In Proceedings of the 31st International Conference on Computational Linguistics, pages 4910–4919, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
ACL-rlg: A Dataset for Reading List Generation (Aubert-Béduchaud et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.327.pdf