Sina Taslimi


2025

Extracting, Detecting, and Generating Research Questions for Scientific Articles
Sina Taslimi | Artemis Capari | Hosein Azarbonyad | Zi Long Zhu | Zubair Afzal | Evangelos Kanoulas | George Tsatsaronis
Proceedings of the 31st International Conference on Computational Linguistics

The volume of academic articles is growing rapidly, reflecting the increasing emphasis on research and scholarship across scientific disciplines. This growth calls for tools that help readers understand these articles more quickly and efficiently. Clear, well-defined Research Questions (RQs) can help guide scholarly inquiry; however, many academic articles do not properly define their RQs. This research addresses this gap by presenting a comprehensive framework for the systematic extraction, detection, and generation of RQs from scientific articles. The extraction component uses a set of regular expressions to identify articles containing well-defined RQs. The detection component identifies more complex RQs that the rule-based extraction method does not capture. The generation component creates RQs for articles that lack them. We integrate these components into a pipeline that extracts RQs from an article, or generates them, based on its full text. We evaluate the pipeline on a set of metrics designed to assess the quality of RQs. Our results indicate that the proposed pipeline can reliably detect RQs and generate high-quality ones.
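The abstract does not list the regular expressions used by the extraction component, so the following is only a minimal sketch of what rule-based RQ extraction could look like. It assumes RQs are explicitly labeled in the text, e.g. "RQ1: ...?" or "Research Question 2: ...?". The patterns and the function names `extract_rqs` and `has_well_defined_rqs` are hypothetical illustrations, not the authors' implementation.

```python
import re

# Hypothetical patterns for explicitly labeled RQs (assumption, not the paper's rules):
#   "RQ1: How does X affect Y?"  or  "Research Question 2: What ...?"
RQ_PATTERNS = [
    re.compile(r"\bRQ\s*\d+\s*[:.]\s*(?P<question>[^?]{5,300}\?)", re.IGNORECASE),
    re.compile(r"\bResearch\s+Question\s*\d*\s*:\s*(?P<question>[^?]{5,300}\?)", re.IGNORECASE),
]


def extract_rqs(full_text: str) -> list[str]:
    """Return labeled RQs found in the text, deduplicated, in order of appearance."""
    hits: list[tuple[int, str]] = []
    for pattern in RQ_PATTERNS:
        for match in pattern.finditer(full_text):
            # Collapse line breaks and repeated spaces left over from PDF-to-text conversion.
            question = " ".join(match.group("question").split())
            hits.append((match.start(), question))
    # Patterns are applied one after another, so sort matches by their position in the text.
    hits.sort(key=lambda hit: hit[0])
    seen: set[str] = set()
    ordered: list[str] = []
    for _, question in hits:
        if question not in seen:
            seen.add(question)
            ordered.append(question)
    return ordered


def has_well_defined_rqs(full_text: str) -> bool:
    """Flag an article as having well-defined RQs if any hypothetical pattern matches."""
    return bool(extract_rqs(full_text))


if __name__ == "__main__":
    sample = (
        "We study retrieval. RQ1: Does query expansion improve recall? "
        "RQ2: How robust is the model\nto noise? Research Question 3: What is the cost?"
    )
    for rq in extract_rqs(sample):
        print(rq)
```

Articles for which `has_well_defined_rqs` returns `False` would, under this sketch's assumptions, be handed on to the learned detection component, and then to generation if no RQ is found at all. Rigid patterns like these favor precision over recall, which is consistent with the abstract's division of labor between rule-based extraction and a separate detector for more complex RQs.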