Evaluating the Effectiveness of Retrieval-Augmented Large Language Models in Scientific Document Reasoning

Sai Munikoti, Anurag Acharya, Sridevi Wagle, Sameera Horawalavithana


Abstract
Despite the dramatic progress in Large Language Model (LLM) development, LLMs often provide seemingly plausible but non-factual information, commonly referred to as hallucinations. Retrieval-augmented LLMs offer a non-parametric approach to this problem: they retrieve relevant information from external data sources and use it to augment the training process. These models help trace evidence to an externally provided knowledge base, allowing their predictions to be better interpreted and verified. In this work, we critically evaluate how well these models perform on scientific document reasoning tasks. To this end, we tuned multiple such model variants with science-focused instructions and evaluated them on a scientific document reasoning benchmark for the usefulness of the retrieved document passages. Our findings suggest that these models justify predictions in science tasks with fabricated evidence, and that leveraging a scientific corpus as pretraining data does not alleviate the risk of evidence fabrication.
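To make the retrieve-then-cite idea concrete, here is a minimal sketch of retrieval-augmented prompting with traceable evidence. It is not the authors' method or any model evaluated in the paper. Everything in it is a hypothetical illustration: the toy `KNOWLEDGE_BASE`, the passage IDs, the helper names (`retrieve`, `build_prompt`, `fabricated_citations`) and the bag-of-words cosine retriever, which stands in for the learned retrievers real systems use. The last helper flags only one crude form of evidence fabrication: the model's answer citing a passage ID that was never retrieved.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base: passage ID -> passage text.
KNOWLEDGE_BASE = {
    "p1": "Perovskite solar cells degrade under humidity and heat.",
    "p2": "Graphene has high electrical conductivity and mechanical strength.",
    "p3": "Encapsulation layers can slow moisture-induced degradation of perovskite films.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, kb, k=2):
    """Return the top-k (passage_id, score) pairs ranked by cosine similarity."""
    q = Counter(tokenize(query))
    scored = [(pid, cosine(q, Counter(tokenize(text)))) for pid, text in kb.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(query, passages, kb):
    """Prepend the retrieved passages, tagged with their IDs, to the question."""
    context = "\n".join(f"[{pid}] {kb[pid]}" for pid, _ in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer and cite passage IDs."


def fabricated_citations(answer, passages):
    """Return IDs cited as [pN] in the answer that were not among the retrieved passages.

    This only checks that a cited ID exists in the retrieved set; it cannot tell
    whether a real passage actually supports the claim it is attached to.
    """
    cited = set(re.findall(r"\[(p\d+)\]", answer))
    retrieved = {pid for pid, _ in passages}
    return sorted(cited - retrieved)


if __name__ == "__main__":
    query = "Why do perovskite solar cells degrade?"
    top = retrieve(query, KNOWLEDGE_BASE)
    print(build_prompt(query, top, KNOWLEDGE_BASE))
    # A hypothetical model answer that cites one retrieved and one nonexistent passage.
    print(fabricated_citations("Humidity causes degradation [p1][p7].", top))
```

For the sample query, the retriever ranks `p1` and `p3` highest, and the check reports `['p7']` as a citation with no retrieved source. A citation that passes this test can still be unsupported. Judging whether the retrieved passages are actually useful, which is what the paper evaluates, takes more than an ID lookup.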
Anthology ID: 2024.sdp-1.8
Volume: Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Tirthankar Ghosal, Amanpreet Singh, Anita Waard, Philipp Mayr, Aakanksha Naik, Orion Weller, Yoonjoo Lee, Shannon Shen, Yanxia Qin
Venues: sdp | WS
Publisher: Association for Computational Linguistics
Pages: 84–89
URL: https://aclanthology.org/2024.sdp-1.8
Cite (ACL): Sai Munikoti, Anurag Acharya, Sridevi Wagle, and Sameera Horawalavithana. 2024. Evaluating the Effectiveness of Retrieval-Augmented Large Language Models in Scientific Document Reasoning. In Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024), pages 84–89, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Evaluating the Effectiveness of Retrieval-Augmented Large Language Models in Scientific Document Reasoning (Munikoti et al., sdp-WS 2024)
PDF: https://aclanthology.org/2024.sdp-1.8.pdf