Evaluating and Training Long-Context Large Language Models for Question Answering on Scientific Papers

Lukas Hilgert, Danni Liu, Jan Niehues


Abstract
With the number of scientific papers published every year growing and current large language models (LLMs) showing state-of-the-art performance on natural language processing (NLP) tasks, we ask whether LLMs can be used to answer questions on scientific papers. We investigate how well state-of-the-art LLMs perform on this task by experimenting with long-context versions of the LLaMA 2 model and by evaluating and training on the Qasper dataset. We analyze how well the LLMs handle longer papers and questions that can only be answered by accessing information from distant paragraphs. In our experiments, we observe that the performance of these LLMs drops as the input length and the position of the relevant information grow. We employ different measures, from simple prompts to chain-of-thought prompts and from zero-shot usage to fine-tuning with QLoRA. While we still observe a performance loss with increased context length, our measures reduce its effects, and we achieve F1 scores similar to those of bigger models like GPT-4.
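As a rough illustration of the QLoRA fine-tuning setup the abstract mentions, the sketch below shows 4-bit quantized loading plus LoRA adapters via Hugging Face Transformers and PEFT. The base model name, LoRA rank, target modules, and other hyperparameters are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal QLoRA sketch (assumed settings, not the paper's exact setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Assumed base model; the paper uses long-context LLaMA 2 variants.
base_model = "meta-llama/Llama-2-7b-hf"

# Load the base model in 4-bit NF4 quantization (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters; only these small matrices are trained.
lora_config = LoraConfig(
    r=16,                                   # assumed rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],    # assumed target projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# The model can now be trained on Qasper-style (paper, question, answer) examples
# with a standard causal-LM training loop or the Hugging Face Trainer.
```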
Anthology ID:
2024.customnlp4u-1.17
Volume:
Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Sachin Kumar, Vidhisha Balachandran, Chan Young Park, Weijia Shi, Shirley Anugrah Hayati, Yulia Tsvetkov, Noah Smith, Hannaneh Hajishirzi, Dongyeop Kang, David Jurgens
Venue:
CustomNLP4U
Publisher:
Association for Computational Linguistics
Pages:
220–236
URL:
https://aclanthology.org/2024.customnlp4u-1.17
Cite (ACL):
Lukas Hilgert, Danni Liu, and Jan Niehues. 2024. Evaluating and Training Long-Context Large Language Models for Question Answering on Scientific Papers. In Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U), pages 220–236, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Evaluating and Training Long-Context Large Language Models for Question Answering on Scientific Papers (Hilgert et al., CustomNLP4U 2024)
PDF:
https://aclanthology.org/2024.customnlp4u-1.17.pdf