Buse Sibel Korkmaz


2024

Integrating Table Representations into Large Language Models for Improved Scholarly Document Comprehension
Buse Sibel Korkmaz | Antonio Del Rio Chanona
Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)

We address the challenge of using Large Language Models (LLMs) to interpret and reason over scientific tables, a crucial component of scholarly documents. Despite significant progress in natural language processing, the integration of tabular data into scientific LLMs remains limited. We propose an approach that applies intermediate-task pre-training on table question-answering datasets, followed by adapting the model to comprehend tables in computer science literature. Our findings show that incorporating table understanding substantially improves the performance of LLMs on scientific literature understanding tasks, as we demonstrate on peer-review score prediction. This improvement underscores the importance of using tabular data when training scientific language models. The code and models are publicly available at [this link](https://github.com/buseskorkmaz/Integrating-Table-Representations-into-LLMs).