NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization

Md Nahid, Davood Rafiei


Abstract
In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in parsing textual data and generating code. However, their performance in tasks involving tabular data, especially those requiring symbolic reasoning, faces challenges due to the structural variance and inconsistency in table cell values often found in web tables. In this paper, we introduce NormTab, a novel framework aimed at enhancing the symbolic reasoning performance of LLMs by normalizing web tables. We study table normalization as a stand-alone, one-time preprocessing step using LLMs to support symbolic reasoning on tabular data. Our experimental evaluation, conducted on challenging web table datasets such as WikiTableQuestions and TabFact, demonstrates that leveraging NormTab significantly improves symbolic reasoning performance, showcasing the importance and effectiveness of web table normalization for enhancing LLM-based symbolic reasoning tasks.
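To make the motivation concrete, the sketch below shows how inconsistent cell values can break a symbolic (SQL) query and how a one-time normalization pass repairs it. This is only an illustration and not the NormTab method: the abstract describes LLM-based normalization, whereas this example substitutes a hand-written rule. The table contents, the `normalize_number` helper, and the `total_attendance` function are all hypothetical.

```python
import re
import sqlite3

# Hypothetical raw web-table rows: numbers carry thousands separators,
# and a missing value is written as "N/A".
RAW_ROWS = [
    ("Alpha", "1,200"),
    ("Beta", "850"),
    ("Gamma", "N/A"),
    ("Delta", "2,050"),
]


def normalize_number(cell):
    """Return an int for cells like '1,200', or None for missing values.

    Assumption: a simple rule stands in for the LLM-based step here.
    """
    cleaned = cell.replace(",", "").strip()
    return int(cleaned) if re.fullmatch(r"-?\d+", cleaned) else None


def total_attendance(rows):
    """Load rows into an in-memory SQLite table and sum the second column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (team TEXT, attendance)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    (total,) = conn.execute("SELECT SUM(attendance) FROM t").fetchone()
    conn.close()
    return total


# One-time preprocessing: normalize every cell before any query runs.
normalized_rows = [(team, normalize_number(value)) for team, value in RAW_ROWS]

raw_total = total_attendance(RAW_ROWS)           # strings are not summed as intended
clean_total = total_attendance(normalized_rows)  # integers and NULL sum correctly

print("raw:", raw_total)
print("normalized:", clean_total)
```

On the normalized rows the query returns the intended total of 4100, with the missing value stored as NULL and skipped by `SUM`. On the raw rows the same query returns a different, incorrect number, because the cells are stored as text rather than as numeric values.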
Anthology ID:
2024.findings-emnlp.203
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3569–3585
URL:
https://aclanthology.org/2024.findings-emnlp.203
Cite (ACL):
Md Nahid and Davood Rafiei. 2024. NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 3569–3585, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization (Nahid & Rafiei, Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.203.pdf