Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning

Qingyu Tan, Hwee Tou Ng, Lidong Bing


Abstract
Real-world knowledge is constantly being updated, yet frequently updating large language models (LLMs) is costly. It is therefore crucial for LLMs to understand the concept of temporal knowledge. Prior work on temporal question answering (TQA), however, did not emphasize multi-answer and multi-hop types of temporal reasoning. In this paper, we propose Complex-TR, a complex temporal question-answering dataset that focuses on multi-answer and multi-hop temporal reasoning. We also propose a novel data augmentation strategy to improve the complex temporal reasoning capability and robustness of LLMs. We conduct experiments on multiple temporal QA datasets. Experimental results show that our method improves LLMs’ performance on temporal QA benchmarks by significant margins.
Anthology ID:
2024.findings-acl.374
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6272–6286
URL:
https://aclanthology.org/2024.findings-acl.374
Cite (ACL):
Qingyu Tan, Hwee Tou Ng, and Lidong Bing. 2024. Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning. In Findings of the Association for Computational Linguistics ACL 2024, pages 6272–6286, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning (Tan et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.374.pdf