Siheng Xiong


2024

pdf bib
Harnessing the Power of Large Language Models for Natural Language to First-Order Logic Translation
Yuan Yang | Siheng Xiong | Ali Payani | Ehsan Shareghi | Faramarz Fekri
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Advancements in logical reasoning, utilizing LLMs to convert natural language into logical symbolism, combined with the use of external theorem provers, have repositioned the symbolic approach as a central point of interest. The main challenge within this paradigm lies in the LLMs’ capability to accurately translate natural language (NL) statements into first-order-logic (FOL) expressions. Although LLMs have shown notable success, there remains a gap in understanding the limitations and challenges they encounter in NL-FOL translation. This is primarily due to the absence of datasets and evaluation test beds at the required fine-grained level. We present MALLS, a dataset of 28K diverse and verified sentence-level NL-FOL pairs collected from GPT4. We utilize a combined strategy of FOL rule parsing, human annotation, and automatic filtering to ensure quality. We also present LogicLLaMA, a LLaMA2-7B/13B fine-tuned on MALLS for NL-FOL translation, which can be used standalone or to correct previously generated rules by GPT3.5 after being further fine-tuned via a novel reinforcement learning with human feedback (RLHF) framework. We benchmark a wide range of LLMs on MALLS and previous datasets, highlighting weaknesses in them in NL-FOL translation and demonstrating the advantages of MALLS. We also show that LogicLLaMA achieves GPT4-level performance and can generalize to other datasets. Project repo is available at https://github.com/gblackout/LogicLLaMA

pdf bib
Large Language Models Can Learn Temporal Reasoning
Siheng Xiong | Ali Payani | Ramana Kompella | Faramarz Fekri
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While large language models (LLMs) have demonstrated remarkable reasoning capabilities, they are not without their flaws and inaccuracies. Recent studies have introduced various methods to mitigate these limitations. Temporal reasoning (TR), in particular, presents a significant challenge for LLMs due to its reliance on diverse temporal concepts and intricate temporal logic. In this paper, we propose TG-LLM, a novel framework towards language-based TR. Instead of reasoning over the original context, we adopt a latent representation, temporal graph (TG) that enhances the learning of TR. A synthetic dataset (TGQA), which is fully controllable and requires minimal supervision, is constructed for fine-tuning LLMs on this text-to-TG translation task. We confirmed in experiments that the capability of TG translation learned on our dataset can be transferred to other TR tasks and benchmarks. On top of that, we teach LLM to perform deliberate reasoning over the TGs via Chain-of-Thought (CoT) bootstrapping and graph data augmentation. We observed that those strategies, which maintain a balance between usefulness and diversity, bring more reliable CoTs and final results than the vanilla CoT distillation.