Text2World: Benchmarking Large Language Models for Symbolic World Model Generation

Mengkang Hu; Tianxing Chen; Yude Zou; Yuheng Lei; Qiguang Chen (陈麒光); Ming Li; Yao Mu; Hongyuan Zhang; Wenqi Shao; Ping Luo

doi:10.18653/v1/2025.findings-acl.1337

Abstract

Recently, there has been growing interest in leveraging large language models (LLMs) to generate symbolic world models from textual descriptions. Although LLMs have been extensively explored in the context of world modeling, prior studies encountered several challenges, including evaluation randomness, dependence on indirect metrics, and a limited domain scope. To address these limitations, we introduce a novel benchmark, Text2World, based on planning domain definition language (PDDL), featuring hundreds of diverse domains and employing multi-criteria, execution-based metrics for a more robust evaluation. We benchmark current LLMs using Text2World and find that reasoning models trained with large-scale reinforcement learning outperform others. However, even the best-performing model still demonstrates limited capabilities in world modeling. Building on these insights, we examine several promising strategies to enhance the world modeling capabilities of LLMs, including test-time scaling, agent training, and more. We hope that Text2World can serve as a crucial resource, laying the groundwork for future research in leveraging LLMs as world models.

Anthology ID: 2025.findings-acl.1337
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 26043–26066
URL: https://aclanthology.org/2025.findings-acl.1337/
DOI: 10.18653/v1/2025.findings-acl.1337
Cite (ACL): Mengkang Hu, Tianxing Chen, Yude Zou, Yuheng Lei, Qiguang Chen, Ming Li, Yao Mu, Hongyuan Zhang, Wenqi Shao, and Ping Luo. 2025. Text2World: Benchmarking Large Language Models for Symbolic World Model Generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 26043–26066, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Text2World: Benchmarking Large Language Models for Symbolic World Model Generation (Hu et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.1337.pdf
