PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks

Yunuo Liu; Dawei Zhu; Zena Al-Khalili; Dai Cheng; Yanjun Chen; Dietrich Klakow; Wei Zhang; Xiaoyu Shen

doi:10.18653/v1/2025.emnlp-main.393

Abstract

We present PricingLogic, the first benchmark that probes whether Large Language Models (LLMs) can reliably automate the pricing of tourism bookings when multiple, overlapping fare rules apply. Travel agencies are eager to offload this error-prone task to AI systems; however, deploying LLMs without verified reliability could result in significant financial losses and erode customer trust. PricingLogic comprises 300 natural-language questions based on booking requests derived from 42 real-world pricing policies, spanning two levels of difficulty: (i) basic customer-type pricing and (ii) bundled-tour calculations involving interacting discounts. Evaluations of a range of LLMs reveal a steep performance drop on the harder tier, exposing systematic failures in rule interpretation and arithmetic reasoning. These results highlight that, despite their general capabilities, today’s LLMs remain unreliable for revenue-critical applications without further safeguards or domain adaptation. Our code and dataset are available at https://github.com/EIT-NLP/PricingLogic.
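The following minimal sketch illustrates the kind of computation each difficulty tier involves. Everything in it is hypothetical. The attraction names, fares, discount rates, group-size threshold, and the functions `basic_price` and `bundled_price` were invented for illustration and are not taken from the PricingLogic dataset. The sketch assumes one common way discounts can interact: they do not stack, and only the single largest applicable discount is applied.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical fare table: attraction -> customer type -> price per person.
# Names, prices, and discount rules below are illustrative assumptions,
# not rules from the PricingLogic dataset.
FARES = {
    "museum": {"adult": Decimal("100"), "child": Decimal("50"), "senior": Decimal("80")},
    "river_cruise": {"adult": Decimal("150"), "child": Decimal("75"), "senior": Decimal("120")},
}

BUNDLE_DISCOUNT = Decimal("0.10")  # assumed: 10% off when 2+ attractions are booked together
GROUP_DISCOUNT = Decimal("0.05")   # assumed: 5% off for parties of GROUP_SIZE or more
GROUP_SIZE = 10


def basic_price(attraction: str, party: dict[str, int]) -> Decimal:
    """Tier (i): sum per-person fares by customer type for one attraction."""
    table = FARES[attraction]
    return sum((table[kind] * n for kind, n in party.items()), Decimal("0"))


def bundled_price(attractions: list[str], party: dict[str, int]) -> Decimal:
    """Tier (ii): price a multi-attraction bundle whose discounts interact.

    Assumed interaction rule: discounts do not stack; only the largest
    applicable discount is applied to the bundle subtotal.
    """
    subtotal = sum((basic_price(a, party) for a in attractions), Decimal("0"))
    applicable = [Decimal("0")]  # no discount unless a rule fires
    if len(attractions) >= 2:
        applicable.append(BUNDLE_DISCOUNT)
    if sum(party.values()) >= GROUP_SIZE:
        applicable.append(GROUP_DISCOUNT)
    total = subtotal * (1 - max(applicable))
    # Round to cents, as a booking system would before quoting a price.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


party = {"adult": 8, "child": 3}  # 11 travellers, so both discounts apply
print(basic_price("museum", party))                      # 950
print(bundled_price(["museum", "river_cruise"], party))  # 2137.50
```

Under this assumed rule, the 11-person, two-attraction booking receives only the 10% bundle discount (2375 × 0.90 = 2137.50). Adding the two discounts together (15% off, giving 2018.75) would be a rule-interpretation error of the kind the abstract describes.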

Anthology ID:: 2025.emnlp-main.393
Volume:: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:: EMNLP
Publisher:: Association for Computational Linguistics
Pages:: 7725–7734
URL:: https://aclanthology.org/2025.emnlp-main.393/
DOI:: 10.18653/v1/2025.emnlp-main.393
Cite (ACL):: Yunuo Liu, Dawei Zhu, Zena Al-Khalili, Dai Cheng, Yanjun Chen, Dietrich Klakow, Wei Zhang, and Xiaoyu Shen. 2025. PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 7725–7734, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks (Liu et al., EMNLP 2025)
PDF:: https://aclanthology.org/2025.emnlp-main.393.pdf
Checklist:: 2025.emnlp-main.393.checklist.pdf
