@inproceedings{ayyubi-etal-2025-puzzlegpt,
title = "{P}uzzle{GPT}: Emulating Human Puzzle-Solving Ability for Time and Location Prediction",
author = "Ayyubi, Hammad and
Feng, Xuande and
Liu, Junzhang and
Lin, Xudong and
Wang, Zhecan and
Chang, Shih-Fu",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-naacl.111/",
doi = "10.18653/v1/2025.findings-naacl.111",
pages = "2099--2116",
ISBN = "979-8-89176-195-7",
abstract = "The task of predicting time and location from images is challenging and requires complex human-like puzzle-solving ability over different clues. In this work, we formalize this ability into core skills and implement them using different modules in an expert pipeline called PuzzleGPT. PuzzleGPT consists of a perceiver to identify visual clues, a reasoner to deduce prediction candidates, a combiner to combinatorially combine information from different clues, a web retriever to get external knowledge if the task can{'}t be solved locally, and a noise filter for robustness. This results in a zero-shot, interpretable, and robust approach that records state-of-the-art performance on two datasets {--} TARA and WikiTilo. PuzzleGPT outperforms large VLMs such as BLIP-2, InstructBLIP, LLaVA, and even GPT-4V, as well as automatically generated reasoning pipelines like VisProg, by at least 32{\%} and 38{\%}, respectively. It even rivals or surpasses finetuned models."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="ayyubi-etal-2025-puzzlegpt">
<titleInfo>
<title>PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction</title>
</titleInfo>
<name type="personal">
<namePart type="given">Hammad</namePart>
<namePart type="family">Ayyubi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xuande</namePart>
<namePart type="family">Feng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Junzhang</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xudong</namePart>
<namePart type="family">Lin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zhecan</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shih-Fu</namePart>
<namePart type="family">Chang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-04</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: NAACL 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Luis</namePart>
<namePart type="family">Chiruzzo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alan</namePart>
<namePart type="family">Ritter</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lu</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Albuquerque, New Mexico</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-195-7</identifier>
</relatedItem>
<abstract>The task of predicting time and location from images is challenging and requires complex human-like puzzle-solving ability over different clues. In this work, we formalize this ability into core skills and implement them using different modules in an expert pipeline called PuzzleGPT. PuzzleGPT consists of a perceiver to identify visual clues, a reasoner to deduce prediction candidates, a combiner to combinatorially combine information from different clues, a web retriever to get external knowledge if the task can’t be solved locally, and a noise filter for robustness. This results in a zero-shot, interpretable, and robust approach that records state-of-the-art performance on two datasets – TARA and WikiTilo. PuzzleGPT outperforms large VLMs such as BLIP-2, InstructBLIP, LLaVA, and even GPT-4V, as well as automatically generated reasoning pipelines like VisProg, by at least 32% and 38%, respectively. It even rivals or surpasses finetuned models.</abstract>
<identifier type="citekey">ayyubi-etal-2025-puzzlegpt</identifier>
<identifier type="doi">10.18653/v1/2025.findings-naacl.111</identifier>
<location>
<url>https://aclanthology.org/2025.findings-naacl.111/</url>
</location>
<part>
<date>2025-04</date>
<extent unit="page">
<start>2099</start>
<end>2116</end>
</extent>
</part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction
%A Ayyubi, Hammad
%A Feng, Xuande
%A Liu, Junzhang
%A Lin, Xudong
%A Wang, Zhecan
%A Chang, Shih-Fu
%Y Chiruzzo, Luis
%Y Ritter, Alan
%Y Wang, Lu
%S Findings of the Association for Computational Linguistics: NAACL 2025
%D 2025
%8 April
%I Association for Computational Linguistics
%C Albuquerque, New Mexico
%@ 979-8-89176-195-7
%F ayyubi-etal-2025-puzzlegpt
%X The task of predicting time and location from images is challenging and requires complex human-like puzzle-solving ability over different clues. In this work, we formalize this ability into core skills and implement them using different modules in an expert pipeline called PuzzleGPT. PuzzleGPT consists of a perceiver to identify visual clues, a reasoner to deduce prediction candidates, a combiner to combinatorially combine information from different clues, a web retriever to get external knowledge if the task can’t be solved locally, and a noise filter for robustness. This results in a zero-shot, interpretable, and robust approach that records state-of-the-art performance on two datasets – TARA and WikiTilo. PuzzleGPT outperforms large VLMs such as BLIP-2, InstructBLIP, LLaVA, and even GPT-4V, as well as automatically generated reasoning pipelines like VisProg, by at least 32% and 38%, respectively. It even rivals or surpasses finetuned models.
%R 10.18653/v1/2025.findings-naacl.111
%U https://aclanthology.org/2025.findings-naacl.111/
%U https://doi.org/10.18653/v1/2025.findings-naacl.111
%P 2099-2116

Markdown (Informal)

[PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction](https://aclanthology.org/2025.findings-naacl.111/) (Ayyubi et al., Findings 2025)

ACL

Hammad Ayyubi, Xuande Feng, Junzhang Liu, Xudong Lin, Zhecan Wang, and Shih-Fu Chang. 2025. PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 2099–2116, Albuquerque, New Mexico. Association for Computational Linguistics.
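
For readers skimming the abstract above, here is a minimal, purely illustrative Python sketch of the five-module decomposition it describes (perceiver, reasoner, combiner, web retriever, noise filter). All class, function, and parameter names below are hypothetical placeholders chosen for this sketch; they are not taken from the paper or its released code, and the real modules are built on LLM/VLM prompting rather than the stub logic shown here.

```python
# Hypothetical sketch of the module pipeline described in the PuzzleGPT abstract.
# Every name here is illustrative; none is claimed to match the authors' code.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clue:
    """A single visual clue with a coarse confidence score."""
    description: str
    confidence: float


def perceiver(image_path: str) -> List[Clue]:
    # Placeholder: a real perceiver would run a VLM over the image to extract clues.
    return [
        Clue("vintage car on a cobblestone street", 0.9),
        Clue("blurry sign in the background", 0.3),
    ]


def noise_filter(clues: List[Clue], threshold: float = 0.5) -> List[Clue]:
    # Drop low-confidence clues for robustness.
    return [c for c in clues if c.confidence >= threshold]


def reasoner(clue: Clue) -> List[str]:
    # Placeholder: a real reasoner would prompt an LLM to map a clue to
    # candidate time/location predictions.
    return ["Europe, 1950s-1970s"]


def combiner(candidate_sets: List[List[str]]) -> Optional[str]:
    # Placeholder: combinatorially reconcile candidates from different clues;
    # here we simply return the first candidate, if any.
    flat = [c for cands in candidate_sets for c in cands]
    return flat[0] if flat else None


def web_retriever(clues: List[Clue]) -> str:
    # Placeholder: fall back to external knowledge when local reasoning fails.
    return "unknown (would query external knowledge here)"


def predict_time_location(image_path: str) -> str:
    clues = noise_filter(perceiver(image_path))
    candidates = [reasoner(c) for c in clues]
    answer = combiner(candidates)
    return answer if answer is not None else web_retriever(clues)


if __name__ == "__main__":
    print(predict_time_location("example.jpg"))
```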