Wenlu Zhang

2025

Atlas: Customizing Large Language Models for Reliable Bibliographic Retrieval and Verification
Akash Kodali | Hailu Xu | Wenlu Zhang | Xin Qin
Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications

Large Language Models (LLMs) are increasingly used for citation retrieval, yet their bibliographic outputs often contain hallucinated or inconsistent metadata. This paper examines whether structured prompting improves citation reliability compared with traditional API-based retrieval. We implement a three-stage BibTeX-fetching pipeline: a baseline Crossref resolver, a standard GPT prompting method, and a customized verification-guided GPT configuration. Across heterogeneous reference inputs, we evaluate retrieval coverage, field completeness, and metadata accuracy against Crossref ground truth. Results show that prompting improves coverage and completeness. Our findings highlight the importance of prompt design for building reliable, LLM-driven bibliographic retrieval systems.
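The three evaluation dimensions named in the abstract (retrieval coverage, field completeness, and metadata accuracy against a ground-truth record) can be illustrated with a minimal sketch. The abstract does not give the paper's metric definitions, so everything below is an assumption for illustration: the field set `REQUIRED_FIELDS`, the whitespace- and case-insensitive `normalize` comparison, and the function names are all hypothetical, and entries are modeled as plain dictionaries rather than parsed BibTeX.

```python
# Illustrative sketch only: these definitions are assumptions, not the paper's metrics.

# Assumed set of BibTeX fields to score (hypothetical choice).
REQUIRED_FIELDS = ("title", "author", "year", "journal", "doi")


def normalize(value):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(str(value).lower().split())


def coverage(entries):
    """Fraction of inputs for which a non-empty entry was retrieved at all."""
    if not entries:
        return 0.0
    return sum(1 for e in entries if e) / len(entries)


def field_completeness(entry, required=REQUIRED_FIELDS):
    """Fraction of required fields that are present and non-empty in the entry."""
    present = sum(1 for f in required if normalize(entry.get(f, "")))
    return present / len(required)


def field_accuracy(entry, truth, required=REQUIRED_FIELDS):
    """Fraction of fields populated in the ground truth that the entry matches exactly
    after normalization."""
    comparable = [f for f in required if normalize(truth.get(f, ""))]
    if not comparable:
        return 0.0
    correct = sum(
        1 for f in comparable if normalize(entry.get(f, "")) == normalize(truth[f])
    )
    return correct / len(comparable)


# Hypothetical records, invented purely for the example.
candidate = {"title": "An  Example Paper", "author": "J. Doe", "year": "2025"}
truth = {
    "title": "An Example Paper",
    "author": "Jane Doe",
    "year": "2025",
    "doi": "10.0000/example",
}
```

With these made-up records, `field_completeness(candidate)` counts three of the five assumed fields, while `field_accuracy(candidate, truth)` compares only the four fields the ground truth populates and credits the title and year but not the abbreviated author name. Exact matching is a deliberately strict choice here; a real evaluation might use fuzzy matching for author names and titles.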
