Iterative Self-Correction for Text-Driven Person Re-Identification with Large Vision-Language Models

Guijin Luo; Zequn Xie; Sihang Cai; Chuxin Wang; Zhou Zhao; Yixuan Tang

Abstract

Person Re-Identification (ReID) has long struggled with the semantic gap between low-level visual features and high-level identity concepts. While Vision-Language Models (VLMs) offer promising semantic understanding, existing methods typically adopt a static "one-pass" paradigm that converts each image to text once for retrieval. This approach suffers from two critical flaws: an Information Bottleneck, where converting rich visuals into text causes detail loss, and Open-Loop Failure, where initial hallucinations propagate without recourse. To address these flaws, we propose Auto-ReID, a novel framework that reformulates ReID as an iterative "Think-and-Refine" process. We first introduce a Hierarchical Progressive Tuning strategy to transform a generic VLM into a specialized ReID expert. During inference, we deploy a closed-loop architecture comprising a Reasoner for structured attribute extraction, a Hybrid Retriever that anchors dynamic semantic queries with stable visual features to prevent drift, and a Corrector that deconstructs and verifies candidates to iteratively optimize the search. Extensive experiments on ReID datasets demonstrate that our method significantly outperforms state-of-the-art approaches, particularly in complex occlusion scenarios.
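
The closed loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the fusion rule, the stopping criterion, or any component interface, so the weighted cosine fusion, the `alpha` weight, the `max_rounds` limit, and every function name below are assumptions made for illustration. The Reasoner and Corrector are stood in for by plain vectors and a caller-supplied callback rather than a VLM.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_scores(visual_query: Vector, text_query: Vector,
                  gallery_visual: List[Vector], gallery_text: List[Vector],
                  alpha: float = 0.5) -> List[float]:
    """Hypothetical fusion: the visual term stays fixed across rounds
    (the "anchor"), while the text term changes as the query is revised."""
    return [
        alpha * cosine(visual_query, gv) + (1.0 - alpha) * cosine(text_query, gt)
        for gv, gt in zip(gallery_visual, gallery_text)
    ]


def think_and_refine(visual_query: Vector, initial_text_query: Vector,
                     gallery_visual: List[Vector], gallery_text: List[Vector],
                     correct: Callable[[Vector, int], Optional[Vector]],
                     max_rounds: int = 3, alpha: float = 0.5) -> int:
    """Retrieve, let the corrector verify the top candidate, and retry.

    `correct(text_query, top_index)` is an assumed interface: it returns
    None to accept the candidate, or a revised text query to search again.
    Returns the index of the last top-ranked gallery item.
    """
    text_query = initial_text_query
    top = -1
    for _ in range(max_rounds):
        scores = hybrid_scores(visual_query, text_query,
                               gallery_visual, gallery_text, alpha)
        top = max(range(len(scores)), key=scores.__getitem__)
        revised = correct(text_query, top)
        if revised is None:      # candidate verified, stop iterating
            break
        text_query = revised     # feed the correction back into retrieval
    return top


# Toy data: two visually identical gallery items that differ only in text.
gallery_visual = [[1.0, 0.0], [1.0, 0.0]]
gallery_text = [[1.0, 0.0], [0.0, 1.0]]


def toy_corrector(text_query: Vector, top_index: int) -> Optional[Vector]:
    """Stand-in corrector: rejects item 1 and supplies a corrected query."""
    return [1.0, 0.0] if top_index == 1 else None


# The initial text query is deliberately wrong to mimic a hallucination.
one_pass = think_and_refine([1.0, 0.0], [0.0, 1.0], gallery_visual,
                            gallery_text, toy_corrector, max_rounds=1)
refined = think_and_refine([1.0, 0.0], [0.0, 1.0], gallery_visual,
                           gallery_text, toy_corrector, max_rounds=3)
```

In this toy setting a single round mirrors the "one-pass" behaviour and keeps the wrong candidate, while allowing further rounds lets the corrected query change the ranking. How the actual system weighs visual against semantic evidence, and when it stops, is not stated in the abstract.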

Anthology ID: 2026.findings-acl.312
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6292–6301
URL: https://aclanthology.org/2026.findings-acl.312/
Cite (ACL): Guijin Luo, Zequn Xie, Sihang Cai, Chuxin Wang, Zhou Zhao, and Yixuan Tang. 2026. Iterative Self-Correction for Text-Driven Person Re-Identification with Large Vision-Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 6292–6301, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Iterative Self-Correction for Text-Driven Person Re-Identification with Large Vision-Language Models (Luo et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.312.pdf
Checklist: 2026.findings-acl.312.checklist.pdf
