P²Net: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts

Kaiwen Wei; Jie Yao; Jiang Zhong; Yangyang Kang; Jingyuan Zhang; Changlong Sun; Xin Zhang; Fengmao Lv; Li Jin

doi:10.18653/v1/2025.findings-acl.552

Abstract

Key Information Extraction (KIE) is a challenging multimodal task aimed at extracting structured value entities from visually rich documents. Despite recent advancements, two major challenges remain. First, existing datasets typically feature fixed layouts and a limited set of entity categories, while current methods assume a full-shot setting that is difficult to apply in real-world scenarios, where new entity categories frequently emerge. Second, current methods often treat key entities simply as parts of the OCR-parsed context, neglecting the positive impact of the relationships between key-value entities. To address the first challenge, we introduce a new large-scale, human-annotated dataset, Complex Layout document for Key Information Extraction (CLEX). Comprising 5,860 images with 1,162 entity categories, CLEX is larger and more complex than existing datasets. It also focuses primarily on zero-shot and few-shot KIE tasks, which are more aligned with real-world applications. To tackle the second challenge, we propose the Parallel Pointer-based Network (P²Net). This model frames KIE as a pointer-based classification task and effectively leverages implicit relationships between key-value entities to enhance extraction. Its parallel extraction mechanism enables simultaneous and efficient extraction of multiple results. Experiments on widely used datasets, including SROIE, CORD, and the newly introduced CLEX, demonstrate that P²Net outperforms existing state-of-the-art methods (including GPT-4V) while maintaining fast inference speeds.

Anthology ID: 2025.findings-acl.552
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 10611–10626
URL: https://aclanthology.org/2025.findings-acl.552/
DOI: 10.18653/v1/2025.findings-acl.552
Cite (ACL): Kaiwen Wei, Jie Yao, Jiang Zhong, Yangyang Kang, Jingyuan Zhang, Changlong Sun, Xin Zhang, Fengmao Lv, and Li Jin. 2025. P²Net: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts. In Findings of the Association for Computational Linguistics: ACL 2025, pages 10611–10626, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): P²Net: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts (Wei et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.552.pdf
