LayoutPointer: A Spatial-Context Adaptive Pointer Network for Visual Information Extraction

Huang Siyuan, Yongping Xiong, Wu Guibin


Abstract
Visual Information Extraction (VIE), a crucial task in Document Intelligence, involves two primary sub-tasks: Semantic Entity Recognition (SER) and Relation Extraction (RE). However, VIE faces two significant challenges. First, most existing models make inadequate use of the spatial information of entities, often failing to predict connections or incorrectly linking spatially distant entities. Second, when text is extracted via a PDF parser or OCR from documents containing multi-line entities, the improper input order of tokens makes it difficult to extract complete entity pairs. To address these challenges, we propose LayoutPointer, a Spatial-Context Adaptive Pointer Network. LayoutPointer explicitly enhances spatial-context relationships by incorporating 2D relative position information and adaptive spatial constraints into self-attention. Furthermore, we recast the RE task as a specialized cycle detection problem, employing a unique tail-to-head pointer to restore the semantic order among multi-line entities. To better evaluate the effectiveness of our method, we reconstruct a multi-line dataset named MLFUD, which more accurately reflects real-world scenarios. Fine-tuning results on the FUNSD, XFUND, and MLFUD datasets demonstrate that LayoutPointer significantly outperforms existing state-of-the-art methods in F1 score on the RE task (e.g., a 5.71% improvement on XFUND using LayoutPointerBASE-X over LayoutLMv3).
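The abstract does not describe how the tail-to-head pointers are decoded into entities. Purely as an illustration of the general idea of recovering ordered segments through cycle detection, the sketch below assumes a hypothetical encoding: each text segment predicts the index of the segment that follows it, the last segment of an entity points back to its first (closing a cycle, with a one-segment entity forming a self-loop), and a separate head flag marks where each entity starts. The names `decode_cycles`, `next_ptr`, and `is_head` are invented for this example; this is not the paper's implementation.

```python
from typing import List, Optional, Sequence


def decode_cycles(next_ptr: Sequence[Optional[int]],
                  is_head: Sequence[bool]) -> List[List[int]]:
    """Recover ordered groups of segments from a pointer array.

    Hypothetical encoding (an assumption, not taken from the paper):
      * next_ptr[i] is the index of the segment following segment i,
        or None if segment i belongs to no entity;
      * the tail of each entity points back to its head, so every
        complete entity is a cycle (a single segment is a self-loop);
      * is_head[i] flags the first segment of an entity and is used to
        rotate each detected cycle into reading order.
    Chains that never close into a cycle are dropped as incomplete.
    """
    n = len(next_ptr)
    state = [0] * n  # 0 = unvisited, 1 = on current path, 2 = finished
    groups: List[List[int]] = []

    for start in range(n):
        if state[start] != 0:
            continue
        path: List[int] = []
        node: Optional[int] = start
        # Follow pointers until we reach None, a finished node,
        # or a node already on the current path.
        while node is not None and state[node] == 0:
            state[node] = 1
            path.append(node)
            node = next_ptr[node]
        if node is not None and state[node] == 1:
            # The suffix of the path starting at `node` forms a cycle.
            cycle = path[path.index(node):]
            heads = [k for k, seg in enumerate(cycle) if is_head[seg]]
            # Hypothetical fallback: start at the smallest index if no head is flagged.
            offset = heads[0] if heads else cycle.index(min(cycle))
            groups.append(cycle[offset:] + cycle[:offset])
        for seg in path:
            state[seg] = 2
    return groups


if __name__ == "__main__":
    # Segments in (scrambled) extraction order.
    texts = ["Name:", "Springfield", "Address:", "###", "12 Main St,"]
    next_ptr = [0, 4, 2, None, 1]  # 4 -> 1 -> 4 closes a two-line entity
    is_head = [True, False, True, False, True]
    for group in decode_cycles(next_ptr, is_head):
        print(" ".join(texts[i] for i in group))
```

Here the two-line value "12 Main St, Springfield" is restored in reading order even though its segments arrive out of order, and the unlinked segment "###" is discarded. Using an explicit head flag keeps the rotation of each cycle unambiguous; the actual model may mark the tail-to-head link differently.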
Anthology ID: 2024.naacl-long.207
Volume: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 3737–3748
URL: https://aclanthology.org/2024.naacl-long.207
Cite (ACL): Huang Siyuan, Yongping Xiong, and Wu Guibin. 2024. LayoutPointer: A Spatial-Context Adaptive Pointer Network for Visual Information Extraction. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 3737–3748, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): LayoutPointer: A Spatial-Context Adaptive Pointer Network for Visual Information Extraction (Siyuan et al., NAACL 2024)
PDF: https://aclanthology.org/2024.naacl-long.207.pdf
Copyright: 2024.naacl-long.207.copyright.pdf