PPTSER: A Plug-and-Play Tag-guided Method for Few-shot Semantic Entity Recognition on Visually-rich Documents

Wenhui Liao; Jiapeng Wang; Zening Lin; Longfei Xiong; Lianwen Jin

doi:10.18653/v1/2024.findings-acl.626

Abstract

Visually-rich document information extraction (VIE) is a vital aspect of document understanding, wherein Semantic Entity Recognition (SER) plays a significant role. However, few-shot SER on visually-rich documents remains relatively unexplored despite its considerable potential for practical applications. To address this issue, we propose a simple yet effective Plug-and-Play Tag-guided method for few-shot Semantic Entity Recognition (PPTSER) on visually-rich documents. PPTSER is built upon off-the-shelf multi-modal pre-trained models. It leverages the semantics of the tags to guide the SER task, reformulating SER into entity typing and span detection and handling both tasks simultaneously via cross-attention. Experimental results show that PPTSER outperforms existing fine-tuning and few-shot methods, especially in low-data regimes. With full training data, PPTSER achieves performance comparable or superior to the fine-tuning baseline. For instance, on the FUNSD benchmark, our method improves the performance of LayoutLMv3-base in 1-shot, 3-shot and 5-shot scenarios by 15.61%, 2.13%, and 2.01%, respectively. Overall, PPTSER demonstrates promising generalizability, effectiveness, and plug-and-play nature for few-shot SER on visually-rich documents. The code will be available at [https://github.com/whlscut/PPTSER](https://github.com/whlscut/PPTSER).

Anthology ID: 2024.findings-acl.626
Volume: Findings of the Association for Computational Linguistics: ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 10522–10539
URL: https://aclanthology.org/2024.findings-acl.626/
DOI: 10.18653/v1/2024.findings-acl.626
Cite (ACL): Wenhui Liao, Jiapeng Wang, Zening Lin, Longfei Xiong, and Lianwen Jin. 2024. PPTSER: A Plug-and-Play Tag-guided Method for Few-shot Semantic Entity Recognition on Visually-rich Documents. In Findings of the Association for Computational Linguistics: ACL 2024, pages 10522–10539, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): PPTSER: A Plug-and-Play Tag-guided Method for Few-shot Semantic Entity Recognition on Visually-rich Documents (Liao et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-acl.626.pdf
