A Simple yet Effective Learnable Positional Encoding Method for Improving Document Transformer Model

Guoxin Wang, Yijuan Lu, Lei Cui, Tengchao Lv, Dinei Florencio, Cha Zhang


Abstract
Positional encoding plays a key role in Transformer-based architectures: it indicates and embeds the sequential order of tokens. Understanding documents with unreliable reading-order information is a real challenge for document Transformer models. This paper proposes a simple and effective positional encoding method, learnable sinusoidal positional encoding (LSPE), which builds a learnable feed-forward network on top of sinusoidal positional encoding. We apply LSPE to document Transformer models and pretrain them on document datasets. We then finetune and evaluate the models on document understanding tasks in the form, receipt, and invoice domains. Experimental results show that our proposed method not only outperforms other baselines but also demonstrates robustness and stability when handling noisy data with incorrect order information.
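
The abstract describes the idea only at a high level. The following is a minimal, hypothetical PyTorch sketch of one way a learnable sinusoidal positional encoding could be realized: fixed sinusoidal position features are passed through a small trainable feed-forward network, and the result is added to the token embeddings. The hidden size, activation, maximum length, and the way the output is combined with the embeddings are illustrative assumptions, not details taken from the paper.

import math
import torch
import torch.nn as nn

class LearnableSinusoidalPositionalEncoding(nn.Module):
    """Sketch of a learnable sinusoidal positional encoding (LSPE):
    fixed sinusoidal features are mapped by a trainable feed-forward
    network to produce position embeddings (illustrative only)."""

    def __init__(self, d_model: int, hidden_dim: int = 256, max_len: int = 512):
        super().__init__()
        # Standard fixed sinusoidal features, as in the original Transformer.
        position = torch.arange(max_len).unsqueeze(1).float()
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("sinusoid", pe)
        # Learnable feed-forward network applied on top of the fixed features
        # (hidden size and activation are assumptions).
        self.ffn = nn.Sequential(
            nn.Linear(d_model, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, d_model),
        )

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, d_model)
        seq_len = token_embeddings.size(1)
        pos_emb = self.ffn(self.sinusoid[:seq_len])      # (seq_len, d_model)
        return token_embeddings + pos_emb.unsqueeze(0)   # broadcast over batch

# Example usage with hypothetical dimensions:
#   lspe = LearnableSinusoidalPositionalEncoding(d_model=768)
#   x = torch.randn(2, 128, 768)   # (batch, seq_len, d_model)
#   out = lspe(x)                  # same shape, with position information added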
Anthology ID:
2022.findings-aacl.42
Volume:
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022
Month:
November
Year:
2022
Address:
Online only
Editors:
Yulan He, Heng Ji, Sujian Li, Yang Liu, Chia-Hui Chang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
453–463
URL:
https://aclanthology.org/2022.findings-aacl.42
Cite (ACL):
Guoxin Wang, Yijuan Lu, Lei Cui, Tengchao Lv, Dinei Florencio, and Cha Zhang. 2022. A Simple yet Effective Learnable Positional Encoding Method for Improving Document Transformer Model. In Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, pages 453–463, Online only. Association for Computational Linguistics.
Cite (Informal):
A Simple yet Effective Learnable Positional Encoding Method for Improving Document Transformer Model (Wang et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-aacl.42.pdf