Beyond Layout Embedding: Layout Attention with Gaussian Biases for Structured Document Understanding

Xi Zhu, Xue Han, Shuyuan Peng, Shuo Lei, Chao Deng, Junlan Feng


Abstract
Effectively encoding layout information is a central problem in structured document understanding. Most existing methods rely heavily on millions of trainable parameters to learn the layout features of each word from Cartesian coordinates. However, two questions remain unresolved: (1) Is the Cartesian coordinate system the optimal choice for layout modeling? (2) Are massive numbers of learnable parameters truly necessary for layout representation? In this paper, we address these questions by proposing Layout Attention with Gaussian Biases (LAGaBi). First, we find that polar coordinates are a better choice than Cartesian coordinates because they measure both the distance and the angle between word pairs, capturing relative positions more effectively. Second, by feeding these distances and angles into 2-D Gaussian kernels, we model an intuitive inductive layout bias, namely that words closer together in a document should receive more attention; the resulting values act as attention biases that revise the textual attention distribution. LAGaBi is model-agnostic and language-independent, and can be applied to a range of transformer-based models, such as text pre-training models from the BERT series and the LayoutLM series, which incorporate visual features. Experimental results on three widely used benchmarks demonstrate that, despite reducing the number of layout parameters from millions to 48, LAGaBi achieves competitive or even superior performance.
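The mechanism summarized above (polar relative positions fed through 2-D Gaussian kernels, with the result added as a bias to attention) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, word positions are assumed to be (x, y) bounding-box centers, the kernel is assumed to be an axis-aligned Gaussian over (distance, angle) whose log-density is added to the attention logits, and the single set of kernel parameters shown is illustrative rather than the paper's 48-parameter configuration.

```python
import numpy as np

# Hypothetical sketch; not the LAGaBi reference implementation.
# Assumes each word is represented by the (x, y) center of its bounding box.


def polar_relative_positions(centers):
    """Pairwise distance and angle between word centers.

    centers: array of shape (n, 2) with (x, y) per word.
    Returns (dist, angle), each (n, n); angle is in radians in [-pi, pi].
    """
    diff = centers[None, :, :] - centers[:, None, :]  # diff[i, j] = c_j - c_i
    dist = np.hypot(diff[..., 0], diff[..., 1])
    angle = np.arctan2(diff[..., 1], diff[..., 0])
    return dist, angle


def gaussian_layout_bias(dist, angle, mu_d, sigma_d, mu_a, sigma_a):
    """Log of an axis-aligned 2-D Gaussian over (distance, angle), up to a constant.

    Assumption: the log-density is used so it can be added directly to
    attention logits. The angular difference is wrapped to (-pi, pi].
    """
    d_term = ((dist - mu_d) / sigma_d) ** 2
    da = np.angle(np.exp(1j * (angle - mu_a)))  # wrapped angular difference
    a_term = (da / sigma_a) ** 2
    return -0.5 * (d_term + a_term)


def biased_attention(q, k, v, bias):
    """Scaled dot-product attention with an additive layout bias on the logits."""
    logits = q @ k.T / np.sqrt(q.shape[-1]) + bias
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


if __name__ == "__main__":
    centers = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
    dist, angle = polar_relative_positions(centers)
    # mu_d = 0 makes the bias decay with distance; a huge sigma_a ignores angle.
    bias = gaussian_layout_bias(dist, angle, 0.0, 2.0, 0.0, 1e6)
    q = k = v = np.zeros((3, 4))  # no textual signal, so attention follows layout
    _, w = biased_attention(q, k, v, bias)
    print(np.round(w[0], 3))
```

With `mu_d = 0`, the distance term shrinks as words move apart, so the first word attends most to itself and its immediate neighbor and least to the distant third word, which mirrors the "closer words receive more attention" bias. Because the bias enters only through the attention logits, the same idea could in principle be attached to the self-attention of any transformer, which is the sense in which the abstract calls the approach model-agnostic.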
Anthology ID: 2023.findings-emnlp.521
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 7773–7784
URL: https://aclanthology.org/2023.findings-emnlp.521
DOI: 10.18653/v1/2023.findings-emnlp.521
Cite (ACL): Xi Zhu, Xue Han, Shuyuan Peng, Shuo Lei, Chao Deng, and Junlan Feng. 2023. Beyond Layout Embedding: Layout Attention with Gaussian Biases for Structured Document Understanding. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 7773–7784, Singapore. Association for Computational Linguistics.
Cite (Informal): Beyond Layout Embedding: Layout Attention with Gaussian Biases for Structured Document Understanding (Zhu et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-emnlp.521.pdf