ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training

Zhouqiang Jiang, Bowen Wang, Junhao Chen, Yuta Nakashima


Abstract
Recent approaches for visually-rich document understanding (VrDU) use manually annotated semantic groups, where a semantic group encompasses all semantically relevant but not obviously grouped words. As OCR tools are unable to identify such groupings automatically, we argue that current VrDU approaches are unrealistic. We thus introduce a new variant of the VrDU task, real-world visually-rich document understanding (ReVrDU), which does not allow the use of manually annotated semantic groups. We also propose a new method, ReLayout, compliant with the ReVrDU scenario, which learns to capture semantic grouping by arranging words and bringing the representations of words that potentially belong to the same semantic group closer together. Our experimental results demonstrate that the performance of existing methods deteriorates on the ReVrDU task, while ReLayout shows superior performance.
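The abstract says ReLayout brings the representations of words in the same potential semantic group closer together, but it does not state the objective. As a purely illustrative sketch, one generic way to express "pull same-group words together" is a supervised-contrastive-style loss over word embeddings. Everything here is an assumption, not ReLayout's actual method: the function names (`group_contrastive_loss`, `cosine`, `log_sum_exp`), the cosine similarity, the temperature of 0.1, and the premise that pseudo-group ids for each word are already available.

```python
import math
from typing import Hashable, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, 1e-12)  # guard against zero vectors


def log_sum_exp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def group_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    group_ids: Sequence[Hashable],
    temperature: float = 0.1,  # hypothetical value
) -> float:
    """Hypothetical supervised-contrastive-style loss (not ReLayout's objective):
    words sharing a group id are positives pulled together; all other words are negatives."""
    n = len(embeddings)
    if n != len(group_ids):
        raise ValueError("need one group id per word embedding")
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and group_ids[j] == group_ids[i]]
        if not positives:
            continue  # a word alone in its group has no positive pair
        # Temperature-scaled similarity of word i to every other word.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = log_sum_exp(list(logits.values()))
        # Average negative log-probability assigned to each positive.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    # A grouping that matches the embedding geometry gives a lower loss than a mismatched one.
    print(group_contrastive_loss(vecs, [0, 0, 1, 1]))
    print(group_contrastive_loss(vecs, [0, 1, 0, 1]))
```

Minimizing such a loss raises the similarity of words that share a (pseudo-)group relative to all other words in the document. How those groups are obtained without manual annotation is exactly what the ReVrDU setting constrains, and this sketch does not address it.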
Anthology ID: 2025.coling-main.255
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 3778–3793
URL: https://aclanthology.org/2025.coling-main.255/
Cite (ACL): Zhouqiang Jiang, Bowen Wang, Junhao Chen, and Yuta Nakashima. 2025. ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training. In Proceedings of the 31st International Conference on Computational Linguistics, pages 3778–3793, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training (Jiang et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.255.pdf