Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

Siwen Luo, Yihao Ding, Siqu Long, Josiah Poon, Soyeon Caren Han


Abstract
Recognizing the layout of unstructured digital documents is crucial when parsing them into a structured, machine-readable format for downstream applications. Recent studies in Document Layout Analysis (DLA) usually rely on visual cues to understand documents while ignoring other information, such as contextual information or the relationships between document layout components, which are vital to improving layout analysis performance. Our Doc-GCN presents an effective way to harmonize and integrate heterogeneous aspects for Document Layout Analysis. We construct different graphs to capture the four main feature aspects of document layout components: syntactic, semantic, density, and appearance features. Then, we apply graph convolutional networks to enhance the features of each aspect and apply node-level pooling for integration. Finally, we concatenate the features of all aspects and feed them into a 2-layer MLP for document layout component classification. Our Doc-GCN achieves state-of-the-art results on three widely used DLA datasets: PubLayNet, FUNSD, and DocBank. The code will be released at https://github.com/adlnlp/doc_gcn
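The pipeline described in the abstract (one graph per feature aspect, GCN enhancement, concatenation across aspects, and a 2-layer MLP classifier) can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation, which is in the released repository. Everything concrete here is an assumption for illustration: the k-nearest-neighbor graph over bounding-box centers (shared by all four aspects for brevity, whereas the paper builds a different graph per aspect), the feature dimensions, the single Kipf-and-Welling-style GCN layer with symmetric normalization, and the random weights. The paper's node-level pooling step is not reproduced. Each node's GCN output is used directly.

```python
import numpy as np


def knn_adjacency(centers: np.ndarray, k: int = 2) -> np.ndarray:
    """Hypothetical graph: link each layout component to its k nearest box centers."""
    n = centers.shape[0]
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude self-matches
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with added self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph-convolution step (neighbor aggregation, projection) with ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)


def classify_components(feats, adjs, rng, hidden=16, n_classes=5):
    """Per-aspect GCN, concatenate aspects, then a 2-layer MLP producing logits."""
    enhanced = []
    for name, x in feats.items():
        w = rng.normal(scale=0.1, size=(x.shape[1], hidden))  # untrained weights
        enhanced.append(gcn_layer(normalize_adjacency(adjs[name]), x, w))
    h = np.concatenate(enhanced, axis=1)  # shape (n_nodes, n_aspects * hidden)
    w1 = rng.normal(scale=0.1, size=(h.shape[1], hidden))
    w2 = rng.normal(scale=0.1, size=(hidden, n_classes))
    return np.maximum(h @ w1, 0.0) @ w2  # shape (n_nodes, n_classes)


rng = np.random.default_rng(0)
n_nodes = 6
centers = rng.uniform(0, 1000, size=(n_nodes, 2))  # hypothetical box centers
adj = knn_adjacency(centers, k=2)
# Hypothetical feature sizes for the four aspects named in the abstract.
dims = {"syntactic": 8, "semantic": 32, "density": 4, "appearance": 16}
feats = {name: rng.normal(size=(n_nodes, d)) for name, d in dims.items()}
adjs = {name: adj for name in dims}
logits = classify_components(feats, adjs, rng)
pred = logits.argmax(axis=1)  # one predicted layout class per component
print(logits.shape)  # (6, 5)
```

Keeping a separate GCN per aspect before concatenation lets each feature type be propagated over its own graph, which is the reason the sketch takes a dictionary of adjacency matrices even though it fills every entry with the same one.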
Anthology ID: 2022.coling-1.256
Volume: Proceedings of the 29th International Conference on Computational Linguistics
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Editors: Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 2906–2916
URL: https://aclanthology.org/2022.coling-1.256
Cite (ACL): Siwen Luo, Yihao Ding, Siqu Long, Josiah Poon, and Soyeon Caren Han. 2022. Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2906–2916, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal): Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis (Luo et al., COLING 2022)
PDF: https://aclanthology.org/2022.coling-1.256.pdf
Code: adlnlp/doc_gcn
Data: DocBank, FUNSD, PubLayNet