Lightweight Contextual Logical Structure Recovery
Po-Wei Huang | Abhinav Ramesh Kashyap | Yanxia Qin | Yajing Yang | Min-Yen Kan
Proceedings of the Third Workshop on Scholarly Document Processing
Logical structure recovery in scientific articles associates text with a semantic section of the article. Although previous work has disregarded the surrounding context of a line, we model this important information by employing line-level attention on top of a transformer-based scientific document processing pipeline. With the addition of loss function engineering and data augmentation techniques with semi-supervised learning, our method improves classification performance by 10% compared to a recent state-of-the-art model. Our parsimonious, text-only method achieves performance comparable to that of other works that use rich document features such as font and spatial position, while using less data, resulting in a lightweight training pipeline.
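To make the idea of line-level attention concrete, the following is a minimal sketch, not the authors' implementation. It assumes each line has already been encoded as a fixed-size vector (for example by a transformer encoder) and applies plain scaled dot-product self-attention across the lines, with no learned projections. The function name `line_attention` and the toy embeddings are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def line_attention(line_vectors):
    """Mix each line embedding with those of the other lines in the document.

    Assumption: unparameterised scaled dot-product self-attention stands in
    for whatever learned attention layer a real model would use.
    Returns (context_vectors, attention_weights).
    """
    dim = len(line_vectors[0])
    scale = math.sqrt(dim)
    contexts, weights = [], []
    for query in line_vectors:
        # Similarity of the current line to every line, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in line_vectors
        ]
        attn = softmax(scores)
        # Context vector: attention-weighted average of all line embeddings.
        context = [
            sum(w * vec[d] for w, vec in zip(attn, line_vectors))
            for d in range(dim)
        ]
        contexts.append(context)
        weights.append(attn)
    return contexts, weights


# Hypothetical 2-dimensional embeddings for three consecutive lines.
lines = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
contexts, weights = line_attention(lines)
```

In a classifier along these lines, each context vector (possibly concatenated with the original line embedding) would be passed to a layer that predicts the line's logical section label, so that the prediction for one line can draw on its neighbours.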