Neural Document Segmentation Using Weighted Sliding Windows with Transformer Encoders

Saeed Abbasi; Aijun An; Heidar Davoudi; Ron Di Carlantonio; Gary Farmaner

Abstract

We introduce a novel Transformer-based method for document segmentation, tailored for practical, real-world applications. This method utilizes overlapping text sequences with a unique position-aware weighting mechanism to enhance segmentation accuracy. Through comprehensive experiments on both public and proprietary datasets, we demonstrate significant improvements, establishing new state-of-the-art standards by achieving up to a 10% increase in segmentation F1 score compared to existing methods. Additionally, we explore the application of our segmentation method in downstream retrieval-augmented question answering tasks, where it improves the quality of generated responses by 5% while achieving up to four times greater efficiency. These results underscore our model’s potential as a robust and scalable solution for real-world text segmentation challenges.
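The abstract does not give the exact form of the position-aware weighting, so the following is only a minimal sketch of the general idea: score overlapping windows independently, then combine the per-item boundary scores with weights that depend on where each item sits inside its window. The triangular weight, the function names, and the `score_window` callback (standing in for a Transformer encoder with a boundary classifier) are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Callable, List


def window_starts(n_items: int, window: int, stride: int) -> List[int]:
    """Start indices of overlapping windows that together cover all items."""
    if n_items <= window:
        return [0]
    starts = list(range(0, n_items - window + 1, stride))
    if starts[-1] != n_items - window:
        starts.append(n_items - window)  # final window sits flush with the end
    return starts


def triangular_weight(pos: int, length: int) -> float:
    """Hypothetical position weight: largest at the window centre, smallest
    (but still positive) at the edges, where context is one-sided."""
    centre = (length - 1) / 2
    return 1.0 - abs(pos - centre) / (centre + 1.0)


def aggregate_boundary_scores(
    n_items: int,
    window: int,
    stride: int,
    score_window: Callable[[int, int], List[float]],
) -> List[float]:
    """Weighted average of per-window boundary scores for every item.

    score_window(start, end) is an assumed callback returning one boundary
    probability per item in [start, end).
    """
    totals = [0.0] * n_items
    weights = [0.0] * n_items
    for start in window_starts(n_items, window, stride):
        end = min(start + window, n_items)
        scores = score_window(start, end)
        length = end - start
        for pos, score in enumerate(scores):
            w = triangular_weight(pos, length)
            totals[start + pos] += w * score
            weights[start + pos] += w
    # Every item is covered by at least one window and all weights are positive.
    return [t / w for t, w in zip(totals, weights)]
```

Thresholding the aggregated scores (for example at 0.5) would then yield segment boundaries. An item near the edge of one window is usually near the centre of a neighbouring one, so the weighting lets the better-contextualised prediction dominate.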

Anthology ID: 2025.coling-industry.67
Volume: Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, Kareem Darwish, Apoorv Agarwal
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 807–816
URL: https://aclanthology.org/2025.coling-industry.67/
Cite (ACL): Saeed Abbasi, Aijun An, Heidar Davoudi, Ron Di Carlantonio, and Gary Farmaner. 2025. Neural Document Segmentation Using Weighted Sliding Windows with Transformer Encoders. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pages 807–816, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Neural Document Segmentation Using Weighted Sliding Windows with Transformer Encoders (Abbasi et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-industry.67.pdf
