Title: DocumentNet: Bridging the Data Gap in Document Pre-training
Authors: Lijun Yu, Jin Miao, Xiaoyu Sun, Jiayi Chen, Alexander Hauptmann, Hanjun Dai, Wei Wei
Date: 2023-12
Resource type: text
Genre: conference publication
Published in: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Editors: Mingxuan Wang, Imed Zitouni
Publisher: Association for Computational Linguistics
Place: Singapore
Pages: 707-722
Citation key: yu-etal-2023-documentnet
DOI: 10.18653/v1/2023.emnlp-industry.66
URL: https://aclanthology.org/2023.emnlp-industry.66/