DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading

Hao Wang; Qingxuan Wang; Yue Li; Changqing Wang; Chenhui Chu; Rui Wang

doi:10.18653/v1/2023.findings-emnlp.344

Abstract

The use of visually-rich documents in various fields has created a demand for Document AI models that can read and comprehend documents as humans do, which requires overcoming technical, linguistic, and cognitive barriers. Unfortunately, the lack of appropriate datasets has significantly hindered progress in the field. To address this issue, we introduce DocTrack, a visually-rich document dataset that is really aligned with human eye-movement information collected using eye-tracking technology. This dataset can be used to investigate the challenges mentioned above. Additionally, we explore the impact of human reading order on document understanding tasks and examine what happens when a machine reads in the same order as a human. Our results suggest that although Document AI models have made significant progress, they still have a long way to go before they can read visually rich documents as accurately, continuously, and flexibly as humans do. These findings have potential implications for the future research and development of document intelligence.

Anthology ID: 2023.findings-emnlp.344
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 5176–5189
URL: https://aclanthology.org/2023.findings-emnlp.344/
DOI: 10.18653/v1/2023.findings-emnlp.344
Cite (ACL): Hao Wang, Qingxuan Wang, Yue Li, Changqing Wang, Chenhui Chu, and Rui Wang. 2023. DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5176–5189, Singapore. Association for Computational Linguistics.
Cite (Informal): DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading (Wang et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-emnlp.344.pdf