ATLAS: A System for PDF-centric Human Interaction Data Collection

Alexa Siu, Zichao Wang, Joshua Hoeflich, Naman Kapasi, Ani Nenkova, Tong Sun


Abstract
The Portable Document Format (PDF) is a popular format for distributing digital documents. Datasets on PDF reading behaviors and interactions remain limited due to the challenges of instrumenting PDF readers for these data collection tasks. We present ATLAS, a data collection tool designed to better support researchers in collecting rich PDF-centric datasets from users. ATLAS supports researchers in programmatically creating a user interface for data collection that is ready to share with annotators. It includes a toolkit and an extensible schema to easily customize the data collection tasks for a variety of purposes, allowing collection of PDF annotations (e.g., highlights, drawings) as well as reading behavior analytics (e.g., page scroll, text selections). We open-source ATLAS to support future research efforts and review use cases of ATLAS that showcase our system’s broad applicability.
Anthology ID:
2024.naacl-demo.9
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kai-Wei Chang, Annie Lee, Nazneen Rajani
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
87–96
URL:
https://aclanthology.org/2024.naacl-demo.9
DOI:
10.18653/v1/2024.naacl-demo.9
Cite (ACL):
Alexa Siu, Zichao Wang, Joshua Hoeflich, Naman Kapasi, Ani Nenkova, and Tong Sun. 2024. ATLAS: A System for PDF-centric Human Interaction Data Collection. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations), pages 87–96, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
ATLAS: A System for PDF-centric Human Interaction Data Collection (Siu et al., NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-demo.9.pdf