DataLab: A Platform for Data Analysis and Intervention

Yang Xiao, Jinlan Fu, Weizhe Yuan, Vijay Viswanathan, Zhoumianze Liu, Yixin Liu, Graham Neubig, Pengfei Liu


Abstract
Despite data’s crucial role in machine learning, most existing tools and research focus on systems built on top of existing data rather than on how to interpret and manipulate the data itself. In this paper, we propose DataLab, a unified data-oriented platform that not only allows users to interactively analyze the characteristics of data but also provides a standardized interface for many data processing operations. Additionally, in view of the ongoing proliferation of datasets, DataLab has features for dataset recommendation and global vision analysis that help researchers form a better view of the data ecosystem. So far, DataLab covers 1,300 datasets and 3,583 of their transformed versions, where 313 datasets support different types of analysis (e.g., with respect to gender bias) with the help of 119M samples annotated by 318 feature functions. DataLab is under active development and will be supported going forward. We have released a web platform, a web API, a Python SDK, and a package published on PyPI, which we hope will meet the diverse needs of researchers.
Anthology ID:
2022.acl-demo.18
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Valerio Basile, Zornitsa Kozareva, Sanja Stajner
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
182–195
URL:
https://aclanthology.org/2022.acl-demo.18
DOI:
10.18653/v1/2022.acl-demo.18
Cite (ACL):
Yang Xiao, Jinlan Fu, Weizhe Yuan, Vijay Viswanathan, Zhoumianze Liu, Yixin Liu, Graham Neubig, and Pengfei Liu. 2022. DataLab: A Platform for Data Analysis and Intervention. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 182–195, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
DataLab: A Platform for Data Analysis and Intervention (Xiao et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-demo.18.pdf
Video:
https://aclanthology.org/2022.acl-demo.18.mp4
Data:
BeerAdvocate, SNLI