Title: Dataset Debt in Biomedical Language Modeling
Authors: Jason Fries, Natasha Seelam, Gabriel Altay, Leon Weber, Myungsun Kang, Debajyoti Datta, Ruisi Su, Samuele Garda, Bo Wang, Simon Ott, Matthias Samwald, Wojciech Kusa
Date: 2022-05
Resource type: text
Published in: Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models
Editors: Angela Fan, Suzana Ilic, Thomas Wolf, Matthias Gallé
Publisher: Association for Computational Linguistics
Location: virtual+Dublin
Genre: conference publication
Identifier: fries-etal-2022-dataset
DOI: 10.18653/v1/2022.bigscience-1.10
URL: https://aclanthology.org/2022.bigscience-1.10/
Pages: 137-145