Thibault Clérice. 2021. “Don’t worry, it’s just noise”: quantifying the impact of files treated as single textual units when they are really collections. In Proceedings of the Workshop on Natural Language Processing for Digital Humanities, edited by Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, and Jack Rueter, pages 95-105, NIT Silchar, India, December 2021. NLP Association of India (NLPAI). Conference publication. Citation key: clerice-2021-dont. https://aclanthology.org/2021.nlp4dh-1.11/