Title: Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
Authors: Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Ortiz Suarez, Iroro Orife, Kelechi Ogueji, Andre Niyongabo Rubungo, Toan Q Nguyen, Mathias Müller, André Müller, Shamsuddeen Hassan Muhammad, Nanda Muhammad, Ayanda Mnyakeni, Jamshidbek Mirzakhalov, Tapiwanashe Matangira, Colin Leong, Nze Lawson, Sneha Kudugunta, Yacine Jernite, Mathias Jenny, Orhan Firat, Bonaventure F P Dossou, Sakhile Dlamini, Nisansa de Silva, Sakine Çabuk Ballı, Stella Biderman, Alessia Battisti, Ahmed Baruwa, Ankur Bapna, Pallavi Baljekar, Israel Abebe Azime, Ayodele Awokoya, Duygu Ataman, Orevaoghene Ahia, Oghenefego Ahia, Sweta Agrawal, Mofetoluwa Adeyemi
Type: journal article
Journal: Transactions of the Association for Computational Linguistics
Publisher: MIT Press, Cambridge, MA
Year: 2022
Volume: 10
Pages: 50-72
DOI: 10.1162/tacl_a_00447
URL: https://aclanthology.org/2022.tacl-1.4/
Citation key: kreutzer-etal-2022-quality