Title: What’s in the Box? An Analysis of Undesirable Content in the Common Crawl Corpus
Authors: Alexandra Luccioni, Joseph Viviano
Date: 2021-08
Resource type: text
Genre: conference publication
Published in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Editors: Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Publisher: Association for Computational Linguistics
Location: Online
Pages: 182–189
Anthology ID: luccioni-viviano-2021-whats
DOI: 10.18653/v1/2021.acl-short.24
URL: https://aclanthology.org/2021.acl-short.24/