Title: Deduplicating Training Data Makes Language Models Better
Authors: Katherine Lee, Daphne Ippolito, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini
Date: 2022-05
Type: text (conference publication)
Venue: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Publisher: Association for Computational Linguistics
Location: Dublin, Ireland
Identifier: lee-etal-2022-deduplicating
DOI: 10.18653/v1/2022.acl-long.577
URL: https://aclanthology.org/2022.acl-long.577/
Pages: 8424-8445