An Embarrassingly Simple Method to Mitigate Undesirable Properties of Pretrained Language Model Tokenizers

Authors: Valentin Hofmann, Hinrich Schuetze, Janet Pierrehumbert
Published: 2022-05
In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Publisher: Association for Computational Linguistics
Location: Dublin, Ireland
Type: conference publication
Pages: 385-393
Anthology ID: hofmann-etal-2022-embarrassingly
DOI: 10.18653/v1/2022.acl-short.43
URL: https://aclanthology.org/2022.acl-short.43/