Tomasz Limisiewicz, Jiří Balhar, and David Mareček. 2023. Tokenization Impacts Multilingual Language Modeling: Assessing Vocabulary Allocation and Overlap Across Languages. In Findings of the Association for Computational Linguistics: ACL 2023, edited by Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, pages 5661–5681, Toronto, Canada, July 2023. Association for Computational Linguistics. Conference publication. Anthology ID: limisiewicz-etal-2023-tokenization. DOI: 10.18653/v1/2023.findings-acl.350. URL: https://aclanthology.org/2023.findings-acl.350/