SubmissionNumber#=%=#4
FinalPaperTitle#=%=#An Empirical Study of Multilingual Vocabulary for Neural Machine Translation Models
ShortPaperTitle#=%=#
NumberOfPages#=%=#14
CopyrightSigned#=%=#Kenji Imamura
JobTitle#==#
Organization#==#National Institute of Information and Communications Technology 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0289, Japan
Abstract#==#In this paper, we discuss multilingual vocabularies for neural machine translation models. A multilingual vocabulary should yield highly accurate translations regardless of language, and should preferably produce tokenized strings that rarely contain out-of-vocabulary (OOV) tokens and token sequences that are short. We examine the characteristics of various multilingual vocabularies through tokenization and translation experiments, and we present our recommended vocabulary and tokenizer.
Author{1}{Firstname}#=%=#Kenji
Author{1}{Lastname}#=%=#Imamura
Author{1}{Username}#=%=#kimamura
Author{1}{Email}#=%=#kenji.imamura@nict.go.jp
Author{1}{Affiliation}#=%=#National Institute of Information and Communications Technology
Author{2}{Firstname}#=%=#Masao
Author{2}{Lastname}#=%=#Utiyama
Author{2}{Username}#=%=#mutiyama
Author{2}{Email}#=%=#mutiyama@nict.go.jp
Author{2}{Affiliation}#=%=#NICT
========== èéáğö