%0 Conference Proceedings
%T Optimizing Word Segmentation for Downstream Task
%A Hiraoka, Tatsuya
%A Takase, Sho
%A Uchiumi, Kei
%A Keyaki, Atsushi
%A Okazaki, Naoaki
%Y Cohn, Trevor
%Y He, Yulan
%Y Liu, Yang
%S Findings of the Association for Computational Linguistics: EMNLP 2020
%D 2020
%8 November
%I Association for Computational Linguistics
%C Online
%F hiraoka-etal-2020-optimizing
%X In traditional NLP, we tokenize a given sentence as a preprocessing step, and thus the tokenization is unrelated to the target downstream task. To address this issue, we propose a novel method to explore a tokenization that is appropriate for the downstream task. Our proposed method, optimizing tokenization (OpTok), is trained to assign a high probability to such an appropriate tokenization based on the downstream task loss. OpTok can be used for any downstream task that uses a vector representation of a sentence, such as text classification. Experimental results demonstrate that OpTok improves the performance of sentiment analysis and textual entailment. In addition, we introduce OpTok into BERT, the state-of-the-art contextualized embedding model, and report a positive effect.
%R 10.18653/v1/2020.findings-emnlp.120
%U https://aclanthology.org/2020.findings-emnlp.120
%U https://doi.org/10.18653/v1/2020.findings-emnlp.120
%P 1341-1351