Improving Neural Text Normalization with Data Augmentation at Character- and Morphological Levels

Itsumi Saito; Jun Suzuki; Kyosuke Nishida; Kugatsu Sadamitsu; Satoshi Kobashikawa; Ryo Masumura; Yuji Matsumoto; Junji Tomita

Abstract

In this study, we investigated the effectiveness of augmented data for encoder-decoder-based neural normalization models. Attention-based encoder-decoder models are highly effective for many natural language generation tasks. In general, training an encoder-decoder model requires a large amount of training data. Unlike machine translation, however, text normalization tasks have little training data available. In this paper, we propose two methods for generating augmented data. Experimental results on Japanese dialect normalization indicate that our methods are effective for an encoder-decoder model and achieve higher BLEU scores than the baselines. We also investigated the oracle performance and found that there is still considerable room for improving an encoder-decoder model.

Anthology ID: I17-2044
Volume: Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month: November
Year: 2017
Address: Taipei, Taiwan
Editors: Greg Kondrak, Taro Watanabe
Venue: IJCNLP
Publisher: Asian Federation of Natural Language Processing
Pages: 257–262
URL: https://aclanthology.org/I17-2044/
Cite (ACL): Itsumi Saito, Jun Suzuki, Kyosuke Nishida, Kugatsu Sadamitsu, Satoshi Kobashikawa, Ryo Masumura, Yuji Matsumoto, and Junji Tomita. 2017. Improving Neural Text Normalization with Data Augmentation at Character- and Morphological Levels. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 257–262, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal): Improving Neural Text Normalization with Data Augmentation at Character- and Morphological Levels (Saito et al., IJCNLP 2017)
PDF: https://aclanthology.org/I17-2044.pdf
