A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization

Shohei Higashiyama, Masao Utiyama, Taro Watanabe, Eiichiro Sumita


Abstract
Lexical normalization, in addition to word segmentation and part-of-speech tagging, is a fundamental task for processing Japanese user-generated text. In this paper, we propose a text editing model that solves the three tasks jointly, together with methods for generating pseudo-labeled data to overcome the problem of data deficiency. Our experiments showed that the proposed model achieved better normalization performance when trained on more diverse pseudo-labeled data.
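To make the idea of a joint text editing formulation concrete, here is a minimal Python sketch under stated assumptions. It assumes each character receives one tag that combines a word-boundary flag (B/I), a POS label, and an edit operation (KEEP, DEL, or REP with a replacement string), so that decoding a single tag sequence yields segmentation, POS tags, and normalized forms at once. The tag scheme, the `CharTag` and `decode` names, and the POS labels ("ADJ", "PART") are hypothetical illustrations, not the paper's actual label set.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CharTag:
    """Hypothetical per-character tag: segmentation + POS + edit operation."""
    boundary: str           # "B" = character starts a word, "I" = continues it
    pos: Optional[str]      # POS label; only read on "B" characters
    edit: str = "KEEP"      # "KEEP", "DEL", or "REP" (assumed operation set)
    replacement: str = ""   # output string used when edit == "REP"


def decode(chars: str, tags: List[CharTag]) -> List[Tuple[str, str, str]]:
    """Turn characters plus tags into (surface, normalized, POS) triples."""
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")
    if tags and tags[0].boundary != "B":
        raise ValueError("the first character must start a word")

    words: List[Tuple[str, str, str]] = []
    surface, normalized, pos = "", "", ""
    for ch, tag in zip(chars, tags):
        if tag.boundary == "B":
            # Close the previous word (if any) and start a new one.
            if surface:
                words.append((surface, normalized, pos))
            surface, normalized, pos = "", "", tag.pos or ""
        elif tag.boundary != "I":
            raise ValueError(f"unknown boundary flag: {tag.boundary}")

        surface += ch  # the surface form always keeps the original character
        if tag.edit == "KEEP":
            normalized += ch
        elif tag.edit == "REP":
            normalized += tag.replacement
        elif tag.edit != "DEL":  # DEL contributes nothing to the normalized form
            raise ValueError(f"unknown edit operation: {tag.edit}")

    if surface:
        words.append((surface, normalized, pos))
    return words


if __name__ == "__main__":
    # "おいしーね": the long-vowel mark is rewritten to い, yielding おいしい.
    tags = [
        CharTag("B", "ADJ"), CharTag("I", None), CharTag("I", None),
        CharTag("I", None, "REP", "い"), CharTag("B", "PART"),
    ]
    print(decode("おいしーね", tags))
    # [('おいしー', 'おいしい', 'ADJ'), ('ね', 'ね', 'PART')]
```

Deriving all three outputs from one tag per character is what allows a single sequence labeler to handle the tasks jointly. A practical tag set would likely need richer operations (for example, insertions or multi-character rewrites), which this sketch deliberately omits.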
Anthology ID:
2021.wnut-1.9
Volume:
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)
Month:
November
Year:
2021
Address:
Online
Venues:
EMNLP | WNUT
Publisher:
Association for Computational Linguistics
Pages:
67–80
URL:
https://aclanthology.org/2021.wnut-1.9
DOI:
10.18653/v1/2021.wnut-1.9
Cite (ACL):
Shohei Higashiyama, Masao Utiyama, Taro Watanabe, and Eiichiro Sumita. 2021. A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization. In Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021), pages 67–80, Online. Association for Computational Linguistics.
Cite (Informal):
A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization (Higashiyama et al., WNUT 2021)
PDF:
https://aclanthology.org/2021.wnut-1.9.pdf