Comparison of the Impact of Word Segmentation on Name Tagging for Chinese and Japanese

Haibo Li, Masato Hagiwara, Qi Li, Heng Ji


Abstract
Word segmentation is usually considered an essential step for many Chinese and Japanese natural language processing tasks, such as name tagging. This paper presents several new observations and analyses of the impact of word segmentation on name tagging: (1) due to the limitations of current state-of-the-art Chinese word segmentation performance, a character-based name tagger can outperform its word-based counterparts for Chinese but not for Japanese; (2) it is crucial to keep segmentation settings (e.g., definitions, specifications, methods) consistent between training and testing for name tagging; (3) as long as (2) is ensured, the performance of word segmentation does not have an appreciable impact on Chinese and Japanese name tagging.
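To make observation (1) more concrete, the following Python sketch shows one way segmentation errors can cap a word-based tagger. A tagger that labels whole words can only output entity spans whose edges coincide with word boundaries, whereas character-level BIO tags can encode any span. The example sentence 李明在北京工作 ("Li Ming works in Beijing"), its two segmentations, and the helper names `char_bio`, `word_spans`, and `representable` are hypothetical illustrations, not the authors' taggers or data.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # character offsets, half-open [start, end)


def char_bio(text: str, entities: List[Span]) -> List[str]:
    """Character-level BIO tags: any entity span can be encoded."""
    tags = ["O"] * len(text)
    for start, end in entities:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def word_spans(words: List[str]) -> List[Span]:
    """Character offsets of each word produced by a (hypothetical) segmenter."""
    spans, pos = [], 0
    for w in words:
        spans.append((pos, pos + len(w)))
        pos += len(w)
    return spans


def representable(words: List[str], entity: Span) -> bool:
    """A tagger that labels whole words can only output entities whose
    start and end both fall on word boundaries."""
    spans = word_spans(words)
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    return entity[0] in starts and entity[1] in ends


if __name__ == "__main__":
    text = "李明在北京工作"                # hypothetical sentence
    person, location = (0, 2), (3, 5)     # 李明, 北京
    good = ["李明", "在", "北京", "工作"]  # correct segmentation
    bad = ["李明在", "北京", "工作"]       # error: name fused with 在
    print(char_bio(text, [person, location]))  # ['B', 'I', 'O', 'B', 'I', 'O', 'O']
    print(representable(good, person), representable(bad, person))  # True False
```

Under the incorrect segmentation, the person name 李明 is fused with the following character 在, so no word-level labeling can recover it exactly, while the character-level tags are unaffected. The sketch illustrates only this boundary mechanism; it does not reproduce the paper's experiments or results.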
Anthology ID:
L14-1310
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2532–2536
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/358_Paper.pdf
Cite (ACL):
Haibo Li, Masato Hagiwara, Qi Li, and Heng Ji. 2014. Comparison of the Impact of Word Segmentation on Name Tagging for Chinese and Japanese. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2532–2536, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Comparison of the Impact of Word Segmentation on Name Tagging for Chinese and Japanese (Li et al., LREC 2014)