Multi-prototype Chinese Character Embedding

Yanan Lu, Yue Zhang, Donghong Ji


Abstract
Chinese sentences are written as sequences of characters, which are elementary units of syntax and semantics. Characters are highly polysemous in forming words. We present a position-sensitive skip-gram model to learn multi-prototype Chinese character embeddings, and explore the usefulness of such character embeddings to Chinese NLP tasks. Evaluation on character similarity shows that multi-prototype embeddings are significantly better than a single-prototype baseline. In addition, used as features in the Chinese NER task, the embeddings result in a 1.74% F-score improvement over a state-of-the-art baseline.
Anthology ID:
L16-1138
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
855–859
URL:
https://aclanthology.org/L16-1138/
Cite (ACL):
Yanan Lu, Yue Zhang, and Donghong Ji. 2016. Multi-prototype Chinese Character Embedding. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 855–859, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Multi-prototype Chinese Character Embedding (Lu et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1138.pdf