Chinese sentences are written as sequences of characters, which are elementary units of syntax and semantics. Characters are highly polysemous when forming words. We present a position-sensitive skip-gram model to learn multi-prototype Chinese character embeddings, and explore the usefulness of such character embeddings for Chinese NLP tasks. Evaluation on character similarity shows that multi-prototype embeddings are significantly better than a single-prototype baseline. In addition, when used as features in the Chinese NER task, the embeddings yield a 1.74% F-score improvement over a state-of-the-art baseline.
Syntactic Dependencies and Distributed Word Representations for Analogy Detection and Mining
Likun Qiu | Yue Zhang | Yanan Lu
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing