Learning Indonesian-Chinese Lexicon with Bilingual Word Embedding Models and Monolingual Signals
Xinying Qiu | Gangqin Zhu
Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)
We present a study on learning an Indonesian-Chinese bilingual lexicon, using monolingual word embeddings and bilingual seed lexicons to build a shared bilingual word embedding space. We make a first attempt to examine how the monolingual signals used to choose seed lexicons affect model performance. We find that, although monolingual signals alone do not seem to outperform signals covering all words, the significant improvement in learning translations for words of the same signal type may suggest that linguistic features merit further study for distinguishing the semantic margins of the shared word embedding space.
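
The general idea of using a seed lexicon to place two monolingual embedding spaces in a shared space can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the abstract does not specify the mapping method, so the sketch assumes a plain least-squares linear map from source to target vectors, followed by cosine nearest-neighbour lookup. The function names (`learn_mapping`, `translate`), the dimensions, and the random toy vectors are all hypothetical and stand in for real Indonesian and Chinese embeddings.

```python
import numpy as np

def learn_mapping(src_seed, tgt_seed):
    """Least-squares W minimizing ||src_seed @ W - tgt_seed|| (assumed method)."""
    W, *_ = np.linalg.lstsq(src_seed, tgt_seed, rcond=None)
    return W

def translate(src_vecs, W, tgt_matrix):
    """Map source vectors into the target space and return, for each one,
    the index of the target word with the highest cosine similarity."""
    mapped = src_vecs @ W
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
    return np.argmax(mapped @ tgt.T, axis=1)

# Toy data (hypothetical): source word i translates to target word i,
# and the target space is an exact linear transform of the source space.
rng = np.random.default_rng(0)
n_words, dim, n_seed = 50, 8, 30
src = rng.normal(size=(n_words, dim))
true_map = rng.normal(size=(dim, dim))
tgt = src @ true_map

# Learn the map from the seed lexicon only, then test on held-out words.
seed_idx = np.arange(n_seed)
held_out = np.arange(n_seed, n_words)
W = learn_mapping(src[seed_idx], tgt[seed_idx])
predicted = translate(src[held_out], W, tgt)
accuracy = float(np.mean(predicted == held_out))
```

In this sketch, changing which rows go into `seed_idx` is the toy analogue of choosing the seed lexicon by a particular monolingual signal; real embeddings are noisy, so held-out accuracy there would depend on that choice in ways this noise-free example cannot show.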