Learning Indonesian-Chinese Lexicon with Bilingual Word Embedding Models and Monolingual Signals

Xinying Qiu, Gangqin Zhu


Abstract
We present a study on learning an Indonesian-Chinese bilingual lexicon, using monolingual word embeddings and bilingual seed lexicons to build a shared bilingual word embedding space. We make a first attempt to examine how the monolingual signals used to choose the seed lexicons affect model performance. We found that although monolingual signals alone do not seem to outperform signals covering all words, the significant improvement in learning word translations of the same signal type may suggest that linguistic features merit further study for distinguishing the semantic margins of the shared word embedding space.
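To make the idea of a shared embedding space concrete, here is a minimal sketch assuming a linear least-squares mapping from the source space to the target space, learned from seed-lexicon pairs. The abstract does not state which mapping the authors use, so this is an illustration of one common approach rather than their method. The toy 2-D vectors, the word lists, and the helper names `learn_mapping` and `translate` are all hypothetical.

```python
import numpy as np

# Toy monolingual embeddings (hypothetical values, not from the paper).
src_emb = {  # Indonesian
    "air":   np.array([1.0, 0.0]),
    "api":   np.array([0.0, 1.0]),
    "buku":  np.array([1.0, 1.0]),
    "rumah": np.array([-1.0, 1.0]),
}
tgt_emb = {  # Chinese
    "水":   np.array([0.0, 1.0]),
    "火":   np.array([-1.0, 0.0]),
    "书":   np.array([-1.0, 1.0]),
    "房子": np.array([-1.0, -1.0]),
}

# Bilingual seed lexicon: the only supervised signal (assumed pairs).
seed = [("air", "水"), ("api", "火")]


def learn_mapping(src_vecs, tgt_vecs):
    """Return W minimising ||src_vecs @ W - tgt_vecs||^2 (least squares)."""
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W


def translate(word, src_emb, tgt_emb, W):
    """Map a source word into the target space and return the
    target word with the highest cosine similarity."""
    mapped = src_emb[word] @ W
    best_word, best_sim = None, -np.inf
    for cand, vec in tgt_emb.items():
        sim = mapped @ vec / (np.linalg.norm(mapped) * np.linalg.norm(vec))
        if sim > best_sim:
            best_word, best_sim = cand, sim
    return best_word


X = np.stack([src_emb[s] for s, _ in seed])  # seed source vectors
Z = np.stack([tgt_emb[t] for _, t in seed])  # seed target vectors
W = learn_mapping(X, Z)

# Words outside the seed lexicon are translated through the learned map.
print(translate("buku", src_emb, tgt_emb, W))
print(translate("rumah", src_emb, tgt_emb, W))
```

In this sketch, changing which pairs go into `seed` (for example, restricting them to one word class) is where a monolingual signal would enter; how the paper selects seeds is not specified in the abstract.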
Anthology ID: W16-3720
Volume: Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)
Month: December
Year: 2016
Address: Osaka, Japan
Editors: Dekai Wu, Pushpak Bhattacharyya
Venue: WSSANLP
Publisher: The COLING 2016 Organizing Committee
Pages: 188–193
URL: https://aclanthology.org/W16-3720
Cite (ACL): Xinying Qiu and Gangqin Zhu. 2016. Learning Indonesian-Chinese Lexicon with Bilingual Word Embedding Models and Monolingual Signals. In Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016), pages 188–193, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal): Learning Indonesian-Chinese Lexicon with Bilingual Word Embedding Models and Monolingual Signals (Qiu & Zhu, WSSANLP 2016)
PDF: https://aclanthology.org/W16-3720.pdf