Learning to Respond to Mixed-code Queries using Bilingual Word Embeddings

Chia-Fang Ho, Jason Chang, Jhih-Jie Chen, Chingyu Yang


Abstract
We present a method for learning bilingual word embeddings in order to support second language (L2) learners in finding recurring phrases and example sentences that match mixed-code queries (e.g., “接受 sentence”) composed of words in both the target language and the native language (L1). In our approach, mixed-code queries are transformed into target language queries aimed at maximizing the probability of retrieving relevant target language phrases and sentences. The method involves converting a given parallel corpus into mixed-code data, generating word embeddings from mixed-code data, and expanding queries in target languages based on bilingual word embeddings. We present a prototype search engine, x.Linggle, that applies the method to a linguistic search engine for a parallel corpus. Preliminary evaluation on a list of common word translations shows that the method performs reasonably well.
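The first step named in the abstract, converting a parallel corpus into mixed-code data, can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the function `to_mixed_code`, the hand-written word alignment, and the choice of which positions to swap are all hypothetical, and the abstract does not specify how the paper performs this step.

```python
from typing import Dict, List, Set, Tuple


def to_mixed_code(
    l2_tokens: List[str],
    l1_tokens: List[str],
    alignment: List[Tuple[int, int]],
    swap_positions: Set[int],
) -> List[str]:
    """Replace the L2 tokens at swap_positions with their aligned L1 tokens.

    alignment holds (l2_index, l1_index) pairs. It is assumed to be given;
    in practice it would come from a word aligner.
    """
    # Collect the L1 tokens aligned to each L2 position.
    l2_to_l1: Dict[int, List[str]] = {}
    for l2_idx, l1_idx in alignment:
        l2_to_l1.setdefault(l2_idx, []).append(l1_tokens[l1_idx])

    mixed: List[str] = []
    for i, token in enumerate(l2_tokens):
        if i in swap_positions and i in l2_to_l1:
            # Swap in the L1 side, keeping the L2 sentence context around it.
            mixed.append("".join(l2_to_l1[i]))
        else:
            mixed.append(token)
    return mixed


# Hypothetical sentence pair with a one-to-one alignment.
english = ["please", "accept", "my", "apology"]
chinese = ["請", "接受", "我的", "道歉"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]

mixed = to_mixed_code(english, chinese, alignment, {1})
print(mixed)
```

Sentences produced this way place an L1 word in the same context as its L2 counterpart, which is the property a word-embedding trainer would need for the two to end up close together in a shared space.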
Anthology ID:
N19-4005
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Waleed Ammar, Annie Louis, Nasrin Mostafazadeh
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
24–28
URL:
https://aclanthology.org/N19-4005
DOI:
10.18653/v1/N19-4005
Cite (ACL):
Chia-Fang Ho, Jason Chang, Jhih-Jie Chen, and Chingyu Yang. 2019. Learning to Respond to Mixed-code Queries using Bilingual Word Embeddings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pages 24–28, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Learning to Respond to Mixed-code Queries using Bilingual Word Embeddings (Ho et al., NAACL 2019)
PDF:
https://aclanthology.org/N19-4005.pdf