Bilingual Word Embeddings for Bilingual Terminology Extraction from Specialized Comparable Corpora

Amir Hazem, Emmanuel Morin


Abstract
Bilingual lexicon extraction from comparable corpora is constrained by the small amount of data available in specialized domains. This scarcity penalizes distributional approaches, whose performance depends closely on the reliability of the word co-occurrence counts extracted from comparable corpora. One way around this limitation is to combine the comparable corpus with external resources. Since bilingual word embeddings have recently proven to be efficient models for learning bilingual distributed representations of words, we explore different word embedding models and show how a general-domain comparable corpus can enrich a specialized comparable corpus via neural networks.
Anthology ID:
I17-1069
Volume:
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
November
Year:
2017
Address:
Taipei, Taiwan
Editors:
Greg Kondrak, Taro Watanabe
Venue:
IJCNLP
Publisher:
Asian Federation of Natural Language Processing
Pages:
685–693
URL:
https://aclanthology.org/I17-1069/
Cite (ACL):
Amir Hazem and Emmanuel Morin. 2017. Bilingual Word Embeddings for Bilingual Terminology Extraction from Specialized Comparable Corpora. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 685–693, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal):
Bilingual Word Embeddings for Bilingual Terminology Extraction from Specialized Comparable Corpora (Hazem & Morin, IJCNLP 2017)
PDF:
https://aclanthology.org/I17-1069.pdf