Using Word Embeddings to Translate Named Entities

Octavia-Maria Şulea, Sergiu Nisioi, Liviu P. Dinu


Abstract
In this paper we investigate the usefulness of neural word embeddings for translating Named Entities (NEs) from a resource-rich language to a language with few resources relevant to the task at hand, and we introduce a novel yet simple way of obtaining bilingual word vectors. Our work is inspired by two sets of results. Mikolov et al. (2013b) observe that training their word vector model on comparable corpora yields comparable vector space representations of those corpora, which reduces the problem of translating words to finding a rotation matrix. Zou et al. (2013) show that bilingual word embeddings can improve Chinese Named Entity Recognition (NER) and English-to-Chinese phrase translation. We use the sentence-aligned English-French EuroParl corpora and show that word embeddings extracted from a merged corpus (the corpus resulting from merging the two aligned corpora) can be used for NE translation. We extrapolate that word embeddings trained on merged parallel corpora are useful in Named Entity Recognition and Translation tasks for resource-poor languages.
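As a rough illustration of what a merged corpus might look like, the sketch below joins each sentence-aligned pair into a single pseudo-sentence. The abstract does not say how the merger is performed, so the concatenation strategy, the lowercasing, the language-tag prefixes, and the helper names `merge_aligned` and `cross_lingual_cooccurrence` are all assumptions made for this example, not the authors' method. The toy sentences are invented and are not EuroParl data.

```python
from collections import Counter


def merge_aligned(src_sentences, tgt_sentences, src_tag="en", tgt_tag="fr"):
    """Merge two sentence-aligned corpora into one list of token lists.

    Assumption: each aligned pair is concatenated into one pseudo-sentence,
    and every token gets a language-tag prefix so the vocabularies stay apart.
    """
    if len(src_sentences) != len(tgt_sentences):
        raise ValueError("corpora must be sentence-aligned (same length)")
    merged = []
    for src, tgt in zip(src_sentences, tgt_sentences):
        src_tokens = [f"{src_tag}:{tok}" for tok in src.lower().split()]
        tgt_tokens = [f"{tgt_tag}:{tok}" for tok in tgt.lower().split()]
        merged.append(src_tokens + tgt_tokens)
    return merged


def cross_lingual_cooccurrence(merged, word, tgt_tag="fr"):
    """Count target-language tokens that share a merged sentence with `word`."""
    counts = Counter()
    prefix = tgt_tag + ":"
    for sentence in merged:
        if word in sentence:
            # Use a set so each token is counted once per merged sentence.
            counts.update(t for t in set(sentence) if t.startswith(prefix))
    return counts


# Toy aligned sentences (invented for illustration).
en = ["London is large", "London is old", "Paris is large"]
fr = ["Londres est grande", "Londres est vieille", "Paris est grande"]

merged = merge_aligned(en, fr)
counts = cross_lingual_cooccurrence(merged, "en:london")
```

In a merged corpus built this way, a named entity and its translation end up in the same pseudo-sentences, so a word vector model trained on the token lists would see them in shared contexts. The co-occurrence counter only makes that overlap visible; it is not a substitute for training embeddings.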
Anthology ID:
L16-1536
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3362–3366
URL:
https://aclanthology.org/L16-1536
Cite (ACL):
Octavia-Maria Şulea, Sergiu Nisioi, and Liviu P. Dinu. 2016. Using Word Embeddings to Translate Named Entities. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3362–3366, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Using Word Embeddings to Translate Named Entities (Şulea et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1536.pdf