Paraphrasing Out-of-Vocabulary Words with Word Embeddings and Semantic Lexicons for Low Resource Statistical Machine Translation

Chenhui Chu, Sadao Kurohashi


Abstract
Out-of-vocabulary (OOV) words are a crucial problem in statistical machine translation (SMT) with low resources. OOV paraphrasing, which augments the translation model for OOV words by using the translation knowledge of their paraphrases, has been proposed to address the OOV problem. In this paper, we propose using word embeddings and semantic lexicons for OOV paraphrasing. Experiments conducted in a low resource setting of the OLYMPICS task of IWSLT 2012 verify the effectiveness of our proposed method.
Anthology ID:
L16-1101
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
644–648
URL:
https://aclanthology.org/L16-1101/
Cite (ACL):
Chenhui Chu and Sadao Kurohashi. 2016. Paraphrasing Out-of-Vocabulary Words with Word Embeddings and Semantic Lexicons for Low Resource Statistical Machine Translation. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 644–648, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Paraphrasing Out-of-Vocabulary Words with Word Embeddings and Semantic Lexicons for Low Resource Statistical Machine Translation (Chu & Kurohashi, LREC 2016)
PDF:
https://aclanthology.org/L16-1101.pdf
Data:
FrameNet