An Ensemble Method for Producing Word Representations focusing on the Greek Language

Michalis Lioudakis, Stamatis Outsios, Michalis Vazirgiannis


Abstract
In this paper we present a new ensemble method, Continuous Bag-of-Skip-grams (CBOS), that produces high-quality word representations, with an emphasis on the Greek language. CBOS combines the two pioneering approaches to learning word representations: Continuous Bag-of-Words (CBOW) and Continuous Skip-gram. CBOS, CBOW and Skip-gram are compared on intrinsic and extrinsic evaluation tasks using three data sources: the English Wikipedia corpus, the Greek Wikipedia corpus, and the Greek Web Content corpus. Across these tasks and datasets, CBOS achieves state-of-the-art performance.
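To make the contrast between the two base methods concrete, here is a minimal Python sketch (an illustration, not the authors' code) of the training examples each one draws from a tokenized sentence. CBOW predicts a centre word from its surrounding context, while Skip-gram predicts each context word from the centre word. The function names, the fixed window size, and the example Greek sentence are assumptions made for illustration. Real word2vec training also samples the effective window size, subsamples frequent words, and uses negative sampling or hierarchical softmax; none of that is shown. The abstract does not describe how CBOS combines the two objectives, so that step is not modelled here.

```python
from typing import List, Tuple

# Hypothetical example sentence (Greek for "the cat sits on the rug"),
# already lower-cased and tokenized on whitespace.
SENTENCE: List[str] = ["η", "γάτα", "κάθεται", "στο", "χαλί"]


def _window(tokens: List[str], i: int, window: int) -> List[str]:
    """Return the tokens within `window` positions of index i, excluding i."""
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    return [tokens[j] for j in range(lo, hi) if j != i]


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: one example per position, (context words -> centre word)."""
    examples = []
    for i, centre in enumerate(tokens):
        context = _window(tokens, i, window)
        if context:  # a one-token sentence has no context to predict from
            examples.append((context, centre))
    return examples


def skipgram_examples(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: one example per (centre word -> single context word) pair."""
    return [
        (centre, context_word)
        for i, centre in enumerate(tokens)
        for context_word in _window(tokens, i, window)
    ]


if __name__ == "__main__":
    # Fixed window of 1 (an assumption; word2vec samples the effective window).
    for context, target in cbow_examples(SENTENCE, window=1):
        print("CBOW     ", context, "->", target)
    for centre, target in skipgram_examples(SENTENCE, window=1):
        print("Skip-gram", centre, "->", target)
```

With a window of 1, the five-token sentence yields five CBOW examples (one per centre word) but eight Skip-gram pairs. This reflects the design difference between the two: Skip-gram issues a separate prediction for every (centre, context) pair, whereas CBOW pools the context into a single prediction per position.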
Anthology ID: 2020.loresmt-1.13
Volume: Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages
Month: December
Year: 2020
Address: Suzhou, China
Editors: Alina Karakanta, Atul Kr. Ojha, Chao-Hong Liu, Jade Abbott, John Ortega, Jonathan Washington, Nathaniel Oco, Surafel Melaku Lakew, Tommi A Pirinen, Valentin Malykh, Varvara Logacheva, Xiaobing Zhao
Venue: LoResMT
Publisher: Association for Computational Linguistics
Pages: 99–107
URL: https://aclanthology.org/2020.loresmt-1.13
Cite (ACL): Michalis Lioudakis, Stamatis Outsios, and Michalis Vazirgiannis. 2020. An Ensemble Method for Producing Word Representations focusing on the Greek Language. In Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages, pages 99–107, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): An Ensemble Method for Producing Word Representations focusing on the Greek Language (Lioudakis et al., LoResMT 2020)
PDF: https://aclanthology.org/2020.loresmt-1.13.pdf
Code: mikeliou/greek_word_embeddings