Corrected CBOW Performs as well as Skip-gram

Ozan İrsoy, Adrian Benton, Karl Stratos


Abstract
Mikolov et al. (2013a) observed that continuous bag-of-words (CBOW) word embeddings tend to underperform Skip-gram (SG) embeddings, and this finding has been reported in subsequent works. We find that these observations are driven not by fundamental differences in their training objectives, but more likely by faulty negative-sampling CBOW implementations in popular libraries such as the official implementation, word2vec.c, and Gensim. We show that after correcting a bug in the CBOW gradient update, one can learn CBOW word embeddings that are fully competitive with SG on various intrinsic and extrinsic tasks, while being many times faster to train.
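The abstract refers to a bug in the CBOW gradient update. A minimal NumPy sketch of a corrected negative-sampling CBOW step is below; it assumes the issue is that the forward pass averages the context vectors while the backward pass omits the matching 1/|context| scaling on the gradient applied to each context vector. The function name `cbow_update` and all parameters here are illustrative, not taken from the paper's code.

```python
import numpy as np

def cbow_update(W_in, W_out, context_ids, target_id, neg_ids, lr=0.025):
    """One CBOW negative-sampling step with the gradient scaling made
    consistent with the averaged forward pass.

    Assumption (illustrative): the bug is applying the summed gradient to
    each context vector even though the hidden vector averaged them, so the
    corrected update divides by the context size.
    """
    C = len(context_ids)
    h = W_in[context_ids].mean(axis=0)              # forward: average contexts
    grad_h = np.zeros_like(h)
    # positive (label 1) and negative (label 0) samples share one loop
    for wid, label in [(target_id, 1.0)] + [(n, 0.0) for n in neg_ids]:
        score = 1.0 / (1.0 + np.exp(-h @ W_out[wid]))  # sigmoid
        g = score - label                           # logistic-loss gradient
        grad_h += g * W_out[wid]
        W_out[wid] -= lr * g * h                    # output-embedding update
    # corrected input update: scale by 1/C because h averaged C vectors
    W_in[context_ids] -= lr * grad_h / C
```

Repeatedly applying this update to one (context, target) pair should raise the model's score for the true target, which gives a quick sanity check that the gradient direction is right.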
Anthology ID:
2021.insights-1.1
Volume:
Proceedings of the Second Workshop on Insights from Negative Results in NLP
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venues:
EMNLP | insights
Publisher:
Association for Computational Linguistics
Pages:
1–8
URL:
https://aclanthology.org/2021.insights-1.1
PDF:
https://aclanthology.org/2021.insights-1.1.pdf
Code
 bloomberg/koan
Data
C4 | GLUE | QNLI