Corrected CBOW Performs as well as Skip-gram

Ozan İrsoy, Adrian Benton, Karl Stratos


Abstract
Mikolov et al. (2013a) observed that continuous bag-of-words (CBOW) word embeddings tend to underperform Skip-gram (SG) embeddings, and this finding has been reported in subsequent works. We find that these observations are driven not by fundamental differences in their training objectives, but more likely by faulty negative-sampling CBOW implementations in popular libraries such as the official implementation, word2vec.c, and Gensim. We show that after correcting a bug in the CBOW gradient update, one can learn CBOW word embeddings that are fully competitive with SG on various intrinsic and extrinsic tasks, while being many times faster to train.
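
For context, the correction concerns how the input-side gradient is distributed over the context vectors: CBOW averages the context embeddings in the forward pass, so the gradient reaching each context embedding should be scaled by 1/|context|. Below is a minimal NumPy sketch of one corrected CBOW negative-sampling step under that assumption; the code and its names are illustrative, not the authors' koan implementation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def corrected_cbow_step(W_in, W_out, ctx_ids, center_id, neg_ids, lr=0.025):
    """One corrected CBOW negative-sampling update (hypothetical sketch).

    W_in, W_out: (vocab_size, dim) input/output embedding matrices.
    ctx_ids: context word indices; center_id: target word index;
    neg_ids: sampled negative word indices; lr: learning rate.
    """
    # Hidden vector is the AVERAGE of the context embeddings.
    h = W_in[ctx_ids].mean(axis=0)

    grad_h = np.zeros_like(h)
    for wid, label in [(center_id, 1.0)] + [(nid, 0.0) for nid in neg_ids]:
        g = lr * (label - sigmoid(h @ W_out[wid]))  # scalar gradient
        grad_h += g * W_out[wid]
        W_out[wid] += g * h

    # The fix: because h averaged len(ctx_ids) vectors, each context
    # embedding should receive grad_h / len(ctx_ids); the affected
    # implementations apply the unscaled grad_h to every context word.
    np.add.at(W_in, ctx_ids, grad_h / len(ctx_ids))

Applying the unscaled grad_h to every context vector effectively inflates the input-side learning rate as the context window grows, which is one way the uncorrected CBOW update can degrade embedding quality.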
Anthology ID:
2021.insights-1.1
Volume:
Proceedings of the Second Workshop on Insights from Negative Results in NLP
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
João Sedoc, Anna Rogers, Anna Rumshisky, Shabnam Tafreshi
Venue:
insights
Publisher:
Association for Computational Linguistics
Pages:
1–8
URL:
https://aclanthology.org/2021.insights-1.1
DOI:
10.18653/v1/2021.insights-1.1
Cite (ACL):
Ozan İrsoy, Adrian Benton, and Karl Stratos. 2021. Corrected CBOW Performs as well as Skip-gram. In Proceedings of the Second Workshop on Insights from Negative Results in NLP, pages 1–8, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Corrected CBOW Performs as well as Skip-gram (İrsoy et al., insights 2021)
PDF:
https://aclanthology.org/2021.insights-1.1.pdf
Video:
https://aclanthology.org/2021.insights-1.1.mp4
Code:
bloomberg/koan
Data:
C4, GLUE, QNLI