%0 Conference Proceedings %T Unsupervised Contextualized Document Representation %A Gupta, Ankur %A Gupta, Vivek %Y Moosavi, Nafise Sadat %Y Gurevych, Iryna %Y Fan, Angela %Y Wolf, Thomas %Y Hou, Yufang %Y Marasović, Ana %Y Ravi, Sujith %S Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing %D 2021 %8 November %I Association for Computational Linguistics %C Virtual %F gupta-gupta-2021-unsupervised %X Several NLP tasks need the effective representation of text documents. Arora et al. (2017) demonstrate that simple weighted averaging of word vectors frequently outperforms neural models. SCDV (Mekala et al., 2017) further extends this from sentences to documents by employing soft and sparse clustering over pre-computed word vectors. However, both techniques ignore the polysemy and contextual character of words. In this paper, we address this issue by proposing SCDV+BERT(ctxd), a simple and effective unsupervised representation that combines contextualized BERT (Devlin et al., 2019) based word embeddings for word sense disambiguation with the SCDV soft clustering approach. We show that our embeddings outperform the original SCDV, pre-trained BERT, and several other baselines on many classification datasets. We also demonstrate our embeddings' effectiveness on other tasks, such as concept matching and sentence similarity. In addition, we show that SCDV+BERT(ctxd) outperforms fine-tuned BERT and different embedding approaches in scenarios with limited data and only few-shot examples. %R 10.18653/v1/2021.sustainlp-1.17 %U https://aclanthology.org/2021.sustainlp-1.17 %U https://doi.org/10.18653/v1/2021.sustainlp-1.17 %P 166-173