Xiang Dai, Sarvnaz Karimi, Ben Hachey, and Cecile Paris. 2020. Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media. In Findings of the Association for Computational Linguistics: EMNLP 2020, edited by Trevor Cohn, Yulan He, and Yang Liu, pages 1675-1681, Online, November 2020. Association for Computational Linguistics. Conference publication. Citation key: dai-etal-2020-cost. DOI: 10.18653/v1/2020.findings-emnlp.151. URL: https://aclanthology.org/2020.findings-emnlp.151/