%0 Conference Proceedings
%T EntityBERT: Entity-centric Masking Strategy for Model Pretraining for the Clinical Domain
%A Lin, Chen
%A Miller, Timothy
%A Dligach, Dmitriy
%A Bethard, Steven
%A Savova, Guergana
%Y Demner-Fushman, Dina
%Y Cohen, Kevin Bretonnel
%Y Ananiadou, Sophia
%Y Tsujii, Junichi
%S Proceedings of the 20th Workshop on Biomedical Language Processing
%D 2021
%8 June
%I Association for Computational Linguistics
%C Online
%F lin-etal-2021-entitybert
%X Transformer-based neural language models have led to breakthroughs for a variety of natural language processing (NLP) tasks. However, most models are pretrained on general domain data. We propose a methodology to produce a model focused on the clinical domain: continued pretraining of a model with a broad representation of biomedical terminology (PubMedBERT) on a clinical corpus along with a novel entity-centric masking strategy to infuse domain knowledge in the learning process. We show that such a model achieves superior results on clinical extraction tasks by comparing our entity-centric masking strategy with classic random masking on three clinical NLP tasks: cross-domain negation detection, document time relation (DocTimeRel) classification, and temporal relation extraction. We also evaluate our models on the PubMedQA dataset to measure the models' performance on a non-entity-centric task in the biomedical domain. The language addressed in this work is English.
%R 10.18653/v1/2021.bionlp-1.21
%U https://aclanthology.org/2021.bionlp-1.21
%U https://doi.org/10.18653/v1/2021.bionlp-1.21
%P 191-201