Boosting Low-Resource Biomedical QA via Entity-Aware Masking Strategies

Gabriele Pergola, Elena Kochkina, Lin Gui, Maria Liakata, Yulan He


Abstract
Biomedical question answering (QA) has gained increased attention for its ability to provide users with high-quality information from the vast scientific literature. Although an increasing number of biomedical QA datasets have recently been made available, these resources remain limited and expensive to produce; transfer learning via pre-trained language models (LMs) has therefore emerged as a promising approach to leverage existing general-purpose knowledge. However, fine-tuning these large models can be costly and time-consuming, and often yields limited benefits when adapting to the specific themes of specialised domains, such as the COVID-19 literature. To further bootstrap their domain adaptation, we propose a simple yet unexplored approach, which we call the biomedical entity-aware masking (BEM) strategy: it encourages masked language models to learn entity-centric knowledge based on the pivotal entities characterizing the domain at hand, and employs those entities to drive the LM fine-tuning. The resulting strategy is a downstream process applicable to a wide variety of masked LMs, requiring no additional memory or components in the neural architecture. Experimental results show performance on par with state-of-the-art models on several biomedical QA datasets.
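The core idea described in the abstract, masking domain entities preferentially rather than sampling mask positions uniformly, can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenization, the entity spans (which in practice would come from a biomedical NER tagger or entity list), and the fallback-to-random behaviour are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"

def entity_aware_mask(tokens, entity_spans, mask_prob=0.15, seed=0):
    """Select MLM mask positions, prioritising entity spans.

    tokens: list of word tokens.
    entity_spans: list of (start, end) index pairs marking domain entities,
        e.g. produced by a biomedical NER tagger (hypothetical here).
    Entity tokens are masked first; any remaining masking budget is
    spent on randomly chosen non-entity tokens, as in standard MLM.
    """
    rng = random.Random(seed)
    budget = max(1, int(len(tokens) * mask_prob))
    # Flatten entity spans into candidate token indices.
    entity_idx = [i for s, e in entity_spans for i in range(s, e)]
    rng.shuffle(entity_idx)
    chosen = set(entity_idx[:budget])
    if len(chosen) < budget:
        # Fall back to uniform masking for the leftover budget.
        rest = [i for i in range(len(tokens)) if i not in chosen]
        rng.shuffle(rest)
        chosen.update(rest[: budget - len(chosen)])
    return [MASK if i in chosen else t for i, t in enumerate(tokens)]

tokens = "patients with severe covid-19 received remdesivir treatment".split()
spans = [(3, 4), (5, 6)]  # covid-19, remdesivir (hypothetical NER output)
print(entity_aware_mask(tokens, spans))
```

With a 15% budget on this short sentence, only entity positions are masked, so the LM is forced to predict the domain terms from context, which is the intuition behind entity-centric fine-tuning.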
Anthology ID:
2021.eacl-main.169
Volume:
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Month:
April
Year:
2021
Address:
Online
Editors:
Paola Merlo, Jörg Tiedemann, Reut Tsarfaty
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
1977–1985
URL:
https://aclanthology.org/2021.eacl-main.169
DOI:
10.18653/v1/2021.eacl-main.169
Cite (ACL):
Gabriele Pergola, Elena Kochkina, Lin Gui, Maria Liakata, and Yulan He. 2021. Boosting Low-Resource Biomedical QA via Entity-Aware Masking Strategies. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1977–1985, Online. Association for Computational Linguistics.
Cite (Informal):
Boosting Low-Resource Biomedical QA via Entity-Aware Masking Strategies (Pergola et al., EACL 2021)
PDF:
https://aclanthology.org/2021.eacl-main.169.pdf
Data:
CovidQA, MS MARCO, SQuAD, Semantic Scholar