A Domain Knowledge Enhanced Pre-Trained Language Model for Vertical Search: Case Study on Medicinal Products

Kesong Liu, Jianhui Jiang, Feifei Lyu


Abstract
We present a biomedical knowledge enhanced pre-trained language model for medicinal product vertical search. Following ELECTRA’s replaced token detection (RTD) pre-training, we leverage biomedical entity masking (EM) strategy to learn better contextual word representations. Furthermore, we propose a novel pre-training task, product attribute prediction (PAP), to inject product knowledge into the pre-trained language model efficiently by leveraging medicinal product databases directly. By sharing the parameters of PAP’s transformer encoder with that of RTD’s main transformer, these two pre-training tasks are jointly learned. Experiments demonstrate the effectiveness of PAP task for pre-trained language model on medicinal product vertical search scenario, which includes query-title relevance, query intent classification, and named entity recognition in query.
Anthology ID:
2022.coling-1.85
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
SIG:
Publisher:
International Committee on Computational Linguistics
Note:
Pages:
1014–1023
Language:
URL:
https://aclanthology.org/2022.coling-1.85
DOI:
Bibkey:
Cite (ACL):
Kesong Liu, Jianhui Jiang, and Feifei Lyu. 2022. A Domain Knowledge Enhanced Pre-Trained Language Model for Vertical Search: Case Study on Medicinal Products. In Proceedings of the 29th International Conference on Computational Linguistics, pages 1014–1023, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
A Domain Knowledge Enhanced Pre-Trained Language Model for Vertical Search: Case Study on Medicinal Products (Liu et al., COLING 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.coling-1.85.pdf
Code
 liuks/ep_plm