KU_ai at MEDIQA 2019: Domain-specific Pre-training and Transfer Learning for Medical NLI

Cemil Cengiz, Ulaş Sert, Deniz Yuret


Abstract
In this paper, we describe our system and results submitted for the Natural Language Inference (NLI) track of the MEDIQA 2019 Shared Task. As the KU_ai team, we used BERT as our baseline model and pre-processed the MedNLI dataset to mitigate the negative impact of de-identification artifacts. Moreover, we investigated different pre-training and transfer learning approaches to improve performance. We show that pre-training the language model on rich biomedical corpora has a significant effect in teaching the model domain-specific language. In addition, training the model on large NLI datasets such as MultiNLI and SNLI helps in learning task-specific reasoning. Finally, we ensembled our highest-performing models, achieving 84.7% accuracy on the unseen test dataset and ranking 10th out of 17 teams in the official results.
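The abstract does not say how the models were ensembled. As a minimal sketch of one common option, assuming each model outputs per-class probabilities, the snippet below averages those probabilities and picks the most likely label. The function names `ensemble_predict` and `accuracy`, the label order, and the input layout are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import List, Sequence

# The three NLI classes used by MedNLI; this ordering is an assumption.
LABELS = ("entailment", "neutral", "contradiction")


def ensemble_predict(
    model_probs: Sequence[Sequence[Sequence[float]]],
) -> List[str]:
    """Average class probabilities across models and return the argmax label.

    model_probs[m][i][c] is the probability that model m assigns to
    class c for example i (hypothetical input layout).
    """
    n_models = len(model_probs)
    n_examples = len(model_probs[0])
    predictions = []
    for i in range(n_examples):
        # Mean probability of each class over all models for example i.
        avg = [
            sum(model_probs[m][i][c] for m in range(n_models)) / n_models
            for c in range(len(LABELS))
        ]
        predictions.append(LABELS[avg.index(max(avg))])
    return predictions


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)
```

Probability averaging is only one possibility; majority voting over predicted labels or weighting models by development-set accuracy would fit the same interface.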
Anthology ID:
W19-5045
Volume:
Proceedings of the 18th BioNLP Workshop and Shared Task
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
427–436
URL:
https://aclanthology.org/W19-5045
DOI:
10.18653/v1/W19-5045
Cite (ACL):
Cemil Cengiz, Ulaş Sert, and Deniz Yuret. 2019. KU_ai at MEDIQA 2019: Domain-specific Pre-training and Transfer Learning for Medical NLI. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 427–436, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
KU_ai at MEDIQA 2019: Domain-specific Pre-training and Transfer Learning for Medical NLI (Cengiz et al., BioNLP 2019)
PDF:
https://aclanthology.org/W19-5045.pdf
Data:
MultiNLI, SNLI