Multilevel Analysis of Biomedical Domain Adaptation of Llama 2: What Matters the Most? A Case Study

Vicente Ivan Sanchez Carmona; Shanshan Jiang; Takeshi Suzuki; Bin Dong

doi:10.18653/v1/2024.bionlp-1.36

Abstract

Domain adaptation of Large Language Models (LLMs) yields models better suited to a particular domain by capturing patterns from domain text, which leads to improvements in downstream tasks. These improvements are visible to the naked eye; the patterns behind them are not. How can we know which patterns matter and how much they contribute to changes in downstream scores? Through a Multilevel Analysis, we discover and quantify the effect of text patterns on the downstream scores of domain-adapted Llama 2 for the task of sentence similarity (BIOSSES dataset). We show that text patterns from PubMed abstracts, such as clear writing and simplicity, as well as the amount of biomedical information, are key to improving downstream scores. We also show that another factor, one not usually quantified, contributes equally to downstream scores: the choice of hyperparameters for both domain adaptation and fine-tuning.

Anthology ID: 2024.bionlp-1.36
Volume: Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Kirk Roberts, Junichi Tsujii
Venues: BioNLP | WS
SIG: SIGBIOMED
Publisher: Association for Computational Linguistics
Pages: 449–456
URL: https://aclanthology.org/2024.bionlp-1.36
DOI: 10.18653/v1/2024.bionlp-1.36
Cite (ACL): Vicente Ivan Sanchez Carmona, Shanshan Jiang, Takeshi Suzuki, and Bin Dong. 2024. Multilevel Analysis of Biomedical Domain Adaptation of Llama 2: What Matters the Most? A Case Study. In Proceedings of the 23rd Workshop on Biomedical Natural Language Processing, pages 449–456, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Multilevel Analysis of Biomedical Domain Adaptation of Llama 2: What Matters the Most? A Case Study (Sanchez Carmona et al., BioNLP-WS 2024)
PDF: https://aclanthology.org/2024.bionlp-1.36.pdf
Optional supplementary material: 2024.bionlp-1.36.OptionalSupplementaryMaterial.zip
