Trade-offs in Medical LLM Adaptation: An Empirical Study in French QA

Ikram Belmadani; Oumaima El Khettari; Carlos Ramisch; Frederic Bechet; Richard Dufour; Benoit Favre

Abstract

The development of large language models (LLMs) has led to increased focus on their adaptation to specialized domains and languages, yet the effectiveness of domain adaptation strategies remains unclear. We present a study of medical domain adaptation using French medical question answering (QA) as a case study. We compare continual pretraining (CPT), supervised fine-tuning (SFT), and their combination across three model families, multiple sizes, and three initialization types, explicitly disentangling adaptation effects from base model choice. We evaluate both multiple-choice (MCQA) and open-ended QA (OEQA) under greedy and constrained decoding using automatic metrics and LLM-as-a-Judge evaluation. For MCQA, CPT+SFT most often achieves the best scores, but gains over SFT are small and frequently not statistically significant, making SFT a strong and cost-effective default. For OEQA, CPT consistently improves overlap-based metrics, while SFT often degrades generation quality; instruction tuning and CPT+SFT are preferred by LLM-based evaluation. Cross-lingual experiments further show effective transfer from French adaptation to English benchmarks. Overall, we provide practical guidelines for selecting adaptation strategies under computational constraints.

Anthology ID: 2026.bionlp-1.19
Volume: BioNLP 2026
Month: July
Year: 2026
Address: San Diego, California
Editors: Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
Venues: BioNLP | WS
Publisher: Association for Computational Linguistics
Pages: 209–234
URL: https://aclanthology.org/2026.bionlp-1.19/
Cite (ACL): Ikram Belmadani, Oumaima El Khettari, Carlos Ramisch, Frederic Bechet, Richard Dufour, and Benoit Favre. 2026. Trade-offs in Medical LLM Adaptation: An Empirical Study in French QA. In BioNLP 2026, pages 209–234, San Diego, California. Association for Computational Linguistics.
Cite (Informal): Trade-offs in Medical LLM Adaptation: An Empirical Study in French QA (Belmadani et al., BioNLP 2026)
PDF: https://aclanthology.org/2026.bionlp-1.19.pdf