An evaluation of current benchmarking strategies for French biomedical language models

Felix Herron

Abstract

We describe the current state of benchmarking for French-language biomedical natural language processing (NLP). We identify two important criteria for biomedical benchmarks. First, a biomedical benchmark should clearly simulate a specific use case, so that it offers a useful evaluation of a biomedical model's real-life applicability. Second, a biomedical benchmark should be created in collaboration with biomedical professionals. We note that many biomedical benchmarks, particularly French ones, do not adhere to these criteria; however, we highlight other biomedical benchmarks that adhere to them more closely. Furthermore, we evaluate some of the most common French biomedical benchmarks on an array of models and empirically support the necessity of domain-specific and language-specific pre-training for natural language understanding (NLU) tasks. We show that some popular French biomedical language models perform poorly and/or inconsistently on important biomedical tasks. Finally, we advocate for an increase in publicly available, clinically targeted French biomedical NLU benchmarks.

Anthology ID:: 2024.jeptalnrecital-recital.1
Volume:: Actes de la 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues
Month:: 7
Year:: 2024
Address:: Toulouse, France
Editors:: Mathieu Balaguer, Nihed Bendahman, Lydia-Mai Ho-dac, Julie Mauclair, Jose G Moreno, Julien Pinquier
Venue:: JEP/TALN/RECITAL
Publisher:: ATALA and AFPC
Pages:: 1–16
URL:: https://aclanthology.org/2024.jeptalnrecital-recital.1/
Cite (ACL):: Felix Herron. 2024. An evaluation of current benchmarking strategies for French biomedical language models. In Actes de la 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, pages 1–16, Toulouse, France. ATALA and AFPC.
Cite (Informal):: An evaluation of current benchmarking strategies for French biomedical language models (Herron, JEP/TALN/RECITAL 2024)
PDF:: https://aclanthology.org/2024.jeptalnrecital-recital.1.pdf