Pre-trained language models in Spanish for health insurance coverage

Claudio Aracena; Nicolás Rodríguez; Victor Rocco; Jocelyn Dunstan

doi:10.18653/v1/2023.clinicalnlp-1.46

Abstract

The field of clinical natural language processing (NLP) can extract useful information from clinical text. Since 2017, the NLP field has shifted towards using pre-trained language models (PLMs), improving performance in several tasks. Most of the research in this field has focused on English text, but some PLMs are available in Spanish. In this work, we use clinical PLMs to analyze Spanish text from admission and medical reports for an insurance and health provider, in order to estimate the probability of no coverage in a labor insurance process. Our results show that fine-tuning a PLM pre-trained with the provider’s data leads to better results, but this process is time-consuming and computationally expensive. At least for this task, fine-tuning a publicly available clinical PLM leads to results comparable to those of a custom PLM, but in less time and with fewer resources. Analyzing large volumes of insurance requests is burdensome for employers, and models can ease this task by pre-classifying reports that are likely not to have coverage. Our approach of relying entirely on clinical text improves on the current models while reinforcing the idea of clinical support systems that simplify human labor but do not replace it. To our knowledge, the clinical corpus collected for this study is the largest one reported for the Spanish language.

Anthology ID: 2023.clinicalnlp-1.46
Volume: Proceedings of the 5th Clinical Natural Language Processing Workshop
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Tristan Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, Anna Rumshisky
Venue: ClinicalNLP
Publisher: Association for Computational Linguistics
Pages: 433–438
URL: https://aclanthology.org/2023.clinicalnlp-1.46
DOI: 10.18653/v1/2023.clinicalnlp-1.46
Cite (ACL): Claudio Aracena, Nicolás Rodríguez, Victor Rocco, and Jocelyn Dunstan. 2023. Pre-trained language models in Spanish for health insurance coverage. In Proceedings of the 5th Clinical Natural Language Processing Workshop, pages 433–438, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Pre-trained language models in Spanish for health insurance coverage (Aracena et al., ClinicalNLP 2023)
PDF: https://aclanthology.org/2023.clinicalnlp-1.46.pdf
