Jonathan Heras


2024

Replace, Paraphrase or Fine-tune? Evaluating Automatic Simplification for Medical Texts in Spanish
Leonardo Campillos-Llanos | Ana Rosa Terroba | Rocío Bartolomé | Ana Valverde-Mateos | Cristina González | Adrián Capllonch-Carrión | Jonathan Heras
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Patients cannot always fully understand medical documents given the myriad of technical terms they contain. Automatic text simplification techniques can help, but they must guarantee that the content is conveyed accurately without introducing incorrect information. In this work, we tested: 1) lexicon-based simplification approaches, using a Spanish lexicon of technical and lay terms collected for this task (SimpMedLexSp); 2) deep-learning (DL) based methods, with BART-based and prompt-learning-based models; and 3) a combination of both techniques. As a test set, we used 5000 parallel (technical and lay) sentence pairs: 3800 manually aligned sentences from the CLARA-MeD corpus and 1200 sentences from clinical trials simplified by linguists. We conducted a quantitative evaluation with standard measures (BLEU, ROUGE and SARI) and a human evaluation in which eleven subjects scored the simplification output of several methods. In our experiments, the lexicon improved the quantitative results when combined with the DL models. The sentences simplified using only the lexicon received the highest scores for semantic adequacy, although their fluency needs to be improved. The prompt-based method obtained similar ratings for this aspect and for simplification. We make the models and the data available to reproduce our results.
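
The sketch below is not the authors' released code; it only illustrates, under stated assumptions, the two ingredients the abstract describes: a lexicon-based substitution step and automatic scoring with SARI. The term pairs are hypothetical placeholders standing in for SimpMedLexSp entries, and the scoring relies on the Hugging Face evaluate library's "sari" metric.

```python
# Minimal sketch of lexicon-based simplification plus SARI scoring.
# The lexicon entries below are hypothetical examples, not SimpMedLexSp data.
import evaluate

# Hypothetical technical-to-lay term pairs (Spanish).
lexicon = {"cefalea": "dolor de cabeza", "prurito": "picor"}

def lexicon_simplify(sentence: str) -> str:
    """Replace technical terms with their lay equivalents."""
    for technical, lay in lexicon.items():
        sentence = sentence.replace(technical, lay)
    return sentence

sources = ["El paciente refiere cefalea y prurito."]
references = [["El paciente dice que tiene dolor de cabeza y picor."]]
predictions = [lexicon_simplify(s) for s in sources]

# SARI compares system output against both the source and the references,
# rewarding correct additions, deletions and kept tokens.
sari = evaluate.load("sari")
print(sari.compute(sources=sources, predictions=predictions, references=references))
```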