Robert Bevan
2023
MDC at BioLaySumm Task 1: Evaluating GPT Models for Biomedical Lay Summarization
Oisín Turbitt
|
Robert Bevan
|
Mouhamad Aboshokor
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks
This paper presents our approach to the BioLaySumm Task 1 shared task, held at the BioNLP 2023 Workshop. The effective communication of scientific knowledge to the general public is often limited by the technical language used in research, making it difficult for non-experts to comprehend. To address this issue, lay summaries can be used to explain research findings to non-experts in an accessible form. We conduct an evaluation of autoregressive language models, both general and specialized for the biomedical domain, to generate lay summaries from biomedical research article abstracts. Our findings demonstrate that a GPT-3.5 model combined with a straightforward few-shot prompt produces lay summaries that achieve significantly relevance and factuality compared to those generated by a fine-tuned BioGPT model. However, the summaries generated by the BioGPT model exhibit better readability. Notably, our submission for the shared task achieved 1st place in the competition.
MDC at SemEval-2023 Task 7: Fine-tuning Transformers for Textual Entailment Prediction and Evidence Retrieval in Clinical Trials
Robert Bevan
|
Oisín Turbitt
|
Mouhamad Aboshokor
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
We present our entry to the Multi-evidence Natural Language Inference for Clinical Trial Datatask at SemEval 2023. We submitted entries forboth the evidence retrieval and textual entailment sub-tasks. For the evidence retrieval task,we fine-tuned the PubMedBERT transformermodel to extract relevant evidence from clinicaltrial data given a hypothesis concerning either asingle clinical trial or pair of clinical trials. Ourbest performing model achieved an F1 scoreof 0.804. For the textual entailment task, inwhich systems had to predict whether a hypothesis about either a single clinical trial or pair ofclinical trials is true or false, we fine-tuned theBioLinkBERT transformer model. We passedour evidence retrieval model’s output into ourtextual entailment model and submitted its output for the evaluation. Our best performingmodel achieved an F1 score of 0.695.