2024
pdf
bib
abs
Intervention extraction in preclinical animal studies of Alzheimer’s Disease: Enhancing regex performance with language model-based filtering
Yiyuan Pu
|
Kaitlyn Hair
|
Daniel Beck
|
Mike Conway
|
Malcolm MacLeod
|
Karin Verspoor
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
We explore different information extraction tools for annotation of interventions to support automated systematic reviews of preclinical AD animal studies. We compare two PICO (Population, Intervention, Comparison, and Outcome) extraction tools and two prompting-based learning strategies based on Large Language Models (LLMs). Motivated by the high recall of a dictionary-based approach, we define a two-stage method, removing false positives obtained from regexes with a pre-trained LM. With ChatGPT-based filtering using three-shot prompting, our approach reduces almost two-thirds of False Positives compared to the dictionary approach alone, while outperforming knowledge-free instructional prompting.
pdf
bib
abs
Optimizing Multimodal Large Language Models for Detection of Alcohol Advertisements via Adaptive Prompting
Daniel Cabrera Lozoya
|
Jiahe Liu
|
Simon D’Alfonso
|
Mike Conway
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
Adolescents exposed to advertisements promoting addictive substances exhibit a higher likelihood of subsequent substance use. The predominant source for youth exposure to such advertisements is through online content accessed via smartphones. Detecting these advertisements is crucial for establishing and maintaining a safer online environment for young people. In our study, we utilized Multimodal Large Language Models (MLLMs) to identify addictive substance advertisements in digital media. The performance of MLLMs depends on the quality of the prompt used to instruct the model. To optimize our prompts, an adaptive prompt engineering approach was implemented, leveraging a genetic algorithm to refine and enhance the prompts. To evaluate the model’s performance, we augmented the RICO dataset, consisting of Android user interface screenshots, by superimposing alcohol ads onto them. Our results indicate that the MLLM can detect advertisements promoting alcohol with a 0.94 accuracy and a 0.94 F1 score.
pdf
bib
abs
MLBMIKABR at “Discharge Me!”: Concept Based Clinical Text Description Generation
Abir Naskar
|
Jane Hocking
|
Patty Chondros
|
Douglas Boyle
|
Mike Conway
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
This paper presents a method called Concept Based Description Generation, aimed at creating summaries (Brief Hospital Course and Discharge Instructions) using source (Discharge and Radiology) texts. We propose a rule-based approach for segmenting both the source and target texts. In the target text, we not only segment the content but also identify the concept of each segment based on text patterns. Our methodology involves creating a combined summarized version of each text segment, extracting important information, and then fine-tuning a Large Language Model (LLM) to generate aspects. Subsequently, we fine-tune a new LLM using a specific aspect, the combined summary, and a list of all aspects to generate detailed descriptions for each task. This approach integrates segmentation, concept identification, summarization, and language modeling to achieve accurate and informative descriptions for medical documentation tasks. Due to lack to time, We could only train on 10000 training data.
pdf
bib
abs
Generating Mental Health Transcripts with SAPE (Spanish Adaptive Prompt Engineering)
Daniel Lozoya
|
Alejandro Berazaluce
|
Juan Perches
|
Eloy Lúa
|
Mike Conway
|
Simon D’Alfonso
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Large language models have become valuable tools for data augmentation in scenarios with limited data availability, as they can generate synthetic data resembling real-world data. However, their generative performance depends on the quality of the prompt used to instruct the model. Prompt engineering that relies on hand-crafted strategies or requires domain experts to adjust the prompt often yields suboptimal results. In this paper we present SAPE, a Spanish Adaptive Prompt Engineering method utilizing genetic algorithms for prompt generation and selection. Our evaluation of SAPE focuses on a generative task that involves the creation of Spanish therapy transcripts, a type of data that is challenging to collect due to the fact that it typically includes protected health information. Through human evaluations conducted by mental health professionals, our results show that SAPE produces Spanish counselling transcripts that more closely resemble authentic therapy transcripts compared to other prompt engineering techniques that are based on Reflexion and Chain-of-Thought.
2023
pdf
bib
abs
Natural Language Processing for Clinical Text
Vlada Rozova
|
Jinghui Liu
|
Mike Conway
Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association
Learning from real-world clinical data has potential to promote the quality of care, improve the efficiency of healthcare systems, and support clinical research. As a large proportion of clinical information is recorded only in unstructured free-text format, applying NLP to process and understand the vast amount of clinical text generated in clinical encounters is essential. However, clinical text is known to be highly ambiguous, it contains complex professional terms requiring clinical expertise to understand and annotate, and it is written in different clinical contexts with distinct purposes. All these factors together make clinical NLP research both rewarding and challenging. In this tutorial, we will discuss the characteristics of clinical text and provide an overview of some of the tools and methods used to process it. We will also present a real-world example to show the effectiveness of different NLP methods in processing and understanding clinical text. Finally, we will discuss the strengths and limitations of large language models and their applications, evaluations, and extensions in clinical NLP.
2017
pdf
bib
abs
Investigating the Documentation of Electronic Cigarette Use in the Veteran Affairs Electronic Health Record: A Pilot Study
Danielle Mowery
|
Brett South
|
Olga Patterson
|
Shu-Hong Zhu
|
Mike Conway
BioNLP 2017
In this paper, we present pilot work on characterising the documentation of electronic cigarettes (e-cigarettes) in the United States Veterans Administration Electronic Health Record. The Veterans Health Administration is the largest health care system in the United States with 1,233 health care facilities nationwide, serving 8.9 million veterans per year. We identified a random sample of 2000 Veterans Administration patients, coded as current tobacco users, from 2008 to 2014. Using simple keyword matching techniques combined with qualitative analysis, we investigated the prevalence and distribution of e-cigarette terms in these clinical notes, discovering that for current smokers, 11.9% of patient records contain an e-cigarette related term.
pdf
bib
abs
A Corpus Analysis of Social Connections and Social Isolation in Adolescents Suffering from Depressive Disorders
Jia-Wen Guo
|
Danielle L Mowery
|
Djin Lai
|
Katherine Sward
|
Mike Conway
Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology — From Linguistic Signal to Clinical Reality
Social connection and social isolation are associated with depressive symptoms, particularly in adolescents and young adults, but how these concepts are documented in clinical notes is unknown. This pilot study aimed to identify the topics relevant to social connection and isolation by analyzing 145 clinical notes from patients with depression diagnosis. We found that providers, including physicians, nurses, social workers, and psychologists, document descriptions of both social connection and social isolation.
pdf
bib
abs
Investigating Patient Attitudes Towards the use of Social Media Data to Augment Depression Diagnosis and Treatment: a Qualitative Study
Jude Mikal
|
Samantha Hurst
|
Mike Conway
Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology — From Linguistic Signal to Clinical Reality
In this paper, we use qualitative research methods to investigate the attitudes of social media users towards the (opt-in) integration of social media data with routine mental health care and diagnosis. Our investigation was based on secondary analysis of a series of five focus groups with Twitter users, including three groups consisting of participants with a self-reported history of depression, and two groups consisting of participants without a self reported history of depression. Our results indicate that, overall, research participants were enthusiastic about the possibility of using social media (in conjunction with automated Natural Language Processing algorithms) for mood tracking under the supervision of a mental health practitioner. However, for at least some participants, there was skepticism related to how well social media represents the mental health of users, and hence its usefulness in the clinical context.
2016
pdf
bib
Vocabulary Development To Support Information Extraction of Substance Abuse from Psychiatry Notes
Sumithra Velupillai
|
Danielle L. Mowery
|
Mike Conway
|
John Hurdle
|
Brent Kious
Proceedings of the 15th Workshop on Biomedical Natural Language Processing
pdf
bib
abs
Towards Automatically Classifying Depressive Symptoms from Twitter Data for Population Health
Danielle L. Mowery
|
Albert Park
|
Craig Bryan
|
Mike Conway
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)
Major depressive disorder, a debilitating and burdensome disease experienced by individuals worldwide, can be defined by several depressive symptoms (e.g., anhedonia (inability to feel pleasure), depressed mood, difficulty concentrating, etc.). Individuals often discuss their experiences with depression symptoms on public social media platforms like Twitter, providing a potentially useful data source for monitoring population-level mental health risk factors. In a step towards developing an automated method to estimate the prevalence of symptoms associated with major depressive disorder over time in the United States using Twitter, we developed classifiers for discerning whether a Twitter tweet represents no evidence of depression or evidence of depression. If there was evidence of depression, we then classified whether the tweet contained a depressive symptom and if so, which of three subtypes: depressed mood, disturbed sleep, or fatigue or loss of energy. We observed that the most accurate classifiers could predict classes with high-to-moderate F1-score performances for no evidence of depression (85), evidence of depression (52), and depressive symptoms (49). We report moderate F1-scores for depressive symptoms ranging from 75 (fatigue or loss of energy) to 43 (disturbed sleep) to 35 (depressed mood). Our work demonstrates baseline approaches for automatically encoding Twitter data with granular depressive symptoms associated with major depressive disorder.
2015
pdf
bib
Towards Developing an Annotation Scheme for Depressive Disorder Symptoms: A Preliminary Study using Twitter Data
Danielle Mowery
|
Craig Bryan
|
Mike Conway
Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality
2013
pdf
bib
Corpus-Driven Terminology Development: Populating Swedish SNOMED CT with Synonyms Extracted from Electronic Health Records
Aron Henriksson
|
Maria Skeppstedt
|
Maria Kvist
|
Martin Duneld
|
Mike Conway
Proceedings of the 2013 Workshop on Biomedical Natural Language Processing
2010
pdf
bib
An ontology-driven system for detecting global health events
Nigel Collier
|
Reiko Matsuda Goodwin
|
John McCrae
|
Son Doan
|
Ai Kawazoe
|
Mike Conway
|
Asanee Kawtrakul
|
Koichi Takeuchi
|
Dinh Dien
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
pdf
bib
Developing a Biosurveillance Application Ontology for Influenza-Like-Illness
Mike Conway
|
John Dowling
|
Wendy Chapman
Proceedings of the 6th Workshop on Ontologies and Lexical Resources
2009
pdf
bib
Using Hedges to Enhance a Disease Outbreak Report Text Mining System
Mike Conway
|
Son Doan
|
Nigel Collier
Proceedings of the BioNLP 2009 Workshop