Angus Roberts


2024

Generation and Evaluation of Synthetic Endoscopy Free-Text Reports with Differential Privacy
Agathe Zecevic | Xinyue Zhang | Sebastian Zeki | Angus Roberts
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

The development of NLP models in the healthcare sector faces important challenges due to the limited availability of patient data, mainly driven by privacy concerns. This study proposes the generation of synthetic free-text medical reports, specifically focusing on the gastroenterology domain, to address the scarcity of specialised datasets, while preserving patient privacy. We fine-tune BioGPT on over 90 000 endoscopy reports and integrate Differential Privacy (DP) into the training process. This model is then used to generate 10 000 DP-private synthetic reports. The generated synthetic data is evaluated along multiple dimensions: similarity to real datasets, language quality, and utility in both supervised and semi-supervised NLP tasks. Results suggest that while DP integration impacts text quality, it offers a promising balance between data utility and privacy, improving the performance of a real-world downstream task. Our study underscores the potential of synthetic data to facilitate model development in the healthcare domain without compromising patient privacy.

2020

Using Deep Neural Networks with Intra- and Inter-Sentence Context to Classify Suicidal Behaviour
Xingyi Song | Johnny Downs | Sumithra Velupillai | Rachel Holden | Maxim Kikoler | Kalina Bontcheva | Rina Dutta | Angus Roberts
Proceedings of the Twelfth Language Resources and Evaluation Conference

Identifying statements related to suicidal behaviour in psychiatric electronic health records (EHRs) is an important step when modelling that behaviour, and when assessing suicide risk. We apply a deep neural network-based classification model with a lightweight context encoder to classify sentence-level suicidal behaviour in EHRs. We show that incorporating information from sentences to the left and right of the target sentence significantly improves classification accuracy. Our approach achieved the best performance when classifying suicidal behaviour in Autism Spectrum Disorder patient records. The results could have implications for suicidality research and clinical surveillance.

Development of a Corpus Annotated with Medications and their Attributes in Psychiatric Health Records
Jaya Chaturvedi | Natalia Viani | Jyoti Sanyal | Chloe Tytherleigh | Idil Hasan | Kate Baird | Sumithra Velupillai | Robert Stewart | Angus Roberts
Proceedings of the Twelfth Language Resources and Evaluation Conference

Free-text fields within electronic health records (EHRs) contain valuable clinical information which is often missed when conducting research using EHR databases. One such type of information is medications, which are not always available in structured fields, especially in mental health records. Most use cases that require medication information also generally require the associated temporal information (e.g. current or past) and attributes (e.g. dose, route, frequency). The purpose of this study is to develop a corpus of medication annotations in mental health records. The aim is to provide a more complete picture of each medication mention in the health records, by including additional contextual information around it, and to create a resource for use when developing and evaluating applications for the extraction of medications from EHR text. Thus far, an analysis of temporal information related to medications mentioned in a sample of mental health records has been conducted. The purpose of this analysis was to understand the complexity of medication mentions and their associated temporal information in the free text of EHRs, with a specific focus on the mental health domain.

Comparative Analysis of Text Classification Approaches in Electronic Health Records
Aurelie Mascio | Zeljko Kraljevic | Daniel Bean | Richard Dobson | Robert Stewart | Rebecca Bendayan | Angus Roberts
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing

Text classification tasks which aim at harvesting and/or organizing information from electronic health records are pivotal to support clinical and translational research. However, these tasks present specific challenges compared with other classification tasks, notably due to the particular nature of the medical lexicon and language used in clinical records. Recent advances in embedding methods have shown promising results for several clinical tasks, yet there is no exhaustive comparison of such approaches with other commonly used word representations and classification models. In this work, we analyse the impact of various word representations, text pre-processing and classification algorithms on the performance of four different text classification tasks. The results show that traditional approaches, when tailored to the specific language and structure of the text inherent to the classification task, can achieve or exceed the performance of more recent ones based on contextual embeddings such as BERT.

2018

A Deep Neural Network Sentence Level Classification Method with Context Information
Xingyi Song | Johann Petrak | Angus Roberts
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In the sentence classification task, context formed from sentences adjacent to the sentence being classified can provide important information for classification. This context is, however, often ignored. Where methods do make use of context, only small amounts are considered, making them difficult to scale. We present a new method for sentence classification, Context-LSTM-CNN, that makes use of potentially large contexts. The method also utilizes long-range dependencies within the sentence being classified, using an LSTM, and short-span features, using a stacked CNN. Our experiments demonstrate that this approach consistently improves over previous methods on two different datasets.

2016

Identifying First Episodes of Psychosis in Psychiatric Patient Records using Machine Learning
Genevieve Gorrell | Sherifat Oduola | Angus Roberts | Tom Craig | Craig Morgan | Rob Stewart
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

2015

UFPRSheffield: Contrasting Rule-based and Support Vector Machine Approaches to Time Expression Identification in Clinical TempEval
Hegler Tissot | Genevieve Gorrell | Angus Roberts | Leon Derczynski | Marcos Didonet Del Fabro
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

Analysis of Temporal Expressions Annotated in Clinical Notes
Hegler Tissot | Angus Roberts | Leon Derczynski | Genevieve Gorrell | Marcos Didonet Del Fabro
Proceedings of the 11th Joint ACL-ISO Workshop on Interoperable Semantic Annotation (ISA-11)

2013

Finding Negative Symptoms of Schizophrenia in Patient Records
Genevieve Gorrell | Angus Roberts | Richard Jackson | Robert Stewart
Proceedings of the Workshop on NLP for Medicine and Biology associated with RANLP 2013

2009

Tunable Domain-Independent Event Extraction in the MIRA Framework
Georgi Georgiev | Kuzman Ganchev | Vassil Momchev | Deyan Peychev | Preslav Nakov | Angus Roberts
Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task

2008

Extracting Clinical Relationships from Patient Narratives
Angus Roberts | Robert Gaizauskas | Mark Hepple
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

Combining Terminology Resources and Statistical Methods for Entity Recognition: an Evaluation
Angus Roberts | Robert Gaizauskas | Mark Hepple | Yikun Guo
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Terminologies and other knowledge resources are widely used to aid entity recognition in specialist domain texts. As well as providing lexicons of specialist terms, linkage from the text back to a resource can make additional knowledge available to applications. Use of such resources is especially pertinent in the biomedical domain, where large numbers of these resources are available, and where they are widely used in informatics applications. Terminology resources can be most readily used by simple lexical lookup of terms in the text. A major drawback with such lexical lookup, however, is poor precision caused by ambiguity between domain terms and general language words. We combine lexical lookup with simple filtering of ambiguous terms to improve precision. We compare this lexical lookup with a statistical method of entity recognition, and with a method which combines the two approaches. We show that the combined method boosts precision with little loss of recall, and that linkage from recognised entities back to the domain knowledge resources can be maintained.

ANNALIST - ANNotation ALIgnment and Scoring Tool
George Demetriou | Robert Gaizauskas | Haotian Sun | Angus Roberts
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we describe ANNALIST (Annotation, Alignment and Scoring Tool), a scoring system for the evaluation of the output of semantic annotation systems. ANNALIST has been designed as a system that is easily extensible and configurable for different domains, data formats, and evaluation tasks. The system architecture enables data input via plugins, and users can access the system's internal alignment and scoring mechanisms without the need to convert their data to a specified format. Although developed primarily for evaluation tasks that involve the scoring of entity mentions and relations, ANNALIST's generic object representation and the availability of a range of criteria for the comparison of annotations enable the system to be tailored to a variety of scoring jobs. The paper reports on results from using ANNALIST in real-world situations in comparison to other scorers which are more established in the literature. ANNALIST has been used extensively for evaluation tasks within the VIKEF (EU FP6) and CLEF (UK MRC) projects.

2005

Learning Meronyms from Biomedical Text
Angus Roberts
Proceedings of the ACL Student Research Workshop

2004

A Large Scale Terminology Resource for Biomedical Text Processing
Henk Harkema | Robert Gaizauskas | Mark Hepple | Angus Roberts | Ian Roberts | Neil Davis | Yikun Guo
HLT-NAACL 2004 Workshop: Linking Biological Literature, Ontologies and Databases

A Large-Scale Resource for Storing and Recognizing Technical Terminology
Henk Harkema | Robert Gaizauskas | Mark Hepple | Neil Davis | Yikun Guo | Angus Roberts | Ian Roberts
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)