Sumithra Velupillai


2020

Relative and Incomplete Time Expression Anchoring for Clinical Text
Louise Dupuis | Nicol Bergou | Hegler Tissot | Sumithra Velupillai
Proceedings of the 3rd Clinical Natural Language Processing Workshop

Extracting and modeling temporal information in clinical text is an important element for developing timelines and disease trajectories. Time information in written text varies in preciseness and explicitness, posing challenges for NLP approaches that aim to accurately anchor temporal information on a timeline. Relative and incomplete time expressions (RI-Timexes) are expressions that require additional information for their temporal anchor to be resolved, but few studies have addressed this challenge specifically. In this study, we aimed to reproduce and verify a classification approach for identifying anchor dates and relations in clinical text, and propose a novel relation classification approach for this task.

Distinguishing between Dementia with Lewy bodies (DLB) and Alzheimer’s Disease (AD) using Mental Health Records: a Classification Approach
Zixu Wang | Julia Ive | Sinead Moylett | Christoph Mueller | Rudolf Cardinal | Sumithra Velupillai | John O’Brien | Robert Stewart
Proceedings of the 3rd Clinical Natural Language Processing Workshop

While Dementia with Lewy Bodies (DLB) is the second most common type of neurodegenerative dementia after Alzheimer’s Disease (AD), it is difficult to distinguish from AD. We propose a method for DLB detection using mental health record (MHR) documents from a 3-month period before a patient has been diagnosed with DLB or AD. Our objective is to develop a model that could be clinically useful to differentiate between DLB and AD across datasets from different healthcare institutions. We cast this as a classification task using a Convolutional Neural Network (CNN), an efficient neural model for text classification. We experiment with different representation models, and explore the features that contribute to model performance. In addition, we apply temperature scaling, a simple but efficient model calibration method, to produce more reliable predictions. We believe the proposed method has important potential for clinical applications using routine healthcare records, and for generalising to other relevant clinical record datasets. To the best of our knowledge, this is the first attempt to distinguish DLB from AD using mental health records, and to improve the reliability of DLB predictions.

Using Deep Neural Networks with Intra- and Inter-Sentence Context to Classify Suicidal Behaviour
Xingyi Song | Johnny Downs | Sumithra Velupillai | Rachel Holden | Maxim Kikoler | Kalina Bontcheva | Rina Dutta | Angus Roberts
Proceedings of the Twelfth Language Resources and Evaluation Conference

Identifying statements related to suicidal behaviour in psychiatric electronic health records (EHRs) is an important step when modeling that behaviour, and when assessing suicide risk. We apply a deep neural network-based classification model with a lightweight context encoder to classify sentence-level suicidal behaviour in EHRs. We show that incorporating information from sentences to the left and right of the target sentence significantly improves classification accuracy. Our approach achieved the best performance when classifying suicidal behaviour in Autism Spectrum Disorder patient records. The results could have implications for suicidality research and clinical surveillance.

Development of a Corpus Annotated with Medications and their Attributes in Psychiatric Health Records
Jaya Chaturvedi | Natalia Viani | Jyoti Sanyal | Chloe Tytherleigh | Idil Hasan | Kate Baird | Sumithra Velupillai | Robert Stewart | Angus Roberts
Proceedings of the Twelfth Language Resources and Evaluation Conference

Free text fields within electronic health records (EHRs) contain valuable clinical information which is often missed when conducting research using EHR databases. One such type of information is medications, which are not always available in structured fields, especially in mental health records. Most use cases that require medication information also require the associated temporal information (e.g. current or past) and attributes (e.g. dose, route, frequency). The purpose of this study is to develop a corpus of medication annotations in mental health records. The aim is to provide a more complete picture of medication mentions in the health records by including additional contextual information around them, and to create a resource for use when developing and evaluating applications for the extraction of medications from EHR text. Thus far, an analysis of temporal information related to medications mentioned in a sample of mental health records has been conducted. The purpose of this analysis was to understand the complexity of medication mentions and their associated temporal information in the free text of EHRs, with a specific focus on the mental health domain.

Exploring Transformer Text Generation for Medical Dataset Augmentation
Ali Amin-Nejad | Julia Ive | Sumithra Velupillai
Proceedings of the Twelfth Language Resources and Evaluation Conference

Natural Language Processing (NLP) can help unlock the vast troves of unstructured data in clinical text and thus improve healthcare research. However, a major barrier to developments in this field is data access: patient confidentiality prohibits the sharing of this data, so openly available datasets are small, fragmented and sequestered. Since NLP model development requires large quantities of data, we aim to help side-step this roadblock by exploring the use of Natural Language Generation to augment datasets so that they can be used for NLP model development on downstream clinically relevant tasks. We propose a methodology guiding the generation with structured patient information in a sequence-to-sequence manner. We experiment with state-of-the-art Transformer models and demonstrate that our augmented dataset is capable of beating our baselines on a downstream classification task. Finally, we also create a user interface and release the scripts to train generation models to stimulate further research in this area.

2019

Annotating Temporal Information in Clinical Notes for Timeline Reconstruction: Towards the Definition of Calendar Expressions
Natalia Viani | Hegler Tissot | Ariane Bernardino | Sumithra Velupillai
Proceedings of the 18th BioNLP Workshop and Shared Task

To automatically analyse complex trajectory information enclosed in clinical text (e.g. timing of symptoms, duration of treatment), it is important to understand the related temporal aspects, anchoring each event on an absolute point in time. In the clinical domain, few temporally annotated corpora are currently available. Moreover, underlying annotation schemas, which mainly rely on the TimeML standard, are not necessarily easy to apply to applications such as patient timeline reconstruction. In this work, we investigated how temporal information is documented in clinical text by annotating a corpus of medical reports with time expressions (TIMEXes), based on TimeML. The developed corpus is available to the NLP community. Starting from our annotations, we analysed the suitability of the TimeML TIMEX schema for capturing timeline information, identifying challenges and possible solutions. As a result, we propose a novel annotation schema that could be useful for timeline reconstruction: CALendar EXpression (CALEX).

Is artificial data useful for biomedical Natural Language Processing algorithms?
Zixu Wang | Julia Ive | Sumithra Velupillai | Lucia Specia
Proceedings of the 18th BioNLP Workshop and Shared Task

A major obstacle to the development of Natural Language Processing (NLP) methods in the biomedical domain is data accessibility. This problem can be addressed by generating medical data artificially. Most previous studies have focused on the generation of short clinical text, and evaluation of the data utility has been limited. We propose a generic methodology to guide the generation of clinical text with key phrases. We use the artificial data as additional training data in two key biomedical NLP tasks: text classification and temporal relation extraction. We show that artificially generated training data used in conjunction with real training data can lead to performance boosts for data-greedy neural network algorithms. We also demonstrate the usefulness of the generated data for NLP setups where it fully replaces real training data.

2018

Hierarchical neural model with attention mechanisms for the classification of social media text related to mental health
Julia Ive | George Gkotsis | Rina Dutta | Robert Stewart | Sumithra Velupillai
Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic

Mental health problems represent a major public health challenge. Automated analysis of text related to mental health aims to support medical decision-making and public health policies, and to improve health care. Such analysis may involve text classification. Traditionally, automated classification has been performed mainly using machine learning methods involving costly feature engineering. Recently, the performance of those methods has been dramatically improved by neural methods. However, mostly convolutional neural networks (CNNs) have been explored. In this paper, we apply a hierarchical recurrent neural network (RNN) architecture with an attention mechanism to social media data related to mental health. We show that this architecture improves overall classification results as compared to previously reported results on the same data. Benefitting from the attention mechanism, it can also efficiently select text elements crucial for classification decisions, which can in turn be used for in-depth analysis.

Time Expressions in Mental Health Records for Symptom Onset Extraction
Natalia Viani | Lucia Yin | Joyce Kam | Ayunni Alawi | André Bittar | Rina Dutta | Rashmi Patel | Robert Stewart | Sumithra Velupillai
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis

For psychiatric disorders such as schizophrenia, longer durations of untreated psychosis are associated with worse intervention outcomes. Data included in electronic health records (EHRs) can be useful for retrospective clinical studies, but much of this is stored as unstructured text, which cannot be directly used in computation. Natural Language Processing (NLP) methods can be used to extract this data in order to identify symptoms and treatments from mental health records, and to temporally anchor their first emergence. We are developing an EHR corpus annotated with time expressions, clinical entities and their relations, to be used for NLP development. In this study, we focus on the first step, identifying time expressions in EHRs for patients with schizophrenia. We developed a gold standard corpus, compared this corpus to other related corpora in terms of content and time expression prevalence, and adapted two NLP systems for extracting time expressions. To the best of our knowledge, this is the first resource annotated for temporal entities in the mental health domain.

2016

UtahBMI at SemEval-2016 Task 12: Extracting Temporal Information from Clinical Text
Abdulrahman Khalifa | Sumithra Velupillai | Stephane Meystre
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

The language of mental health problems in social media
George Gkotsis | Anika Oellrich | Tim Hubbard | Richard Dobson | Maria Liakata | Sumithra Velupillai | Rina Dutta
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

Don’t Let Notes Be Misunderstood: A Negation Detection Method for Assessing Risk of Suicide in Mental Health Records
George Gkotsis | Sumithra Velupillai | Anika Oellrich | Harry Dean | Maria Liakata | Rina Dutta
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

Vocabulary Development To Support Information Extraction of Substance Abuse from Psychiatry Notes
Sumithra Velupillai | Danielle L. Mowery | Mike Conway | John Hurdle | Brent Kious
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

2015

BluLab: Temporal Information Extraction for the 2015 Clinical TempEval Challenge
Sumithra Velupillai | Danielle L Mowery | Samir Abdelrahman | Lee Christensen | Wendy Chapman
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)
Sumithra Velupillai | Martin Duneld | Maria Kvist | Hercules Dalianis | Maria Skeppstedt | Aron Henriksson
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)

Improving Readability of Swedish Electronic Health Records through Lexical Simplification: First Results
Gintarė Grigonyte | Maria Kvist | Sumithra Velupillai | Mats Wirén
Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)

Generating Patient Problem Lists from the ShARe Corpus using SNOMED CT/SNOMED CT CORE Problem List
Danielle Mowery | Mindy Ross | Sumithra Velupillai | Stephane Meystre | Janyce Wiebe | Wendy Chapman
Proceedings of BioNLP 2014

Temporal Expressions in Swedish Medical Text – A Pilot Study
Sumithra Velupillai
Proceedings of BioNLP 2014

2013

Negation Scope Delimitation in Clinical Text Using Three Approaches: NegEx, PyConTextNLP and SynNeg
Hideyuki Tanushi | Hercules Dalianis | Martin Duneld | Maria Kvist | Maria Skeppstedt | Sumithra Velupillai
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2012

Medical diagnosis lost in translation – Analysis of uncertainty and negation expressions in English and Swedish clinical texts
Danielle L Mowery | Sumithra Velupillai | Wendy W Chapman
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

2011

Something Old, Something New – Applying a Pre-trained Parsing Model to Clinical Swedish
Martin Duneld | Aron Henriksson | Sumithra Velupillai
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

2010

Characteristics and Analysis of Finnish and Swedish Clinical Intensive Care Nursing Narratives
Helen Allvin | Elin Carlsson | Hercules Dalianis | Riitta Danielsson-Ojala | Vidas Daudaravicius | Martin Hassel | Dimitrios Kokkinakis | Heljä Lundgren-Laine | Gunnar Nilsson | Øystein Nytrø | Sanna Salanterä | Maria Skeppstedt | Hanna Suominen | Sumithra Velupillai
Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents

Uncertainty Detection as Approximate Max-Margin Sequence Labelling
Oscar Täckström | Sumithra Velupillai | Martin Hassel | Gunnar Eriksson | Hercules Dalianis | Jussi Karlgren
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task

Towards a better understanding of uncertainties and speculations in Swedish clinical text – Analysis of an initial annotation trial
Sumithra Velupillai
Proceedings of the Workshop on Negation and Speculation in Natural Language Processing

Levels of certainty in knowledge-intensive corpora: an initial annotation study
Aron Henriksson | Sumithra Velupillai
Proceedings of the Workshop on Negation and Speculation in Natural Language Processing

How Certain are Clinical Assessments? Annotating Swedish Clinical Text for (Un)certainties, Speculations and Negations
Hercules Dalianis | Sumithra Velupillai
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Clinical texts contain a large amount of information. Some of this information is embedded in contexts where, e.g., a patient's status is reasoned about, which may lead to a considerable number of statements that indicate uncertainty and speculation. We believe that distinguishing such instances from factual statements will be very beneficial for automatic information extraction. We have annotated a subset of the Stockholm Electronic Patient Record Corpus for certain and uncertain expressions as well as speculative and negation keywords, with the purpose of creating a resource for the development of automatic detection of speculative language in Swedish clinical text. We have analyzed the results from the initial annotation trial by means of pairwise Inter-Annotator Agreement (IAA) measured with F-score. Our main findings are that IAA results for certain expressions and negations are very high, but for uncertain expressions and speculative keywords results are less encouraging. These instances need to be defined in more detail. With this annotation trial, we have created an important resource that can be used to further analyze the properties of speculative language in Swedish clinical text. Our intention is to release this subset to other research groups in the future after removing identifiable information.

2008

Automatic Construction of Domain-specific Dictionaries on Sparse Parallel Corpora in the Nordic languages
Sumithra Velupillai | Hercules Dalianis
Coling 2008: Proceedings of the workshop Multi-source Multilingual Information Extraction and Summarization

Mixing and Blending Syntactic and Semantic Dependencies
Yvonne Samuelsson | Oscar Täckström | Sumithra Velupillai | Johan Eklund | Mark Fishel | Markus Saers
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

Revealing Relations between Open and Closed Answers in Questionnaires through Text Clustering Evaluation
Magnus Rosell | Sumithra Velupillai
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Open answers in questionnaires contain valuable information that is very time-consuming to analyze manually. We present a method for hypothesis generation from questionnaires based on text clustering. Text clustering is used interactively on the open answers, and the user can explore the cluster contents. The exploration is guided by automatic evaluation of the clusters against a closed answer regarded as a categorization. This simplifies the process of selecting interesting clusters. The user formulates a hypothesis from the relation between the cluster content and the closed answer categorization. We have applied our method to an open answer regarding occupation compared to a closed answer on smoking habits. With no prior knowledge of smoking habits in different occupation groups, we have generated the hypothesis that farmers smoke less than the average. The hypothesis is supported by several separate surveys. Closed answers are easy to analyze automatically but are restricted and may miss valuable aspects. Open answers, on the other hand, fully capture the dynamics and diversity of possible outcomes. With our method, the process of analyzing open answers becomes feasible.

2006

The impact of phrases in document clustering for Swedish
Magnus Rosell | Sumithra Velupillai
Proceedings of the 15th Nordic Conference of Computational Linguistics (NODALIDA 2005)