Emilia Apostolova


2019

Combining Structured and Free-text Electronic Medical Record Data for Real-time Clinical Decision Support
Emilia Apostolova | Tony Wang | Tim Tschampel | Ioannis Koutroulis | Tom Velez
Proceedings of the 18th BioNLP Workshop and Shared Task

The goal of this work is to utilize Electronic Medical Record (EMR) data for real-time Clinical Decision Support (CDS). We present a deep learning approach to combining, in real time, available diagnosis codes (ICD codes) and free-text notes: Patient Context Vectors. Patient Context Vectors are created by averaging ICD code embeddings, and by predicting the same from free-text notes via a Convolutional Neural Network. The Patient Context Vectors were then simply appended to available structured data (vital signs and lab results) to build prediction models for a specific condition. Experiments on predicting ARDS, a rare and complex condition, demonstrate the utility of Patient Context Vectors as a means of summarizing the patient history and overall condition, and show that they significantly improve the prediction model results.

2018

Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data
R. Andrew Kreek | Emilia Apostolova
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

Industry datasets used for text classification are rarely created for that purpose. In most cases, the data and target predictions are a by-product of accumulated historical data, typically fraught with noise, present both in the text-based documents and in the targeted labels. In this work, we address the question of how well performance metrics computed on noisy, historical data reflect the performance on the intended future machine learning model input. The results demonstrate the utility of dirty training datasets used to build prediction models for cleaner (and different) prediction inputs.

2017

Toward Automated Early Sepsis Alerting: Identifying Infection Patients from Nursing Notes
Emilia Apostolova | Tom Velez
BioNLP 2017

Severe sepsis and septic shock are conditions that affect millions of patients and have a mortality rate close to 50%. Early identification of at-risk patients significantly improves outcomes. Electronic surveillance tools have been developed to monitor structured Electronic Medical Records and automatically recognize early signs of sepsis. However, many sepsis risk factors (e.g. symptoms and signs of infection) are often captured only in free-text clinical notes. In this study, we developed a method for automatic monitoring of nursing notes for signs and symptoms of infection. We utilized a creative approach to automatically generate an annotated dataset. The dataset was used to create a Machine Learning model that achieved an F1-score ranging from 79 to 96%.

2015

Digital Leafleting: Extracting Structured Data from Multimedia Online Flyers
Emilia Apostolova | Payam Pourashraf | Jeffrey Sack
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

Combining Visual and Textual Features for Information Extraction from Online Flyers
Emilia Apostolova | Noriko Tomuro
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2012

Domain Adaptation of Coreference Resolution for Radiology Reports
Emilia Apostolova | Noriko Tomuro | Pattanasak Mongkolwat | Dina Demner-Fushman
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

2011

Automatic Extraction of Lexico-Syntactic Patterns for Detection of Negation and Speculation Scopes
Emilia Apostolova | Noriko Tomuro | Dina Demner-Fushman
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Djangology: A Light-weight Web-based Tool for Distributed Collaborative Text Annotation
Emilia Apostolova | Sean Neilan | Gary An | Noriko Tomuro | Steven Lytinen
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Manual text annotation is a resource-consuming endeavor necessary for NLP systems when they target new tasks or domains for which there are no existing annotated corpora. Distributing the annotation work across multiple contributors is a natural solution to reduce and manage the effort required. Although there are a few publicly available tools which support distributed collaborative text annotation, most of them have complex user interfaces and require a significant amount of involvement from the annotators/contributors as well as the project developers and administrators. We present a light-weight web application for highly distributed annotation projects - Djangology. The application takes advantage of the recent advances in web framework architecture that allow rapid development and deployment of web applications thus minimizing development time for customization. The application's web-based interface gives project administrators the ability to easily upload data, define project schemas, assign annotators, monitor progress, and review inter-annotator agreement statistics. The intuitive web-based user interface encourages annotator participation as contributors are not burdened by tool manuals, local installation, or configuration. The system has achieved a user response rate of 70% in two annotation projects involving more than 250 medical experts from various geographic locations.

Exploring Surface-Level Heuristics for Negation and Speculation Discovery in Clinical Texts
Emilia Apostolova | Noriko Tomuro
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing

2009

Towards Automatic Image Region Annotation - Image Region Textual Coreference Resolution
Emilia Apostolova | Dina Demner-Fushman
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers