Bridget McInnes

Also published as: Bridget Thomson McInnes, Bridget T. McInnes

2020

NLP@VCU: Identifying Adverse Effects in English Tweets for Unbalanced Data
Darshini Mahendran | Cora Lewis | Bridget McInnes
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

This paper describes our participation in the Social Media Mining for Health Application (SMM4H 2020) Challenge Track 2 for identifying tweets containing Adverse Effects (AEs). Our system uses Convolutional Neural Networks. We explore downsampling, oversampling, and adjusting the class weights to account for the imbalanced nature of the dataset. Our results showed downsampling outperformed oversampling and adjusting the class weights on the test set however all three obtained similar results on the development set.

2019

pdf bib abs

NLP Whack-A-Mole: Challenges in Cross-Domain Temporal Expression Extraction
Amy Olex | Luke Maffey | Bridget McInnes
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Incorporating domain knowledge is vital in building successful natural language processing (NLP) applications. Many times, cross-domain application of a tool results in poor performance as the tool does not account for domain-specific attributes. The clinical domain is challenging in this aspect due to specialized medical terms and nomenclature, shorthand notation, fragmented text, and a variety of writing styles used by different medical units. Temporal resolution is an NLP task that, in general, is domain-agnostic because temporal information is represented using a limited lexicon. However, domain-specific aspects of temporal resolution are present in clinical texts. Here we explore parsing issues that arose when running our system, a tool built on Newswire text, on clinical notes in the THYME corpus. Many parsing issues were straightforward to correct; however, a few code changes resulted in a cascading series of parsing errors that had to be resolved before an improvement in performance was observed, revealing the complexity temporal resolution and rule-based parsing. Our system now outperforms current state-of-the-art systems on the THYME corpus with little change in its performance on Newswire texts.

2018

pdf bib abs

Chrono at SemEval-2018 Task 6: A System for Normalizing Temporal Expressions
Amy Olex | Luke Maffey | Nicholas Morgan | Bridget McInnes
Proceedings of the 12th International Workshop on Semantic Evaluation

Temporal information extraction is a challenging task. Here we describe Chrono, a hybrid rule-based and machine learning system that identifies temporal expressions in text and normalizes them into the SCATE schema. After minor parsing logic adjustments, Chrono has emerged as the top performing system for SemEval 2018 Task 6: Parsing Time Normalizations.

pdf bib abs

SciREL at SemEval-2018 Task 7: A System for Semantic Relation Extraction and Classification
Darshini Mahendran | Chathurika Brahmana | Bridget McInnes
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes our system, SciREL (Scientific abstract RELation extraction system), developed for the SemEval 2018 Task 7: Semantic Relation Extraction and Classification in Scientific Papers. We present a feature-vector based system to extract explicit semantic relation and classify them. Our system is trained in the ACL corpus (BIrd et al., 2008) that contains annotated abstracts given by the task organizers. When an abstract with annotated entities is given as the input into our system, it extracts the semantic relations through a set of defined features and classifies them into one of the given six categories of relations through feature engineering and a learned model. For the best combination of features, our system SciREL obtained an F-measure of 20.03 on the official test corpus which includes 150 abstracts in the relation classification Subtask 1.1. In this paper, we provide an in-depth error analysis of our results to prevent duplication of research efforts in the development of future systems

2017

pdf bib abs

Improving Correlation with Human Judgments by Integrating Semantic Similarity with Second–Order Vectors
Bridget McInnes | Ted Pedersen
Proceedings of the 16th BioNLP Workshop

Vector space methods that measure semantic similarity and relatedness often rely on distributional information such as co–occurrence frequencies or statistical measures of association to weight the importance of particular co–occurrences. In this paper, we extend these methods by incorporating a measure of semantic similarity based on a human curated taxonomy into a second–order vector representation. This results in a measure of semantic relatedness that combines both the contextual information available in a corpus–based vector space representation with the semantic knowledge found in a biomedical ontology. Our results show that incorporating semantic similarity into a second order co-occurrence matrices improves correlation with human judgments for both similarity and relatedness, and that our method compares favorably to various different word embedding methods that have recently been evaluated on the same reference standards we have used.

pdf bib abs

Evaluating Feature Extraction Methods for Knowledge-based Biomedical Word Sense Disambiguation
Sam Henry | Clint Cuffy | Bridget McInnes
Proceedings of the 16th BioNLP Workshop

In this paper, we present an analysis of feature extraction methods via dimensionality reduction for the task of biomedical Word Sense Disambiguation (WSD). We modify the vector representations in the 2-MRD WSD algorithm, and evaluate four dimensionality reduction methods: Word Embeddings using Continuous Bag of Words and Skip Gram, Singular Value Decomposition (SVD), and Principal Component Analysis (PCA). We also evaluate the effects of vector size on the performance of each of these methods. Results are evaluated on five standard evaluation datasets (Abbrev.100, Abbrev.200, Abbrev.300, NLM-WSD, and MSH-WSD). We find that vector sizes of 100 are sufficient for all techniques except SVD, for which a vector size of 1500 is referred. We also show that SVD performs on par with Word Embeddings for all but one dataset.