Aparna Balagopalan


2020

Impact of ASR on Alzheimer’s Disease Detection: All Errors are Equal, but Deletions are More Equal than Others
Aparna Balagopalan | Ksenia Shkaruta | Jekaterina Novikova
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

Automatic Speech Recognition (ASR) is a critical component of any fully-automated speech-based dementia detection model. However, despite years of speech recognition research, little is known about the impact of ASR accuracy on dementia detection. In this paper, we experiment with controlled amounts of artificially generated ASR errors and investigate their influence on dementia detection. We find that deletion errors affect detection performance the most, due to their impact on the features of syntactic complexity and discourse representation in speech. We show the trend to be generalisable across two different datasets for cognitive impairment detection. As a conclusion, we propose optimising the ASR to reflect a higher penalty for deletion errors in order to improve dementia detection performance.
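
The abstract above does not say how the artificial ASR errors were generated. As a rough illustration only, the sketch below assumes each word is corrupted independently with a fixed probability, and that a single error type (deletion, insertion, or substitution) is applied per run. The function name `inject_asr_errors`, its parameters, and the toy vocabulary are hypothetical and are not taken from the paper.

```python
import random

ERROR_KINDS = ("deletion", "insertion", "substitution")

def inject_asr_errors(words, rate, kind, vocab, rng):
    """Return a copy of `words` with simulated ASR errors of one kind.

    Assumption (not from the paper): every word is corrupted
    independently with probability `rate`.
    """
    if kind not in ERROR_KINDS:
        raise ValueError(f"unknown error kind: {kind}")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")

    corrupted = []
    for word in words:
        if rng.random() >= rate:
            corrupted.append(word)               # word survives unchanged
        elif kind == "deletion":
            continue                             # word is dropped
        elif kind == "insertion":
            corrupted.append(word)
            corrupted.append(rng.choice(vocab))  # spurious word added after it
        else:  # substitution
            corrupted.append(rng.choice(vocab))  # word replaced
    return corrupted

# Hypothetical usage: corrupt one transcript at a 20% deletion rate.
transcript = "the boy is taking a cookie from the jar".split()
toy_vocab = ["uh", "the", "and", "water"]
rng = random.Random(0)  # fixed seed so runs are repeatable
print(" ".join(inject_asr_errors(transcript, 0.2, "deletion", toy_vocab, rng)))
```

Sweeping `rate` for each error kind and re-running a downstream classifier on the corrupted transcripts would give the kind of controlled comparison the abstract describes, under the assumptions stated here.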

Fantastic Features and Where to Find Them: Detecting Cognitive Impairment with a Subsequence Classification Guided Approach
Ben Eyre | Aparna Balagopalan | Jekaterina Novikova
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

Despite the widely reported success of embedding-based machine learning methods on natural language processing tasks, the use of more easily interpreted engineered features remains common in fields such as cognitive impairment (CI) detection. Manually engineering features from noisy text is time- and resource-consuming, and can potentially result in features that do not enhance model performance. To combat this, we describe a new approach to feature engineering that leverages sequential machine learning models and domain knowledge to predict which features help enhance performance. We provide a concrete example of this method on a standard dataset of CI speech and demonstrate that CI classification accuracy improves by 2.3% over a strong baseline when using features produced by this method. This demonstration provides an example of how this method can be used to assist classification in fields where interpretability is important, such as health care.

2019

Lexical Features Are More Vulnerable, Syntactic Features Have More Predictive Power
Jekaterina Novikova | Aparna Balagopalan | Ksenia Shkaruta | Frank Rudzicz
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

Understanding the vulnerability of linguistic features extracted from noisy text is important for both developing better health text classification models and for interpreting vulnerabilities of natural language models. In this paper, we investigate how generic language characteristics, such as syntax or the lexicon, are impacted by artificial text alterations. The vulnerability of features is analysed from two perspectives: (1) the level of feature value change, and (2) the level of change of feature predictive power as a result of text modifications. We show that lexical features are more sensitive to text modifications than syntactic ones. However, we also demonstrate that these smaller changes of syntactic features have a stronger influence on classification performance downstream, compared to the impact of changes to lexical features. Results are validated across three datasets representing different text-classification tasks, with different levels of lexical and syntactic complexity of both conversational and written language.