Veronika Vincze

2024

Predictive and Distinctive Linguistic Features in Schizophrenia-Bipolar Spectrum Disorders
Martina Katalin Szabó | Veronika Vincze | Bernadett Dam | Csenge Guba | Anita Bagi | István Szendi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this study, we analyze spontaneous speech transcripts from Hungarian patients with schizophrenia (SZ), schizoaffective disorder (SAD), and bipolar disorder (BD). Our goal is to identify linguistic features that distinguish these patient groups from one another and from controls. To our knowledge, no prior study has systematically examined the linguistic features of these disorders or explored their use in distinguishing between these patient groups. We collected recordings from 77 participants during three directed spontaneous speech tasks in a clinical setting, resulting in 458 texts. Our research group manually transcribed the recordings. We processed the written corpus texts using Natural Language Processing methods and tools. The final corpus consists of 179,515 tokens, excluding punctuation. Using this data, we analyze the predictive power of different linguistic features by computing and comparing their frequency distributions. We then attempt to automatically differentiate between patient groups and controls using our extensive set of linguistic features, employing the random forest algorithm in these experiments. Our results indicate that machine learning techniques based on distinctive features can effectively distinguish SZ, SAD, BD, and controls, surpassing baseline results.

2023

We present version 1.3 of the PARSEME multilingual corpus annotated with verbal multiword expressions (VMWEs). Since the previous version, new languages have joined the effort to create such a resource, some of the existing corpora have been enriched with new annotated texts, and others have been enhanced in various ways. The PARSEME multilingual corpus now covers 26 languages. All monolingual corpora therein use the Universal Dependencies v.2 tagset. They are (re-)split following the PARSEME v.1.2 standard, which places emphasis on unseen VMWEs. With the current iteration, the corpus release process has been detached from shared tasks; instead, a process for continuous improvement and systematic releases has been introduced.

2022


In this article, we seek to automatically identify Hungarian patients suffering from mild cognitive impairment (MCI) or mild Alzheimer disease (mAD) based on their speech transcripts, focusing only on linguistic features. In addition to the features examined in our earlier study, we introduce syntactic, semantic, and pragmatic features of spontaneous speech that might affect the detection of dementia. To ascertain the most useful features for distinguishing healthy controls, MCI patients, and mAD patients, we carry out a statistical analysis of the data and investigate the significance level of the extracted features across various speaker group pairs and speaking tasks. In the second part of the article, we use this rich feature set as the basis for effective discrimination among the three speaker groups. In our machine learning experiments, we analyze the efficacy of each feature group separately. Our model that uses all the features achieves competitive scores, both with and without demographic information (3-class accuracy: 68%–70%; 2-class accuracy: 77.3%–80%). We also analyze how different data recording scenarios affect linguistic features and how these features can be productively used to distinguish MCI patients from healthy controls.
