Serguei Pakhomov

Also published as: Serguei V. Pakhomov

2025

Bigger But Not Better: Small Neural Language Models Outperform LLMs in Detection of Thought Disorder
Changye Li | Weizhe Xu | Serguei Pakhomov | Ellen Bradley | Dror Ben-Zeev | Trevor Cohen
Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2025)

Disorganized thinking is a key diagnostic indicator of schizophrenia-spectrum disorders. Recently, clinical estimates of the severity of disorganized thinking have been shown to correlate with measures of how difficult speech transcripts would be for large language models (LLMs) to predict. However, LLMs’ deployment challenges – including privacy concerns, computational and financial costs, and lack of transparency of training data – limit their clinical utility. We investigate whether smaller neural language models can serve as effective alternatives for detecting positive formal thought disorder, using the same sliding window based perplexity measurements that proved effective with larger models. Surprisingly, our results show that smaller models are more sensitive to linguistic differences associated with formal thought disorder than their larger counterparts. Detection capability declines beyond a certain model size and context length, challenging the common assumption of “bigger is better” for LLM-based applications. Our findings generalize across audio diaries and clinical interview speech samples from individuals with psychotic symptoms, suggesting a promising direction for developing efficient, cost-effective, and privacy-preserving screening tools that can be deployed in both clinical and naturalistic settings.

2024

pdf bib abs

Too Big to Fail: Larger Language Models are Disproportionately Resilient to Induction of Dementia-Related Linguistic Anomalies
Changye Li | Zhecheng Sheng | Trevor Cohen | Serguei Pakhomov
Findings of the Association for Computational Linguistics: ACL 2024

As artificial neural networks grow in complexity, understanding their inner workings becomes increasingly challenging, which is particularly important in healthcare applications. The intrinsic evaluation metrics of autoregressive neural language models (NLMs), perplexity (PPL), can reflect how “surprised” an NLM model is at novel input. PPL has been widely used to understand the behavior of NLMs. Previous findings show that changes in PPL when masking attention layers in pre-trained transformer-based NLMs reflect linguistic anomalies associated with Alzheimer’s disease dementia. Building upon this, we explore a novel bidirectional attention head ablation method that exhibits properties attributed to the concepts of cognitive and brain reserve in human brain studies, which postulate that people with more neurons in the brain and more efficient processing are more resilient to neurodegeneration. Our results show that larger GPT-2 models require a disproportionately larger share of attention heads to be masked/ablated to display degradation of similar magnitude to masking in smaller models. These results suggest that the attention mechanism in transformer models may present an analogue to the notions of cognitive and brain reserve and could potentially be used to model certain aspects of the progression of neurodegenerative disorders and aging.

2023

pdf bib abs

A Dialogue System for Assessing Activities of Daily Living: Improving Consistency with Grounded Knowledge
Zhecheng Sheng | Raymond Finzel | Michael Lucke | Sheena Dufresne | Maria Gini | Serguei Pakhomov
Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering

In healthcare, the ability to care for oneself is reflected in the “Activities of Daily Living (ADL),” which serve as a measure of functional ability (functioning). A lack of functioning may lead to poor living conditions requiring personal care and assistance. To accurately identify those in need of support, assistance programs continuously evaluate participants’ functioning across various domains. However, the assessment process may encounter consistency issues when multiple assessors with varying levels of expertise are involved. Novice assessors, in particular, may lack the necessary preparation for real-world interactions with participants. To address this issue, we developed a dialogue system that simulates interactions between assessors and individuals of varying functioning in a natural and reproducible way. The dialogue system consists of two major modules, one for natural language understanding (NLU) and one for natural language generation (NLG), respectively. In order to generate responses consistent with the underlying knowledge base, the dialogue system requires both an understanding of the user’s query and of biographical details of an individual being simulated. To fulfill this requirement, we experimented with query classification and generated responses based on those biographical details using some recently released InstructGPT-like models.

2022

pdf bib abs

GPT-D: Inducing Dementia-related Linguistic Anomalies by Deliberate Degradation of Artificial Neural Language Models
Changye Li | David Knopman | Weizhe Xu | Trevor Cohen | Serguei Pakhomov
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Deep learning (DL) techniques involving fine-tuning large numbers of model parameters have delivered impressive performance on the task of discriminating between language produced by cognitively healthy individuals, and those with Alzheimer’s disease (AD). However, questions remain about their ability to generalize beyond the small reference sets that are publicly available for research. As an alternative to fitting model parameters directly, we propose a novel method by which a Transformer DL model (GPT-2) pre-trained on general English text is paired with an artificially degraded version of itself (GPT-D), to compute the ratio between these two models’ perplexities on language from cognitively healthy and impaired individuals. This technique approaches state-of-the-art performance on text data from a widely used “Cookie Theft” picture description task, and unlike established alternatives also generalizes well to spontaneous conversations. Furthermore, GPT-D generates text with characteristics known to be associated with AD, demonstrating the induction of dementia-related linguistic anomalies. Our study is a step toward better understanding of the relationships between the inner workings of generative neural language models, the language that they produce, and the deleterious effects of dementia on human speech and language characteristics.

2021

pdf bib abs

Everyday Living Artificial Intelligence Hub
Raymond Finzel | Esha Singh | Martin Michalowski | Maria Gini | Serguei Pakhomov
Proceedings of the Second Workshop on Data Science with Human in the Loop: Language Advances

We present the Everyday Living Artificial Intelligence (AI) Hub, a novel proof-of-concept framework for enhancing human health and wellbeing via a combination of tailored wear-able and Conversational Agent (CA) solutions for non-invasive monitoring of physiological signals, assessment of behaviors through unobtrusive wearable devices, and the provision of personalized interventions to reduce stress and anxiety. We utilize recent advancements and industry standards in the Internet of Things (IoT)and AI technologies to develop this proof-of-concept framework.

pdf bib abs

Conversational Agent for Daily Living Assessment Coaching Demo
Raymond Finzel | Aditya Gaydhani | Sheena Dufresne | Maria Gini | Serguei Pakhomov
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

Conversational Agent for Daily Living Assessment Coaching (CADLAC) is a multi-modal conversational agent system designed to impersonate “individuals” with various levels of ability in activities of daily living (ADLs: e.g., dressing, bathing, mobility, etc.) for use in training professional assessors how to conduct interviews to determine one’s level of functioning. The system is implemented on the MindMeld platform for conversational AI and features a Bidirectional Long Short-Term Memory topic tracker that allows the agent to navigate conversations spanning 18 different ADL domains, a dialogue manager that interfaces with a database of over 10,000 historical ADL assessments, a rule-based Natural Language Generation (NLG) module, and a pre-trained open-domain conversational sub-agent (based on GPT-2) for handling conversation turns outside of the 18 ADL domains. CADLAC is delivered via state-of-the-art web frameworks to handle multiple conversations and users simultaneously and is enabled with voice interface. The paper includes a description of the system design and evaluation of individual components followed by a brief discussion of current limitations and next steps.

2020

pdf bib abs

A Tale of Two Perplexities: Sensitivity of Neural Language Models to Lexical Retrieval Deficits in Dementia of the Alzheimer’s Type
Trevor Cohen | Serguei Pakhomov
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In recent years there has been a burgeoning interest in the use of computational methods to distinguish between elicited speech samples produced by patients with dementia, and those from healthy controls. The difference between perplexity estimates from two neural language models (LMs) - one trained on transcripts of speech produced by healthy participants and one trained on those with dementia - as a single feature for diagnostic classification of unseen transcripts has been shown to produce state-of-the-art performance. However, little is known about why this approach is effective, and on account of the lack of case/control matching in the most widely-used evaluation set of transcripts (DementiaBank), it is unclear if these approaches are truly diagnostic, or are sensitive to other variables. In this paper, we interrogate neural LMs trained on participants with and without dementia by using synthetic narratives previously developed to simulate progressive semantic dementia by manipulating lexical frequency. We find that perplexity of neural LMs is strongly and differentially associated with lexical frequency, and that using a mixture model resulting from interpolating control and dementia LMs improves upon the current state-of-the-art for models trained on transcript text exclusively.

2017

pdf bib abs

What Analogies Reveal about Word Vectors and their Compositionality
Gregory Finley | Stephanie Farmer | Serguei Pakhomov
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Analogy completion via vector arithmetic has become a common means of demonstrating the compositionality of word embeddings. Previous work have shown that this strategy works more reliably for certain types of analogical word relationships than for others, but these studies have not offered a convincing account for why this is the case. We arrive at such an account through an experiment that targets a wide variety of analogy questions and defines a baseline condition to more accurately measure the efficacy of our system. We find that the most reliably solvable analogy categories involve either 1) the application of a morpheme with clear syntactic effects, 2) male–female alternations, or 3) named entities. These broader types do not pattern cleanly along a syntactic–semantic divide. We suggest instead that their commonality is distributional, in that the difference between the distributions of two words in any given pair encompasses a relatively small number of word types. Our study offers a needed explanation for why analogy tests succeed and fail where they do and provides nuanced insight into the relationship between word distributions and the theoretical linguistic domains of syntax and semantics.