Sophie Fellenz

2026

Continual Neural Topic Model
Charu Karakkaparambil James | Waleed Mustafa | Marcio Monteiro | Marius Kloft | Sophie Fellenz
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

In continual learning, our aim is to learn a new task without forgetting what was learned previously. In topic models, this translates to learning new topic models without forgetting previously learned topics. Previous work either considered Dynamic Topic Models (DTMs), which learn the evolution of topics based on the entire training corpus at once, or Online Topic Models, which are updated continuously based on new data but do not have long-term memory. To fill this gap, we propose the Continual Neural Topic Model (CoNTM), which continuously learns topic models at subsequent time steps without forgetting what was previously learned. This is achieved using a global prior distribution that is continuously updated. In our experiments, CoNTM consistently outperformed the dynamic topic model in terms of topic quality and predictive perplexity while being able to capture topic changes online. The analysis reveals that CoNTM can learn more diverse topics and better capture temporal changes than existing methods.

Semantic Echo Pathways (SEP): Tracing How Medical Language Propagates and Transforms
Charu Karakkaparambil James | Marcio Monteiro | Sophie Fellenz
Proceedings of the 1st Workshop on Linguistic Analysis for Health (HeaLing 2026)

We introduce Semantic Echo Pathways (SEP), a new approach for modeling the cross-domain evolution of medical language. Using continual neural topic models (CoNTM) trained separately on scientific literature, clinical notes, and public health-related data, we track linguistic drift and identify points where concepts change meaning. We propose three novel metrics (Cross-Domain Drift Score, Temporal Echo Lag, and Semantic Mutation Patterns) to quantify how medical language travels between the scientific, clinical, and public domains. Applications to evolving concepts such as "long COVID" and diagnostic category changes reveal previously undocumented patterns of medical-semantic evolution. Our results bridge computational modeling with the human-centered perspectives of medical humanities, offering clear, domain-aware maps of how medical language shifts across time and domains, and combining quantitative analysis with linguistic and clinical insight.

2025

Challenging Assumptions in Learning Generic Text Style Embeddings
Phil Ostheimer | Marius Kloft | Sophie Fellenz
The Sixth Workshop on Insights from Negative Results in NLP

Recent advancements in language representation learning primarily emphasize language modeling for deriving meaningful representations, often neglecting style-specific considerations. This study addresses this gap by creating generic, sentence-level style embeddings crucial for style-centric tasks. Our approach is grounded in the premise that low-level text style changes can compose any high-level style. We hypothesize that applying this concept to representation learning enables the development of versatile text style embeddings. By fine-tuning a general-purpose text encoder using contrastive learning and standard cross-entropy loss, we aim to capture these low-level style shifts, anticipating that they offer insights applicable to high-level text styles. The outcomes prompt us to reconsider the underlying assumptions, as the results do not always show that the learned style representations capture high-level text styles.

Tethering Broken Themes: Aligning Neural Topic Models with Labels and Authors
Mayank Nagda | Phil Ostheimer | Sophie Fellenz
Findings of the Association for Computational Linguistics: NAACL 2025

Topic models are a popular approach for extracting semantic information from large document collections. However, recent studies suggest that the topics generated by these models often do not align well with human intentions. Although metadata such as labels and authorship information are available, it has not yet been effectively incorporated into neural topic models. To address this gap, we introduce FANToM, a novel method to align neural topic models with both labels and authorship information. FANToM allows for the inclusion of this metadata when available, producing interpretable topics and author distributions for each topic. Our approach demonstrates greater expressiveness than conventional topic models by learning the alignment between labels, topics, and authors. Experimental results show that FANToM improves existing models in terms of both topic quality and alignment. Additionally, it identifies author interests and similarities.

BBPOS: BERT-based Part-of-Speech Tagging for Uzbek
Latofat Bobojonova | Arofat Akhundjanova | Phil Sidney Ostheimer | Sophie Fellenz
Proceedings of the First Workshop on Language Models for Low-Resource Languages

This paper advances NLP research for the low-resource Uzbek language by evaluating two previously untested monolingual Uzbek BERT models on the part-of-speech (POS) tagging task and introducing the first publicly available UPOS-tagged benchmark dataset for Uzbek. Our fine-tuned models achieve 91% average accuracy, outperforming the baseline multi-lingual BERT as well as the rule-based tagger. Notably, these models capture intermediate POS changes through affixes and demonstrate context sensitivity, unlike existing rule-based taggers.

2024

Characterizing Text Datasets with Psycholinguistic Features
Marcio Monteiro | Charu Karakkaparambil James | Marius Kloft | Sophie Fellenz
Findings of the Association for Computational Linguistics: EMNLP 2024

Fine-tuning pretrained language models on task-specific data is a common practice in Natural Language Processing (NLP) applications. However, the number of pretrained models available to choose from can be very large, and it remains unclear how to select the optimal model without spending considerable amounts of computational resources, especially for the text domain. To address this problem, we introduce PsyMatrix, a novel framework designed to efficiently characterize text datasets. PsyMatrix evaluates multiple dimensions of text and discourse, producing interpretable, low-dimensional embeddings. Our framework has been tested using a meta-dataset repository that includes the performance of 24 pretrained large language models fine-tuned across 146 classification datasets. Using the proposed embeddings, we successfully developed a meta-learning system capable of recommending the most effective pretrained models (optimal and near-optimal) for fine-tuning on new datasets.

Text Style Transfer Evaluation Using Large Language Models
Phil Ostheimer | Mayank Nagda | Marius Kloft | Sophie Fellenz
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Evaluating Text Style Transfer (TST) is a complex task due to its multi-faceted nature. The quality of the generated text is measured based on challenging factors, such as style transfer accuracy, content preservation, and overall fluency. While human evaluation is considered to be the gold standard in TST assessment, it is costly and often hard to reproduce. Therefore, automated metrics are prevalent in these domains. Nonetheless, it is uncertain whether and to what extent these automated metrics correlate with human evaluations. Recent strides in Large Language Models (LLMs) have showcased their capacity to match and even exceed average human performance across diverse, unseen tasks. This suggests that LLMs could be a viable alternative to human evaluation and other automated metrics in TST evaluation. We compare the results of different LLMs in TST evaluation using multiple input prompts. Our findings highlight a strong correlation between (even zero-shot) prompting and human evaluation, showing that LLMs often outperform traditional automated metrics. Furthermore, we introduce the concept of prompt ensembling, demonstrating its ability to enhance the robustness of TST evaluation. This research contributes to the ongoing efforts for more robust and diverse evaluation methods by standardizing and validating TST evaluation with LLMs.

Evaluating Dynamic Topic Models
Charu Karakkaparambil James | Mayank Nagda | Nooshin Haji Ghassemi | Marius Kloft | Sophie Fellenz
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

There is a lack of quantitative measures to evaluate the progression of topics through time in dynamic topic models (DTMs). Filling this gap, we propose a novel evaluation measure for DTMs that analyzes the changes in the quality of each topic over time. Additionally, we propose an extension combining topic quality with the model’s temporal consistency. We demonstrate the utility of the proposed measure by applying it to synthetic data and data from existing DTMs, including DTMs from large language models (LLMs). We also show that the proposed measure correlates well with human judgment. Our findings may help in identifying changing topics, evaluating different DTMs and LLMs, and guiding future research in this area.

2023

A Call for Standardization and Validation of Text Style Transfer Evaluation
Phil Ostheimer | Mayank Nagda | Marius Kloft | Sophie Fellenz
Findings of the Association for Computational Linguistics: ACL 2023

Text Style Transfer (TST) evaluation is, in practice, inconsistent. Therefore, we conduct a meta-analysis on human and automated TST evaluation and experimentation that thoroughly examines existing literature in the field. The meta-analysis reveals a substantial standardization gap in human and automated evaluation. In addition, we find a validation gap: only a few automated metrics have been validated using human experiments. To this end, we thoroughly scrutinize both the standardization and validation gaps and reveal the resulting pitfalls. This work also paves the way to close the standardization and validation gaps in TST evaluation by calling out requirements to be met by future research.