Marcio Monteiro
2026
Continual Neural Topic Model
Charu Karakkaparambil James | Waleed Mustafa | Marcio Monteiro | Marius Kloft | Sophie Fellenz
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
In continual learning, our aim is to learn a new task without forgetting what was learned previously. In topic models, this translates to learning new topic models without forgetting previously learned topics. Previous work either considered Dynamic Topic Models (DTMs), which learn the evolution of topics based on the entire training corpus at once, or Online Topic Models, which are updated continuously based on new data but do not have long-term memory. To fill this gap, we propose the Continual Neural Topic Model (CoNTM), which continuously learns topic models at subsequent time steps without forgetting what was previously learned. This is achieved using a global prior distribution that is continuously updated. In our experiments, CoNTM consistently outperformed the dynamic topic model in terms of topic quality and predictive perplexity while being able to capture topic changes online. The analysis reveals that CoNTM can learn more diverse topics and better capture temporal changes than existing methods.
Semantic Echo Pathways (SEP): Tracing How Medical Language Propagates and Transforms
Charu Karakkaparambil James | Marcio Monteiro | Sophie Fellenz
Proceedings of the 1st Workshop on Linguistic Analysis for Health (HeaLing 2026)
We introduce Semantic Echo Pathways (SEP), a new approach for modeling the cross-domain evolution of medical language. Using continual neural topic models (CoNTM) trained separately on scientific literature, clinical notes, and public health-related data, we track linguistic drift and identify points where concepts change meaning. We propose three novel metrics: the Cross-Domain Drift Score, Temporal Echo Lag, and Semantic Mutation Patterns, to quantify how medical language travels between the scientific, clinical, and public domains. Applications to evolving concepts such as "long COVID" and diagnostic category changes reveal previously undocumented patterns of medical-semantic evolution. Our results bridge computational modeling with the human-centered perspectives of medical humanities, offering clear, domain-aware maps of how medical language shifts across time and domains, and combining quantitative analysis with linguistic and clinical insight.
2024
Characterizing Text Datasets with Psycholinguistic Features
Marcio Monteiro | Charu Karakkaparambil James | Marius Kloft | Sophie Fellenz
Findings of the Association for Computational Linguistics: EMNLP 2024
Fine-tuning pretrained language models on task-specific data is a common practice in Natural Language Processing (NLP) applications. However, the number of pretrained models available to choose from can be very large, and it remains unclear how to select the optimal model without spending considerable amounts of computational resources, especially for the text domain. To address this problem, we introduce PsyMatrix, a novel framework designed to efficiently characterize text datasets. PsyMatrix evaluates multiple dimensions of text and discourse, producing interpretable, low-dimensional embeddings. Our framework has been tested using a meta-dataset repository that includes the performance of 24 pretrained large language models fine-tuned across 146 classification datasets. Using the proposed embeddings, we successfully developed a meta-learning system capable of recommending the most effective pretrained models (optimal and near-optimal) for fine-tuning on new datasets.