Silviu-Florin Gheorghe


2026

The basic assumption of machine learning (ML) models is that the training and test data are sampled from the same distribution. However, in daily practice, this assumption is often broken, i.e., the distribution of the test data changes over time, which hinders the application of conventional ML models. One domain where distribution shift naturally occurs is text classification, since people always find new topics to discuss. To this end, we survey research articles studying open-set text classification and related tasks. We divide the methods in this area based on the constraints that define the kind of distribution shift and the corresponding problem formulation, i.e., learning with the Universum, zero-shot learning, and open-set learning. We next discuss the predominant mitigation approaches for each problem setup. We further identify several future work directions, aiming to push the boundaries beyond the state of the art. Finally, we explain how continual learning can solve many of the issues caused by the shifting class distribution. We maintain a list of relevant papers at https://github.com/Eduard6421/Open-Set-Survey.

2025

This study investigates the application of Natural Language Processing (NLP) methods to uncover linguistic and stylistic variations within the corpus of Ludwig Wittgenstein, a philosopher renowned for his complex and notional contributions. By analyzing works such as Tractatus Logico-Philosophicus alongside his later notes, manuscripts, and student-dictated lectures in Cambridge, we aim to identify significant distinctions in language use and conceptual framing. The corpus poses unique difficulties because of its diverse origins, encompassing published works, personal notes, and collaboratively edited transcripts. Using zero-shot NLP techniques, this exploratory research aims to reveal patterns that reflect Wittgenstein’s philosophical evolution and differences in how the texts were produced. The results highlight the potential of computational approaches to enhance our understanding of complex, context-dependent philosophical writings, providing a possible path for further interdisciplinary investigations into linguistic and conceptual dynamics in this challenging body of work.