Nina Hosseini-Kivanani

2025

pdf bib abs
Voices of Luxembourg: Tackling Dialect Diversity in a Low-Resource Setting
Nina Hosseini-Kivanani | Christoph Schommer | Peter Gilles
Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)

Dialect classification is essential for preserving linguistic diversity, particularly in low-resource languages such as Luxembourgish. This study introduces one of the first systematic approaches to classifying Luxembourgish dialects, addressing phonetic, prosodic, and lexical variations across four major regions. We benchmarked multiple models, including state-of-the-art pre-trained speech models like Wav2Vec2, XLSR-Wav2Vec2, and Whisper, alongside traditional approaches such as Random Forest and CNN-LSTM. To overcome data limitations, we applied targeted data augmentation strategies and analyzed their impact on model performance. Our findings highlight the superior performance of CNN-Spectrogram and CNN-LSTM models while identifying the strengths and limitations of data augmentation. This work establishes foundational benchmarks and provides actionable insights for advancing dialectal NLP in Luxembourgish and other low-resource languages.

2024

pdf bib abs
Mapping Sentiments: A Journey into Low-Resource Luxembourgish Analysis
Nina Hosseini-Kivanani | Julien Kühn | Christoph Schommer
Proceedings of the First LUHME Workshop

Sentiment analysis (SA) plays a vital role in interpreting human opinions across different languages, especially in contexts like social media, product reviews, and other user-generated content. This study focuses on Luxembourgish, a low-resource language critical to Luxembourg’s identity, utilizing advanced deep learning models such as BERT, RoBERTa, LuxemBERTand LuxGPT-2. These models were enhanced with transfer learning, active learning strategies, and context-aware embeddings, enabling effective Luxembourgish processing. These models further improved with context-aware embeddings and were able to accurately detect sentiments, categorizing news comments into positive, negative, and neutral sentiments. Our approach highlights the significant role of human-in-the-loop (HITL) methodologies, which refine model accuracy by aligning automated analyses with human judgment. The findings indicate that LuxembBERT, especially when enhanced with the HITL method involving feedback from 500 and 1000 annotated sentences, outperforms other models in both binary (positive vs. negative) and multi-class (positive, neutral, and negative) classification tasks. The HITL approach not only refined model accuracy but also provided substantial improvements in understanding and processing sentiments and sarcasm, often challenging for automated systems. This study establishes the basis for future research to extend these methodologies to other underresourced languages, promising improvements in Natural Language Processing (NLP) applications across diverse linguistic landscapes.

pdf bib
Proceedings of the 1st Worskhop on Towards Ethical and Inclusive Conversational AI: Language Attitudes, Linguistic Diversity, and Language Rights (TEICAI 2024)
Nina Hosseini-Kivanani | Sviatlana Höhn | Dimitra Anastasiou | Bettina Migge | Angela Soltan | Doris Dippold | Ekaterina Kamlovskaya | Fred Philippy
Proceedings of the 1st Worskhop on Towards Ethical and Inclusive Conversational AI: Language Attitudes, Linguistic Diversity, and Language Rights (TEICAI 2024)

2019

pdf bib abs
Automated Cross-language Intelligibility Analysis of Parkinson’s Disease Patients Using Speech Recognition Technologies
Nina Hosseini-Kivanani | Juan Camilo Vásquez-Correa | Manfred Stede | Elmar Nöth
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Speech deficits are common symptoms amongParkinson’s Disease (PD) patients. The automatic assessment of speech signals is promising for the evaluation of the neurological state and the speech quality of the patients. Recently, progress has been made in applying machine learning and computational methods to automatically evaluate the speech of PD patients. In the present study, we plan to analyze the speech signals of PD patients and healthy control (HC) subjects in three different languages: German, Spanish, and Czech, with the aim to identify biomarkers to discriminate between PD patients and HC subjects and to evaluate the neurological state of the patients. Therefore, the main contribution of this study is the automatic classification of PD patients and HC subjects in different languages with focusing on phonation, articulation, and prosody. We will focus on an intelligibility analysis based on automatic speech recognition systems trained on these three languages. This is one of the first studies done that considers the evaluation of the speech of PD patients in different languages. The purpose of this research proposal is to build a model that can discriminate PD and HC subjects even when the language used for train and test is different.

Co-authors

Ekaterina Kamlovskaya 1

Juan Camilo Vásquez-Correa 1

Venues

Fix author