2025
What’s Wrong With This Translation? Simplifying Error Annotation For Crowd Evaluation
Iben Nyholm Debess, Alina Karakanta, Barbara Scalvini
Proceedings of the 1st Workshop on Nordic-Baltic Responsible Evaluation and Alignment of Language Models (NB-REAL 2025)
Rethinking Low-Resource MT: The Surprising Effectiveness of Fine-Tuned Multilingual Models in the LLM Age
Barbara Scalvini, Iben Nyholm Debess, Annika Simonsen, Hafsteinn Einarsson
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)
This study challenges the current paradigm shift in machine translation, where large language models (LLMs) are gaining prominence over traditional neural machine translation models, with a focus on English-to-Faroese translation. We compare the performance of various models, including fine-tuned multilingual models, LLMs (GPT-SW3, Llama 3.1), and closed-source models (Claude 3.5, GPT-4). Our findings show that a fine-tuned NLLB model outperforms most LLMs, including some larger models, in both automatic and human evaluations. We also demonstrate the effectiveness of using LLM-generated synthetic data for fine-tuning. While closed-source models like Claude 3.5 perform best overall, the competitive performance of smaller, fine-tuned models suggests a more nuanced approach to low-resource machine translation. Our results highlight the potential of specialized multilingual models and the importance of language-specific knowledge. We discuss implications for resource allocation in low-resource settings and suggest future directions for improving low-resource machine translation, including targeted data creation and more comprehensive evaluation methodologies.
Prompt Engineering Enhances Faroese MT, but Only Humans Can Tell
Barbara Scalvini, Annika Simonsen, Iben Nyholm Debess, Hafsteinn Einarsson
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)
This study evaluates GPT-4’s English-to-Faroese translation capabilities, comparing it with multilingual models on FLORES-200 and Sprotin datasets. We propose a prompt optimization strategy using Semantic Textual Similarity (STS) to improve translation quality. Human evaluation confirms the effectiveness of STS-based few-shot example selection, though automated metrics fail to capture these improvements. Our findings advance LLM applications for low-resource language translation while highlighting the need for better evaluation methods in this context.
Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
Špela Arhar Holdt, Nikolai Ilinykh, Barbara Scalvini, Micaella Bruton, Iben Nyholm Debess, Crina Madalina Tudor
2024
Evaluating the Potential of Language-family-specific Generative Models for Low-resource Data Augmentation: A Faroese Case Study
Barbara Scalvini, Iben Nyholm Debess
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
We investigate GPT-SW3, a generative language model for the Nordic languages, to assess its understanding of the low-resourced Faroese language. Our aim is to demonstrate the advantages of using language-family-specific generative models to augment data for related languages with fewer resources. We evaluate GPT-SW3 by prompting it for Faroese-to-English translation in a zero-, one-, and few-shot setting. We assess such translations with an ensemble score consisting of an arithmetic average of BLEU and a semantic similarity score (SBERT). Moreover, we challenge the model’s Faroese language understanding capabilities on a small dataset of curated Faroese trick sentences. There, we make a qualitative comparison of the model’s performance with respect to OpenAI’s GPT-3.5 and GPT-4, demonstrating the advantages of using a language-family-specific generative model for navigating non-trivial scenarios. We evaluate the pipeline thus created and use it, as a proof of concept, to create an automatically annotated Faroese semantic textual similarity (STS) dataset.
Good or Bad News? Exploring GPT-4 for Sentiment Analysis for Faroese on a Public News Corpora
Iben Nyholm Debess, Annika Simonsen, Hafsteinn Einarsson
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Sentiment analysis in low-resource languages presents unique challenges that Large Language Models may help address. This study explores the efficacy of GPT-4 for sentiment analysis on Faroese news texts, an uncharted task for this language. Based on the guidelines presented, sentiment analysis was performed with a multi-class approach at the sentence and document level, covering 225 sentences across 170 articles. When comparing GPT-4 to human annotators, we observe that GPT-4 performs remarkably well. We explored two prompt configurations and observed a benefit from having clear instructions for the sentiment analysis task, but no benefit from translating the articles to English before the sentiment analysis task. Our results indicate that GPT-4 can be considered a valuable tool for generating Faroese test data. Furthermore, our investigation reveals the intricacy of news sentiment. This motivates a more nuanced approach going forward, and we suggest a multi-label approach for future research in this domain. We further explored the efficacy of GPT-4 in topic classification on news texts and observed more negative sentiments expressed in international than national news. Overall, this work demonstrates GPT-4’s proficiency on a novel task and its utility for augmenting resources in low-data languages.
2022
Creating a Basic Language Resource Kit for Faroese
Annika Simonsen, Sandra Saxov Lamhauge, Iben Nyholm Debess, Peter Juel Henrichsen
Proceedings of the Thirteenth Language Resources and Evaluation Conference
The biggest challenge we face in developing LR and LT for Faroese is the lack of existing resources. A few resources already exist for Faroese, but many of them are either of insufficient size and quality or are not easily accessible. Therefore, the Faroese ASR project, Ravnur, set out to make a BLARK for Faroese. The BLARK is still in the making, but many of its resources have already been produced or collected. The LR status is framed by mentioning existing LR of relevant size and quality. The specific components of the BLARK are presented, as well as the working principles behind the BLARK. The BLARK will be a pillar in Faroese LR, being relatively substantial in size, quality, and diversity. It will be open-source, inviting other small languages to use it as an inspiration to create their own BLARK. We comment on the flawed yet sprouting LT situation in the Faroe Islands. The LR and LT challenges are not solved with just a BLARK. Some initiatives are therefore proposed to better the prospects of Faroese LT. The open-source principle of the project should facilitate further development.
2019
Garnishing a phonetic dictionary for ASR intake
Iben Nyholm Debess, Sandra Saxov Lamhauge, Peter Juel Henrichsen
Proceedings of the 22nd Nordic Conference on Computational Linguistics
We present a new method for preparing a lexical-phonetic database as a resource for acoustic model training. The research is an offshoot of the ongoing Project Ravnur (Speech Recognition for Faroese), but the method is language-independent. At NODALIDA 2019 we demonstrate the method (called SHARP) online, showing how a traditional lexical-phonetic dictionary (with a very rich phone inventory) is transformed into an ASR-friendly database (with reduced phonetics, preventing data sparseness). The mapping procedure is informed by a corpus of speech transcripts. We conclude with a discussion on the benefits of a well-thought-out BLARK design (Basic Language Resource Kit), making tools like SHARP possible.