Sidsel Boldsen

2025

TripleCheck: Transparent Post-Hoc Verification of Biomedical Claims in AI-Generated Answers
Ana Valeria González | Sidsel Boldsen | Roland Hangelbroek
Proceedings of the Fourth Workshop on Bridging Human-Computer Interaction and Natural Language Processing (HCI+NLP)

Retrieval Augmented Generation (RAG) has advanced Question Answering (QA) by connecting Large Language Models (LLMs) to external knowledge. However, these systems can still produce answers that are unsupported, lack clear traceability, or misattribute information, a critical issue in the biomedical domain where accuracy, trust and control are essential. We introduce TripleCheck, a post-hoc framework that breaks down an LLM’s answer into factual triples and checks each against both the retrieved context and a biomedical knowledge graph. By highlighting which statements are supported, traceable, or correctly attributed, TripleCheck enables users to spot gaps, unsupported claims, and misattributions, prompting more careful follow-up. We present the TripleCheck framework, evaluate it on the SciFact benchmark, analyze its limitations, and share preliminary expert feedback. Results show that TripleCheck provides nuanced insight, potentially supporting greater trust and safer AI adoption in biomedical applications.

2023

The Hidden Folk: Linguistic Properties Encoded in Multilingual Contextual Character Representations
Manex Agirrezabal | Sidsel Boldsen | Nora Hollenstein
Proceedings of the Workshop on Computation and Written Language (CAWL 2023)

To gain a better understanding of the linguistic information encoded in character-based language models, we probe the multilingual contextual CANINE model. We design a range of phonetic probing tasks in six Nordic languages, including Faroese as an additional zero-shot instance. We observe that some phonetic information is indeed encoded in the character representations, as consonants and vowels can be well distinguished using a linear classifier. Furthermore, results for the Danish and Norwegian languages seem to be worse for the consonant/vowel distinction in comparison to other languages. The information encoded in these representations can also be learned in a zero-shot scenario, as Faroese shows reasonably good performance in the same vowel/consonant distinction task.

2022

Letters From the Past: Modeling Historical Sound Change Through Diachronic Character Embeddings
Sidsel Boldsen | Patrizia Paggio
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While a great deal of work has been done on NLP approaches to lexical semantic change detection, other aspects of language change have received less attention from the NLP community. In this paper, we address the detection of sound change through historical spelling. We propose that a sound change can be captured by comparing the relative distance through time between the distributions of the characters involved before and after the change has taken place. We model these distributions using PPMI character embeddings. We verify this hypothesis in synthetic data and then test the method’s ability to trace the well-known historical change of lenition of plosives in Danish historical sources. We show that the models are able to identify several of the changes under consideration and to uncover meaningful contexts in which they appeared. The methodology has the potential to contribute to the study of open questions such as the relative chronology of sound shifts and their geographical distribution.

Interpreting Character Embeddings With Perceptual Representations: The Case of Shape, Sound, and Color
Sidsel Boldsen | Manex Agirrezabal | Nora Hollenstein
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Character-level information is included in many NLP models, but evaluating the information encoded in character representations is an open issue. We leverage perceptual representations in the form of shape, sound, and color embeddings and perform a representational similarity analysis to evaluate their correlation with textual representations in five languages. This cross-lingual analysis shows that textual character representations correlate strongly with sound representations for languages using an alphabetic script, while shape correlates with featural scripts. We further develop a set of probing classifiers to intrinsically evaluate what phonological information is encoded in character embeddings. Our results suggest that information on features such as voicing is embedded in both LSTM and transformer-based representations.

2021

Survey and reproduction of computational approaches to dating of historical texts
Sidsel Boldsen | Fredrik Wahlberg
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

Finding the year of writing for a historical text is of crucial importance to historical research. However, the year of original creation is rarely explicitly stated and must be inferred from the text content, historical records, and codicological clues. Given a transcribed text, machine learning has successfully been used to estimate the year of production. In this paper, we present an overview of several estimation approaches for historical text archives spanning from the 12th century until today.

2019

Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change?
Sidsel Boldsen | Manex Agirrezabal | Patrizia Paggio
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

In this work we propose a data-driven methodology for identifying temporal trends in a corpus of medieval charters. We used perplexities derived from RNNs as a distance measure between documents and then performed clustering on those distances. We argue that perplexities calculated by such language models are representative of temporal trends. The clusters produced using the K-Means algorithm give insight into the differences in language in different time periods, at least partly due to language change. We suggest that the temporal distribution of the individual clusters might provide a more nuanced picture of temporal trends compared to discrete bins, thus providing better results when used in a classification task.

The Seemingly (Un)systematic Linking Element in Danish
Sidsel Boldsen | Manex Agirrezabal
Proceedings of the 22nd Nordic Conference on Computational Linguistics

The use of a linking element between compound members is a common phenomenon in Germanic languages. Still, the exact use and conditioning of such elements is a disputed topic in linguistics. In this paper we address the issue of predicting the use of linking elements in Danish. Following previous research that shows how the choice of linking element might be conditioned by phonology, we frame the problem as a language modeling task: considering the linking elements -s/-∅, the problem becomes predicting what is most probable to encounter next, a syllable boundary or the joining element, ‘s’. We show that training a language model on this task reaches an accuracy of 94%, and in the case of an unsupervised model, the accuracy reaches 80%.
