Jens Bjerring-Hansen

2026

Speaking on Their Behalf: Detecting Indirect Speech in Historical Danish and Norwegian Texts
Ali Al-Laith | Alexander Conroy | Kirstine Degn | Jens Bjerring-Hansen | Daniel Hershcovich
Proceedings of the 10th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature 2026

Indirect speech is a fundamental yet understudied form of reported speech that plays a crucial role in literary texts and communication. While direct speech detection has received significant attention in computational linguistics, the automatic identification of indirect speech remains a challenge due to its nuanced linguistic structure and contextual dependencies. This paper focuses on the detection of indirect speech in late 19th-century Scandinavian literature, where its presence has been linked to shifting aesthetic ideals. We present an annotated dataset of 150 segments, each randomly selected from a different novel (150 novels in total), designed to capture indirect speech in Danish and Norwegian literature. We evaluate four pre-trained language models for classifying indirect speech, with results showing that a Danish Foundation Model (DFM Large), trained on extensive Danish data, has the highest performance. Finally, we conduct a classifier-assisted quantitative corpus analysis and find that the prevalence of indirect speech exhibits fluctuations over time.


From Sentiment to Interpretation: Teaching NLP for Literary Understanding Across Educational Contexts
Karl-Emil Kjær Bilstrup | Kirstine Nielsen Degn | Morten Schultz | Alexander Conroy | Jens Bjerring-Hansen | Daniel Hershcovich
Proceedings of the Seventh Workshop on Teaching Natural Language Processing (TeachNLP 2026)

We developed Litteraturmaskinen, a graphical annotation and exploration interface that enables students to collaborate on labeling sentiment in literary passages, comparing their decisions with model predictions, and justifying their interpretations. We deployed the system in two educational settings: a university module on computational literary studies and the regular teaching of two first-language high school teachers. Based on observations, collected teaching plans, and interviews, we find that tensions between epistemic and academic traditions are both a barrier to integration and a productive entry point for literary reflection and argumentation. We conclude with recommendations for integrating NLP into literature and first-language curricula.

2025


Dying or Departing? Euphemism Detection for Death Discourse in Historical Texts
Ali Al-Laith | Alexander Conroy | Jens Bjerring-Hansen | Bolette Pedersen | Carsten Levisen | Daniel Hershcovich
Proceedings of the 31st International Conference on Computational Linguistics

Euphemisms are a linguistic device used to soften discussions of sensitive or uncomfortable topics, with death being a prominent example. In this paper, we present a study on the detection of death-related euphemisms in historical literary texts from a corpus containing Danish and Norwegian novels from the late 19th century. We introduce an annotated dataset of euphemistic and literal references to death, including both common and rare euphemisms, ranging from well-established terms to more culturally nuanced expressions. We evaluate the performance of state-of-the-art pre-trained language models fine-tuned for euphemism detection. Our findings show that fixed, literal expressions of death became less frequent over time, while metaphorical euphemisms grew in prevalence. Additionally, euphemistic language was more common in historical novels, whereas contemporary novels tended to refer to death more literally, reflecting the rise of secularism. These results shed light on the shifting discourse on death during a period when the concept of death as final became prominent.


Annotating and Classifying Direct Speech in Historical Danish and Norwegian Literary Texts
Ali Al-Laith | Alexander Conroy | Kirstine Nielsen Degn | Jens Bjerring-Hansen | Daniel Hershcovich
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)

Analyzing direct speech in historical literary texts provides insights into character dynamics, narrative style, and discourse patterns. In late 19th-century Danish and Norwegian fiction, direct speech reflects characters’ social and geographical backgrounds. However, inconsistent typographic conventions in Scandinavian literature complicate computational methods for distinguishing direct speech from other narrative elements. To address this, we introduce an annotated dataset from the MeMo corpus, capturing speech markers and tags in Danish and Norwegian novels. We evaluate pre-trained language models for classifying direct speech, with results showing that a Danish Foundation Model (DFM), trained on extensive Danish data, has the highest performance. Finally, we conduct a classifier-assisted quantitative corpus analysis and find a downward trend in the prevalence of speech over time.

2024


Noise, Novels, Numbers. A Framework for Detecting and Categorizing Noise in Danish and Norwegian Literature
Ali Al-Laith | Daniel Hershcovich | Jens Bjerring-Hansen | Jakob Ingemann Parby | Alexander Conroy | Timothy R Tangherlini
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

We present a framework for detecting and categorizing noise in literary texts, demonstrated through its application to Danish and Norwegian literature from the late 19th century. Noise, understood as “aberrant sonic behaviour,” is not only an auditory phenomenon but also a cultural construct tied to the processes of civilization and urbanization. We begin by utilizing topic modeling techniques to identify noise-related documents, followed by fine-tuning BERT-based language models trained on Danish and Norwegian texts to analyze a corpus of over 800 novels. We identify and track the prevalence of noise in these texts, offering insights into the literary perceptions of noise during the Scandinavian “Modern Breakthrough” period (1870-1899). Our contributions include the development of a comprehensive dataset annotated for noise-related segments and their categorization into human-made, non-human-made, and musical noises. This study illustrates the framework’s potential for enhancing the understanding of the relationship between noise and its literary representations, providing a deeper appreciation of the auditory elements in literary works, including as sources for cultural history.


Development and Evaluation of Pre-trained Language Models for Historical Danish and Norwegian Literary Texts
Ali Al-Laith | Alexander Conroy | Jens Bjerring-Hansen | Daniel Hershcovich
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We develop and evaluate the first pre-trained language models specifically tailored for historical Danish and Norwegian texts. Three models are trained on a corpus of 19th-century Danish and Norwegian literature: two directly on the corpus with no prior pre-training, and one with continued pre-training. To evaluate the models, we utilize an existing sentiment classification dataset, and additionally introduce a new annotated word sense disambiguation dataset focusing on the concept of fate. Our assessment reveals that the model employing continued pre-training outperforms the others in two downstream NLP tasks on historical texts. Specifically, we observe substantial improvement in sentiment classification and word sense disambiguation compared to models trained on contemporary texts. These results highlight the effectiveness of continued pre-training for enhancing performance across various NLP tasks in historical text analysis.

2023


Sentiment Classification of Historical Danish and Norwegian Literary Texts
Ali Al-Laith | Kirstine Nielsen Degn | Alexander Conroy | Bolette Sandford Pedersen | Jens Bjerring-Hansen | Daniel Hershcovich
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Sentiment classification is valuable for literary analysis, as sentiment is crucial in literary narratives. It can, for example, be used to investigate a hypothesis in the literary analysis of 19th-century Scandinavian novels that the writing of female authors in this period was characterized by negative sentiment, as this paper shows. In order to enable a data-driven analysis of this hypothesis, we create a manually annotated dataset of sentence-level sentiment annotations for novels from this period and use it to train and evaluate various sentiment classification methods. We find that pre-trained multilingual language models outperform models trained on modern Danish, as well as classifiers based on lexical resources. Finally, in classifier-assisted corpus analysis, we confirm the literary hypothesis regarding the authors’ gender and further shed light on the temporal development of the trend. Our dataset and trained models will be useful for future analysis of historical Danish and Norwegian literary texts.

Co-authors

Karl-Emil Kjær Bilstrup (1)

Kirstine Degn (1)

Carsten Levisen (1)

Jakob Ingemann Parby (1)

Morten Schultz (1)

Timothy R Tangherlini (1)

Venues

LREC (1)

TeachingNLP (1)
