Kenneth Enevoldsen


pdf bib
Detecting intersectionality in NER models: A data-driven approach
Ida Marie S. Lassen | Mina Almasi | Kenneth Enevoldsen | Ross Deans Kristensen-McLachlan
Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

The presence of bias is a pressing concern for both engineers and users of language technology. What is less clear is how exactly bias can be measured, so as to rank models relative to the biases they display. Using an innovative experimental method involving data augmentation, we measure the effect of intersectional biases in Danish models used for Name Entity Recognition (NER). We quantify differences in representational biases, understood as a systematic difference in error or what is called error disparity. Our analysis includes both gender and ethnicity to illustrate the effect of multiple dimensions of bias, as well as experiments which look to move beyond a narrowly binary analysis of gender. We show that all contemporary Danish NER models perform systematically worse on non-binary and minority ethnic names, while not showing significant differences for typically Danish names. Our data augmentation technique can be applied on other languages to test for biases which might be relevant for researchers applying NER models to the study of cultural heritage data.

pdf bib
DanSumT5: Automatic Abstractive Summarization for Danish
Sara Kolding | Katrine Nymann | Ida Hansen | Kenneth Enevoldsen | Ross Kristensen-McLachlan
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Automatic abstractive text summarization is a challenging task in the field of natural language processing. This paper presents a model for domain-specific sum marization for Danish news articles, Dan SumT5; an mT5 model fine-tuned on a cleaned subset of the DaNewsroom dataset consisting of abstractive summary-article pairs. The resulting state-of-the-art model is evaluated both quantitatively and qualitatively, using ROUGE and BERTScore metrics and human rankings of the summaries. We find that although model refinements increase quantitative and qualitative performance, the model is still prone to factual errors. We discuss the limitations of current evaluation methods for automatic abstractive summarization and underline the need for improved metrics and transparency within the field. We suggest that future work should employ methods for detecting and reducing errors in model output and methods for referenceless evaluation of summaries.