Yvonne Adesam


2023

Superlim: A Swedish Language Understanding Evaluation Benchmark
Aleksandrs Berdicevskis | Gerlof Bouma | Robin Kurtz | Felix Morger | Joey Öhman | Yvonne Adesam | Lars Borin | Dana Dannélls | Markus Forsberg | Tim Isbister | Anna Lindahl | Martin Malmsten | Faton Rekathati | Magnus Sahlgren | Elena Volodina | Love Börjeson | Simon Hengchen | Nina Tahmasebi
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

We present Superlim, a multi-task NLP benchmark and analysis platform for evaluating Swedish language models, a counterpart to the English-language (Super)GLUE suite. We describe the dataset, the tasks, and the leaderboard, and report the baseline results yielded by a reference implementation. The tested models do not approach ceiling performance on any of the tasks, which suggests that Superlim is truly difficult, a desirable quality for a benchmark. We address methodological challenges, such as mitigating the Anglocentric bias when creating datasets for a less-resourced language; choosing the most appropriate measures; documenting the datasets; and making the leaderboard convenient and transparent. We also highlight other potential uses of the dataset, such as the evaluation of cross-lingual transfer learning.

2021

Part-of-speech tagging of Swedish texts in the neural era
Yvonne Adesam | Aleksandrs Berdicevskis
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

We train and test five open-source taggers, which use different methods, on three Swedish corpora, which are of comparable size but use different tagsets. The KB-Bert tagger achieves the highest accuracy for part-of-speech and morphological tagging, while being fast enough for practical use. We also compare the performance across tagsets and across different genres in one of the corpora. We perform a manual error analysis and a statistical analysis of the factors that affect how difficult specific tags are. Finally, we test ensemble methods, showing that a small (but not significant) improvement over the best-performing tagger can be achieved.

The Swedish Winogender Dataset
Saga Hansson | Konstantinos Mavromatakis | Yvonne Adesam | Gerlof Bouma | Dana Dannélls
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

We introduce the SweWinogender test set, a diagnostic dataset to measure gender bias in coreference resolution. It is modelled after the English Winogender benchmark, and is released with reference statistics on the distribution of men and women across occupations and the association between gender and occupation in modern corpus material. The paper discusses the design and creation of the dataset, and presents a small investigation of the supplementary statistics.

2020

Training a Swedish Constituency Parser on Six Incompatible Treebanks
Richard Johansson | Yvonne Adesam
Proceedings of the Twelfth Language Resources and Evaluation Conference

We investigate a transition-based parser that uses Eukalyptus, a function-tagged constituent treebank for Swedish which includes discontinuous constituents. In addition, we show that the accuracy of this parser can be improved by using a multitask learning architecture that makes it possible to train the parser on additional treebanks that use other annotation models.

2017

Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language
Gerlof Bouma | Yvonne Adesam
Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language

2016

Old Swedish Part-of-Speech Tagging between Variation and External Knowledge
Yvonne Adesam | Gerlof Bouma
Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

A Multi-domain Corpus of Swedish Word Sense Annotation
Richard Johansson | Yvonne Adesam | Gerlof Bouma | Karin Hedberg
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We describe the word sense annotation layer in Eukalyptus, a freely available five-domain corpus of contemporary Swedish with several annotation layers. The annotation uses the SALDO lexicon to define the sense inventory, and allows word sense annotation of compound segments and multiword units. We give an overview of the new annotation tool developed for this project, and finally present an analysis of the inter-annotator agreement between two annotators.

2015

Defining the Eukalyptus forest – the Koala treebank of Swedish
Yvonne Adesam | Gerlof Bouma | Richard Johansson
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

2014

Computer-aided morphology expansion for Old Swedish
Yvonne Adesam | Malin Ahlberg | Peter Andersson | Gerlof Bouma | Markus Forsberg | Mans Hulden
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we describe and evaluate a tool for paradigm induction and lexicon extraction that has been applied to Old Swedish. The tool is semi-supervised and uses a small seed lexicon and unannotated corpora to derive full inflection tables for input lemmata. In the work presented here, the tool has been modified to deal with the rich spelling variation found in Old Swedish texts. We also present some initial experiments, which are the first steps towards creating a large-scale morphology for Old Swedish.