Maria Skeppstedt


2022

A Digital Swedish-Yiddish/Yiddish-Swedish Dictionary: A Web-Based Dictionary that is also Available Offline
Magnus Ahltorp | Jean Hessel | Gunnar Eriksson | Maria Skeppstedt | Rickard Domeij
Proceedings of the Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference

Yiddish is one of the national minority languages of Sweden, and one of the languages for which the Swedish Institute for Language and Folklore is responsible for developing useful language resources. We here describe the web-based version of a Swedish-Yiddish/Yiddish-Swedish dictionary. The single search field of the web-based dictionary is used for incrementally searching all three components of the dictionary entries (the word in Swedish, the word in Yiddish with Hebrew characters and the transliteration in Latin script). When the user accesses the dictionary in an online mode, the dictionary is saved in the web browser, which makes it possible to also use the dictionary offline.
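
The single-field incremental search described in this abstract can be illustrated with a minimal Python sketch. It is not the dictionary's actual implementation: the `Entry` class, the `search` function, the prefix-matching rule and the two sample entries are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One dictionary entry with the three searchable components (hypothetical layout)."""
    swedish: str
    yiddish: str          # Yiddish in Hebrew characters
    transliteration: str  # Yiddish in Latin script


# Illustrative sample data only, not taken from the dictionary itself.
ENTRIES = [
    Entry("bok", "בוך", "bukh"),
    Entry("hus", "הויז", "hoyz"),
]


def search(query, entries=ENTRIES):
    """Return the entries in which any of the three components starts with the query.

    Calling this after every keystroke gives an incremental search:
    the result list narrows as the query grows.
    """
    q = query.strip().casefold()
    if not q:
        return []
    return [
        e for e in entries
        if any(field.casefold().startswith(q)
               for field in (e.swedish, e.yiddish, e.transliteration))
    ]


if __name__ == "__main__":
    for typed in ("b", "bu", "ho", "בו"):
        print(typed, [e.swedish for e in search(typed)])
```

Because one function checks all three components, the user does not have to choose a search direction or script before typing. Prefix matching is only one possible rule; substring matching would be an equally plausible choice.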

Converting from the Nordic Terminological Record Format to the TBX Format
Maria Skeppstedt | Marie Mattson | Magnus Ahltorp | Rickard Domeij
Proceedings of the Workshop on Terminology in the 21st century: many faces, many places

Rikstermbanken (Sweden’s National Term Bank), which was launched in 2009, uses the Nordic Terminological Record Format (NTRF) for organising its terminological data. Since then, new terminology formats have been established as standards, e.g., the Termbase eXchange format (TBX). We here describe work carried out by the Institute for Language and Folklore within the Federated eTranslation TermBank Network Action. This network develops a technical infrastructure for facilitating sharing of terminology resources throughout Europe. To be able to share some of the term collections of Rikstermbanken within this network and export them to Eurotermbank, we have implemented a conversion from the Nordic Terminological Record Format, as used in Rikstermbanken, to the TBX format.

2020

Line-a-line: A Tool for Annotating Word-Alignments
Maria Skeppstedt | Magnus Ahltorp | Gunnar Eriksson | Rickard Domeij
Proceedings of the 13th Workshop on Building and Using Comparable Corpora

We here describe line-a-line, a web-based tool for manual annotation of word-alignments in sentence-aligned parallel corpora. The graphical user interface, which builds on a design template from the Jigsaw system for investigative analysis, displays the words from each sentence pair that is to be annotated as elements in two vertical lists. An alignment between two words is annotated by drag-and-drop, i.e. by dragging an element from the left-hand list and dropping it on an element in the right-hand list. The tool indicates that two words are aligned by lines that connect them and by highlighting associated words when the mouse is hovered over them. Line-a-line uses the efmaral library for producing pre-annotated alignments, on which the user can base the manual annotation. The tool is mainly planned to be used on moderately under-resourced languages, for which resources in the form of parallel corpora are scarce. The automatic word-alignment functionality therefore also incorporates information derived from non-parallel resources, in the form of pre-trained multilingual word embeddings from the MUSE library.

2018

More or less controlled elicitation of argumentative text: Enlarging a microtext corpus via crowdsourcing
Maria Skeppstedt | Andreas Peldszus | Manfred Stede
Proceedings of the 5th Workshop on Argument Mining

We present an extension of an annotated corpus of short argumentative texts that had originally been built in a controlled text production experiment. Our extension more than doubles the size of the corpus by means of crowdsourcing. We report on the setup of this experiment and on the consequences that crowdsourcing had for assembling the data, and in particular for annotation. We labeled the argumentative structure by marking claims, premises, and relations between them, following the scheme used in the original corpus, but had to make a few modifications in response to interesting phenomena in the data. Finally, we report on an experiment with the automatic prediction of this argumentation structure: we first replicate the approach of an earlier study on the original corpus, and then compare the performance to various settings involving the extension.

Stance-Taking in Topics Extracted from Vaccine-Related Tweets and Discussion Forum Posts
Maria Skeppstedt | Manfred Stede | Andreas Kerren
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

The occurrence of stance-taking towards vaccination was measured in documents extracted by topic modelling from two different corpora, one discussion forum corpus and one tweet corpus. For some of the topics extracted, their most closely associated documents contained a proportion of vaccine stance-taking texts that exceeded the corpus average by a large margin. These extracted document sets would, therefore, form a useful resource in a process for computer-assisted analysis of argumentation on the subject of vaccination.

Argumentation Synthesis following Rhetorical Strategies
Henning Wachsmuth | Manfred Stede | Roxanne El Baff | Khalid Al-Khatib | Maria Skeppstedt | Benno Stein
Proceedings of the 27th International Conference on Computational Linguistics

Persuasion is rarely achieved through a loose set of arguments alone. Rather, an effective delivery of arguments follows a rhetorical strategy, combining logical reasoning with appeals to ethics and emotion. We argue that such a strategy amounts to selecting, arranging, and phrasing a set of argumentative discourse units. In this paper, we model rhetorical strategies for the computational synthesis of effective argumentation. In a study, we let 26 experts synthesize argumentative texts with different strategies for 10 topics. We find that the experts agree in the selection significantly more when following the same strategy. While the texts vary notably across strategies, their arrangement in particular remains stable. The results suggest that our model enables strategic synthesis.

2017

Automatic detection of stance towards vaccination in online discussion forums
Maria Skeppstedt | Andreas Kerren | Manfred Stede
Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017)

A classifier for automatic detection of stance towards vaccination in online forums was trained and evaluated. Debate posts from six discussion threads on the British parental website Mumsnet were manually annotated for stance ‘against’ or ‘for’ vaccination, or as ‘undecided’. A support vector machine, trained to detect the three classes, achieved a macro F-score of 0.44, while a macro F-score of 0.62 was obtained by the same type of classifier on the binary classification task of distinguishing stance ‘against’ vaccination from stance ‘for’ vaccination. These results show that vaccine stance detection in online forums is a difficult task, at least for the type of model investigated and for the relatively small training corpus that was used. Future work will therefore include an expansion of the training data and an evaluation of other types of classifiers and features.
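
For readers unfamiliar with the metric, a macro F-score is commonly computed as the unweighted mean of the per-class F-scores. The sketch below shows that common definition in Python. The abstract does not spell out the exact computation used in the paper, so the function names and the four toy labels are assumptions for illustration, not the paper's data or code.

```python
def f_score(gold, predicted, label):
    """F-score (harmonic mean of precision and recall) for a single class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No true positives: precision and/or recall is zero, so the F-score is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f_score(gold, predicted, labels):
    """Unweighted mean of per-class F-scores: every class counts equally."""
    return sum(f_score(gold, predicted, label) for label in labels) / len(labels)


if __name__ == "__main__":
    # Toy labels, invented for illustration.
    gold = ["against", "for", "undecided", "for"]
    predicted = ["against", "for", "for", "undecided"]
    print(macro_f_score(gold, predicted, ["against", "for", "undecided"]))
```

Since every class contributes equally regardless of its frequency, a class that the classifier never predicts correctly pulls the macro average down sharply, which is one reason three-class scores can be much lower than binary ones.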

2016

Unshared task: (Dis)agreement in online debates
Maria Skeppstedt | Magnus Sahlgren | Carita Paradis | Andreas Kerren
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

Active learning for detection of stance components
Maria Skeppstedt | Magnus Sahlgren | Carita Paradis | Andreas Kerren
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

Automatic detection of five language components, which are all relevant for expressing opinions and for stance taking, was studied: positive sentiment, negative sentiment, speculation, contrast and condition. A resource-aware approach was taken, which included manual annotation of 500 training samples and the use of limited lexical resources. Active learning was compared to random selection of training data, as well as to a lexicon-based method. Active learning was successful for the categories speculation, contrast and condition, but not for the two sentiment categories, for which results achieved when using active learning were similar to those achieved when applying a random selection of training data. This difference is likely due to a larger variation in how sentiment is expressed than in how speakers express the other three categories. This larger variation was also shown by the lower recall results achieved by the lexicon-based approach for sentiment than for the categories speculation, contrast and condition.

2015

Expanding a dictionary of marker words for uncertainty and negation using distributional semantics
Alyaa Alfalahi | Maria Skeppstedt | Rickard Ahlbom | Roza Baskalayci | Aron Henriksson | Lars Asker | Carita Paradis | Andreas Kerren
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis

Detecting speculations, contrasts and conditionals in consumer reviews
Maria Skeppstedt | Teri Schamp-Bjerede | Magnus Sahlgren | Carita Paradis | Andreas Kerren
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

2014

Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)
Sumithra Velupillai | Martin Duneld | Maria Kvist | Hercules Dalianis | Maria Skeppstedt | Aron Henriksson
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)

Medical text simplification using synonym replacement: Adapting assessment of word difficulty to a compounding language
Emil Abrahamsson | Timothy Forni | Maria Skeppstedt | Maria Kvist
Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)

Enhancing Medical Named Entity Recognition with Features Derived from Unsupervised Methods
Maria Skeppstedt
Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

Annotating named entities in clinical text by combining pre-annotation and active learning
Maria Skeppstedt
51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Student Research Workshop

Corpus-Driven Terminology Development: Populating Swedish SNOMED CT with Synonyms Extracted from Electronic Health Records
Aron Henriksson | Maria Skeppstedt | Maria Kvist | Martin Duneld | Mike Conway
Proceedings of the 2013 Workshop on Biomedical Natural Language Processing

Adapting a Parser to Clinical Text by Simple Pre-processing Rules
Maria Skeppstedt
Proceedings of the 2013 Workshop on Biomedical Natural Language Processing

Negation Scope Delimitation in Clinical Text Using Three Approaches: NegEx, PyConTextNLP and SynNeg
Hideyuki Tanushi | Hercules Dalianis | Martin Duneld | Maria Kvist | Maria Skeppstedt | Sumithra Velupillai
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2012

Rule-based Entity Recognition and Coverage of SNOMED CT in Swedish Clinical Text
Maria Skeppstedt | Maria Kvist | Hercules Dalianis
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Named entity recognition of the clinical entities disorders, findings and body structures is needed for information extraction from unstructured text in health records. Clinical notes from a Swedish emergency unit were annotated and used for evaluating a rule- and terminology-based entity recognition system. This system used different preprocessing techniques for matching terms to SNOMED CT, and, one by one, four other terminologies were added. For the class body structure, the results improved with preprocessing, whereas only small improvements were shown for the classes disorder and finding. The best average results were achieved when all terminologies were used together. The entity body structure was recognised with a precision of 0.74 and a recall of 0.80, whereas lower results were achieved for disorder (precision: 0.75, recall: 0.55) and for finding (precision: 0.57, recall: 0.30). The proportion of entities containing abbreviations was higher for false negatives than for correctly recognised entities, and no entities containing more than two tokens were recognised by the system. Low recall for disorders and findings shows both that additional methods are needed for entity recognition and that there are many expressions in clinical text that are not included in SNOMED CT.
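
To compare the three entity classes on a single number, the reported precision and recall pairs can be combined into an F1 score (their harmonic mean). The abstract reports only precision and recall, so the F1 values produced by this Python sketch are derived here and are not figures from the paper. The names `f1` and `REPORTED` are illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as stated in the abstract.
REPORTED = {
    "body structure": (0.74, 0.80),
    "disorder": (0.75, 0.55),
    "finding": (0.57, 0.30),
}

if __name__ == "__main__":
    for name, (p, r) in REPORTED.items():
        print(f"{name}: F1 = {f1(p, r):.2f}")
```

The derived values are roughly 0.77 for body structure, 0.63 for disorder and 0.39 for finding, which makes the effect of the low recall for findings easy to see.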

2011

The Impact of Part-of-Speech Filtering on Generation of a Swedish-Japanese Dictionary Using English as Pivot Language
Ingemar Hjälmstad | Martin Duneld | Maria Skeppstedt
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

2010

Negation Detection in Swedish Clinical Text
Maria Skeppstedt
Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents

Characteristics and Analysis of Finnish and Swedish Clinical Intensive Care Nursing Narratives
Helen Allvin | Elin Carlsson | Hercules Dalianis | Riitta Danielsson-Ojala | Vidas Daudaravicius | Martin Hassel | Dimitrios Kokkinakis | Heljä Lundgren-Laine | Gunnar Nilsson | Øystein Nytrø | Sanna Salanterä | Maria Skeppstedt | Hanna Suominen | Sumithra Velupillai
Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents

Creating and evaluating a consensus for negated and speculative words in a Swedish clinical corpus
Hercules Dalianis | Maria Skeppstedt
Proceedings of the Workshop on Negation and Speculation in Natural Language Processing