Magdalena Wolska


2023

Trigger Warning Assignment as a Multi-Label Document Classification Problem
Matti Wiegmann | Magdalena Wolska | Christopher Schröder | Ole Borchardt | Benno Stein | Martin Potthast
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A trigger warning is used to warn people about potentially disturbing content. We introduce trigger warning assignment as a multi-label classification task, create the Webis Trigger Warning Corpus 2022, and with it the first dataset of 1 million fanfiction works from Archive of Our Own with up to 36 different warnings per document. To provide a reliable catalog of trigger warnings, we organized 41 million free-form tags assigned by fanfiction authors into the first comprehensive taxonomy of trigger warnings by mapping them to the 36 institutionally recommended warnings. To determine the best operationalization of trigger warnings, we explore state-of-the-art multi-label models, examining the trade-off between assigning coarse- and fine-grained warnings, open- and closed-set classification, document length, and label confidence. Our models achieve micro-F1 scores of about 0.5, which reveals the difficulty of the task. Tailored representations, long input sequences, and a higher recall on rare warnings would help.

Trigger Warnings: Bootstrapping a Violence Detector for Fan Fiction
Magdalena Wolska | Matti Wiegmann | Christopher Schröder | Ole Borchardt | Benno Stein | Martin Potthast
Findings of the Association for Computational Linguistics: EMNLP 2023

We present the first dataset and evaluation results on a newly defined task: assigning trigger warnings. We introduce a labeled corpus of narrative fiction from Archive of Our Own (AO3), a popular fan fiction site, and define a document-level classification task to determine whether or not to assign a trigger warning to an English story. We focus on the most commonly assigned trigger type “violence”, using the warning labels provided by AO3 authors as ground-truth labels. We trained SVM, BERT, and Longformer models on three datasets sampled from the corpus and achieved F1 scores between 0.8 and 0.9, indicating that assigning trigger warnings for violence is feasible.

2022

CausalQA: A Benchmark for Causal Question Answering
Alexander Bondarenko | Magdalena Wolska | Stefan Heindorf | Lukas Blübaum | Axel-Cyrille Ngonga Ngomo | Benno Stein | Pavel Braslavski | Matthias Hagen | Martin Potthast
Proceedings of the 29th International Conference on Computational Linguistics

At least 5% of questions submitted to search engines ask about cause-effect relationships in some way. To support the development of tailored approaches that can answer such questions, we construct Webis-CausalQA-22, a benchmark corpus of 1.1 million causal questions with answers. We distinguish different types of causal questions using a novel typology derived from a data-driven, manual analysis of questions from ten large question answering (QA) datasets. Using high-precision lexical rules, we extract causal questions of each type from these datasets to create our corpus. As an initial baseline, the state-of-the-art QA model UnifiedQA achieves a ROUGE-L F1 score of 0.48 on our new benchmark.

2017

Unsupervised Text Segmentation Based on Native Language Characteristics
Shervin Malmasi | Mark Dras | Mark Johnson | Lan Du | Magdalena Wolska
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics such as change of authorship or native language. We propose a Bayesian unsupervised text segmentation approach to the latter. While baseline models achieve essentially random segmentation on our task, indicating its difficulty, a Bayesian model that incorporates appropriately compact language models and alternating asymmetric priors can achieve scores on the standard metrics around halfway to perfect segmentation.

Simplifying metaphorical language for young readers: A corpus study on news text
Magdalena Wolska | Yulia Clausen
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

The paper presents first results of an ongoing project on text simplification focusing on linguistic metaphors. Based on an analysis of a parallel corpus of news text professionally simplified for different grade levels, we identify six types of simplification choices falling into two broad categories: preserving metaphors or dropping them. An annotation study on almost 300 source sentences with metaphors (grade level 12) and their simplified counterparts (grade 4) is conducted. The results show that most metaphors are preserved; when they are dropped, the semantic content tends to be retained but reworded without metaphorical language. In general, some of the expected tendencies in complexity reduction, measured with psycholinguistic variables linked to metaphor comprehension, are observed, suggesting good prospects for machine learning-based metaphor simplification.

2015

Misspellings in Responses to Listening Comprehension Questions: Prospects for Scoring based on Phonetic Normalization
Heike da Silva Cardoso | Magdalena Wolska
Proceedings of the fourth workshop on NLP for computer-assisted language learning

2014

Finding a Tradeoff between Accuracy and Rater’s Workload in Grading Clustered Short Answers
Andrea Horbach | Alexis Palmer | Magdalena Wolska
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we investigate the potential of answer clustering for semi-automatic scoring of short answer questions for German as a foreign language. We use surface features like word and character n-grams to cluster answers to listening comprehension exercises per question and simulate having human graders label only one answer per cluster and then propagating this label to all other members of the cluster. We investigate various ways to select this single item to be labeled and find that choosing the item closest to the centroid of a cluster leads to improved (simulated) grading accuracy over random item selection. Averaged over all questions, we can reduce a teacher’s workload to labeling only 40% of all different answers for a question, while still maintaining a grading accuracy of more than 85%.

2012

Extracting glossary sentences from scholarly articles: A comparative evaluation of pattern bootstrapping and deep analysis
Melanie Reiplinger | Ulrich Schäfer | Magdalena Wolska
Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries

2008

A Classification of Dialogue Actions in Tutorial Dialogue
Mark Buckley | Magdalena Wolska
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2006

A corpus of tutorial dialogs on theorem proving; the influence of the presentation of the study-material
Christoph Benzmüller | Helmut Horacek | Henri Lesourd | Ivana Kruijff-Korbayova | Marvin Schiller | Magdalena Wolska
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We present a new corpus of tutorial dialogs on mathematical theorem proving that was collected in a Wizard-of-Oz setup. Our study is a follow-up on a previous experiment conducted in a similar simulated environment. A major difference between the current and the previous experimental setup was that in this study we varied the presentation of the study-material with which the subjects were provided. One sub-group of the subjects was presented with a highly formalized presentation consisting mainly of formulas, while the other was given a presentation mainly in natural language. Our goal was to obtain more data on the kind of mixed-language that is characteristic of informal mathematical discourse. We hypothesized that the language style of the subjects' interaction with the simulated system would reflect the style of presentation of the study-material. In the paper we briefly present the experimental setup, the corpus, and a preliminary quantitative result of the corpus analysis.

Transformation-Based Interpretation of Implicit Parallel Structures: Reconstructing the Meaning of “vice versa” and Similar Linguistic Operators
Helmut Horacek | Magdalena Wolska
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

2005

A Hybrid Model for Tutorial Dialogs
Helmut Horacek | Magdalena Wolska
Proceedings of the 6th SIGdial Workshop on Discourse and Dialogue

2004

An Annotated Corpus of Tutorial Dialogs on Mathematical Theorem Proving
Magdalena Wolska | Bao Quoc Vo | Dimitra Tsovaltzi | Ivana Kruijff-Korbayová | Elena Karagjosova | Helmut Horacek | Armin Fiedler | Christoph Benzmüller
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Analysis of Mixed Natural and Symbolic Input in Mathematical Dialogs
Magdalena Wolska | Ivana Kruijff-Korbayová
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

Lexical-semantic interpretation of language input in mathematical dialogs
Magdalena Wolska | Ivana Kruijff-Korbayová | Helmut Horacek
Proceedings of the 2nd Workshop on Text Meaning and Interpretation

2003

Toward Evaluation of Writing Style: Overly Repetitious Word Use
Jill Burstein | Magdalena Wolska
10th Conference of the European Chapter of the Association for Computational Linguistics

1998

Description of the UPENN CAMP System as Used for Coreference
Breck Baldwin | Tom Morton | Amit Bagga | Jason Baldridge | Raman Chandraseker | Alexis Dimitriadis | Kieran Snyder | Magdalena Wolska
Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia, April 29 - May 1, 1998