2023
“Orpheus Came to His End by Being Struck by a Thunderbolt”: Annotating Events in Mythological Sequences
Franziska Pannach
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII)
The mythological domain has various ways of expressing events and background knowledge. Using data extracted according to the hylistic approach (Zgoll, 2019), we annotated a data set of 6315 sentences from various mythological contexts and geographical origins, such as Ancient Greece, Rome, and Mesopotamia, with four categories: single-point events (e.g. actions), durative-constant (background knowledge, continuous states), durative-initial, and durative-resultative. These data are used to train a classifier that reliably distinguishes between the event types.
Modeling and Comparison of Narrative Domains with Shallow Ontologies
Franziska Pannach, Theresa Blaschke
Proceedings of the 4th Conference on Language, Data and Knowledge
2021
Employing Wikipedia as a resource for Named Entity Recognition in Morphologically complex under-resourced languages
Aravind Krishnan, Stefan Ziehe, Franziska Pannach, Caroline Sporleder
Proceedings of the 14th Workshop on Building and Using Comparable Corpora (BUCC 2021)
We propose a novel approach for rapid prototyping of named entity recognisers through the development of semi-automatically annotated datasets. We demonstrate the proposed pipeline on two under-resourced agglutinating languages: the Dravidian language Malayalam and the Bantu language isiZulu. Our approach is weakly supervised and bootstraps training data from Wikipedia and the Google Knowledge Graph. Moreover, our approach is largely language-independent and can therefore be ported quickly (and hence cost-effectively) from one language to another, requiring only minor language-specific tailoring.
GCDH@LT-EDI-EACL2021: XLM-RoBERTa for Hope Speech Detection in English, Malayalam, and Tamil
Stefan Ziehe, Franziska Pannach, Aravind Krishnan
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion
This paper describes approaches to identifying hope speech in short, informal texts in English, Malayalam, and Tamil using different machine learning techniques. We demonstrate that even very simple baseline algorithms perform reasonably well on this task if provided with enough training data. However, our best-performing algorithm is a cross-lingual transfer learning approach in which we fine-tune XLM-RoBERTa.
A Unified Approach to Discourse Relation Classification in nine Languages
Hanna Varachkina, Franziska Pannach
Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021)
This paper presents our approach to the shared task on discourse relation classification (DISRPT Task 3). This challenging task requires predicting a large number of classes from the Rhetorical Structure Theory (RST) framework for nine target languages. Labels include discourse relations such as background, condition, contrast, and elaboration. Our approach uses two kinds of features: the Euclidean distance between sentence embeddings extracted with multilingual Sentence-BERT (sBERT), and the directionality of the relation. The labels were first grouped into five classes, which were used for an initial prediction; a second classification step then predicts the target classes. Results differ substantially depending on how often the target label occurs in the training data. Our best results are on Chinese, where the system achieves 70% accuracy on 20 labels.
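To make the feature design described above concrete, here is a minimal sketch of a two-feature vector for one pair of discourse units: the Euclidean distance between their sentence embeddings plus a binary directionality flag. This is an illustration under stated assumptions, not the authors' implementation: the function name `pair_features` is hypothetical, the embeddings are taken as already-computed plain float vectors (e.g. produced by a multilingual sBERT model), and the direction labels are assumed to be the strings "1>2" and "1<2".

```python
import math
from typing import Sequence


def pair_features(emb_a: Sequence[float], emb_b: Sequence[float], direction: str) -> list[float]:
    """Hypothetical feature vector for one discourse unit pair.

    emb_a, emb_b: precomputed sentence embeddings of the two units
        (assumed equal-length float vectors, e.g. from multilingual sBERT).
    direction: relation direction label, assumed to be "1>2" or "1<2".
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    if direction not in ("1>2", "1<2"):
        raise ValueError(f"unexpected direction label: {direction!r}")
    # Euclidean distance between the two embeddings (Python 3.8+ math.dist).
    distance = math.dist(emb_a, emb_b)
    # Encode directionality as a binary feature: 1.0 for "1>2", 0.0 for "1<2".
    is_forward = 1.0 if direction == "1>2" else 0.0
    return [distance, is_forward]


print(pair_features([0.0, 0.0], [3.0, 4.0], "1>2"))  # prints [5.0, 1.0]
```

Vectors like these could then be fed to any standard classifier, first to predict one of the five grouped classes and then the fine-grained target label; the choice of classifier is left open here.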
2020
#GCDH at WNUT-2020 Task 2: BERT-Based Models for the Detection of Informativeness in English COVID-19 Related Tweets
Hanna Varachkina, Stefan Ziehe, Tillmann Dönicke, Franziska Pannach
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
In this system paper, we present a transformer-based approach to the detection of informativeness in English tweets about the COVID-19 pandemic. Our models distinguish informative tweets, i.e. tweets containing statistics on recoveries, suspected and confirmed cases, and COVID-19-related deaths, from uninformative tweets. We present two transformer-based approaches, as well as a Naive Bayes classifier and a support vector machine as baseline systems. The transformer models outperform the baselines by more than 0.1 in F1-score, achieving F1-scores of 0.9091 and 0.9036. Our models were submitted to WNUT-2020 Task 2, the shared task on Identification of Informative COVID-19 English Tweets.