Simon Ostermann


2024

CoXQL: A Dataset for Parsing Explanation Requests in Conversational XAI Systems
Qianli Wang | Tatiana Anikina | Nils Feldhus | Simon Ostermann | Sebastian Möller
Findings of the Association for Computational Linguistics: EMNLP 2024

Conversational explainable artificial intelligence (ConvXAI) systems based on large language models (LLMs) have garnered significant interest from the research community in natural language processing (NLP) and human-computer interaction (HCI). Such systems can provide answers to user questions about explanations in dialogues, have the potential to enhance users’ comprehension, and can offer more information about the decision-making and generation processes of LLMs. Currently available ConvXAI systems are based on intent recognition rather than free chat, as this has been found to be more precise and reliable in identifying users’ intentions. However, intent recognition still presents a challenge for ConvXAI: little training data exist, and the domain is highly specific, with a broad range of XAI methods onto which requests must be mapped. To bridge this gap, we present CoXQL, the first dataset in the NLP domain for user intent recognition in ConvXAI, covering 31 intents, seven of which require filling multiple slots. Subsequently, we enhance an existing parsing approach by incorporating template validations, and we evaluate several LLMs on CoXQL using different parsing strategies. We conclude that the improved parsing approach (MP+) surpasses the performance of previous approaches. We also find that intents with multiple slots remain highly challenging for LLMs.

MMAR: Multilingual and Multimodal Anaphora Resolution in Instructional Videos
Cennet Oguz | Pascal Denis | Simon Ostermann | Emmanuel Vincent | Natalia Skachkova | Josef Van Genabith
Findings of the Association for Computational Linguistics: EMNLP 2024

Multilingual anaphora resolution identifies referring expressions and implicit arguments in texts covering several languages and links them to their antecedents. In the most challenging setting, cross-lingual anaphora resolution, the training and test data are in different languages. Because knowledge needs to be transferred across languages, the task is challenging in both the multilingual and the cross-lingual setting. We hypothesize that one way to alleviate some of this difficulty is to include multimodal information in the form of images (i.e., frames extracted from instructional videos). Such visual inputs are by nature language-agnostic; therefore, cross- and multilingual anaphora resolution should benefit from visual information. In this paper, we provide the first multilingual and multimodal dataset annotated with anaphoric relations and present experimental results for end-to-end multimodal and multilingual anaphora resolution. Given gold mentions, multimodal features improve anaphora resolution results by ~10% for unseen languages.

Adapting Multilingual LLMs to Low-Resource Languages with Knowledge Graphs via Adapters
Daniil Gurgurov | Mareike Hartmann | Simon Ostermann
Proceedings of the 1st Workshop on Knowledge Graphs and Large Language Models (KaLLM 2024)

This paper explores the integration of graph knowledge from linguistic ontologies into multilingual Large Language Models (LLMs) using adapters to improve performance for low-resource languages (LRLs) in sentiment analysis (SA) and named entity recognition (NER). Building upon successful parameter-efficient fine-tuning techniques, such as K-ADAPTER and MAD-X, we propose a similar approach for incorporating knowledge from multilingual graphs, connecting concepts in various languages with each other through linguistic relationships, into multilingual LLMs for LRLs. Specifically, we focus on eight LRLs — Maltese, Bulgarian, Indonesian, Nepali, Javanese, Uyghur, Tibetan, and Sinhala — and employ language-specific adapters fine-tuned on data extracted from the language-specific section of ConceptNet, aiming to enable knowledge transfer across the languages covered by the knowledge graph. We compare various fine-tuning objectives, including standard Masked Language Modeling (MLM), MLM with full-word masking, and MLM with targeted masking, to analyze their effectiveness in learning and integrating the extracted graph data. Through empirical evaluation on language-specific tasks, we assess how structured graph knowledge affects the performance of multilingual LLMs for LRLs in SA and NER, providing insights into the potential benefits of adapting language models for low-resource scenarios.

HybridBERT - Making BERT Pretraining More Efficient Through Hybrid Mixture of Attention Mechanisms
Gokul Srinivasagan | Simon Ostermann
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)

Pretrained transformer-based language models have produced state-of-the-art performance in most natural language understanding tasks. These models undergo two stages of training: pretraining on a huge corpus of data and fine-tuning on a specific downstream task. The pretraining phase is extremely compute-intensive and requires several high-performance computing devices like GPUs and several days or even months of training, but it is crucial for the model to capture global knowledge and also has a significant impact on the fine-tuning task. This is a major roadblock for researchers without access to sophisticated computing resources. To overcome this challenge, we propose two novel hybrid architectures called HybridBERT (HBERT), which combine self-attention and additive attention mechanisms together with sub-layer normalization. We introduce a computing budget to the pretraining phase, limiting the training time and usage to a single GPU. We show that HBERT attains twice the pretraining accuracy of a vanilla-BERT baseline. We also evaluate our proposed models on two downstream tasks, where we outperform BERT-base while accelerating inference. Moreover, we study the effect of weight initialization with a limited pretraining budget. The code and models are publicly available at: www.github.com/gokulsg/HBERT/.

UoM-DFKI submission to the low resource shared task
Kumar Rishu | Aiden Williams | Claudia Borg | Simon Ostermann
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

This system description paper presents the details of our primary and contrastive approaches to translating Maltese into English for IWSLT 2024. Maltese shares a large vocabulary with Arabic and Italian, which makes it an ideal candidate for testing the cross-lingual capabilities of recent state-of-the-art models. We experiment with two end-to-end approaches for our submissions: the Whisper and wav2vec 2.0 models. Our primary system achieves a BLEU score of 35.1 on the combined data, whereas our contrastive approach achieves 18.5. We also provide a manual analysis of our contrastive approach to identify some pitfalls that may have caused this difference.

A Comparison of Different Tokenization Methods for the Georgian Language
Beso Mikaberidze | Temo Saghinadze | Guram Mikaberidze | Raphael Kalandadze | Konstantine Pkhakadze | Josef van Genabith | Simon Ostermann | Lonneke van der Plas | Philipp Müller
Proceedings of the 7th International Conference on Natural Language and Speech Processing (ICNLSP 2024)

DFKI-MLST at DialAM-2024 Shared Task: System Description
Arne Binder | Tatiana Anikina | Leonhard Hennig | Simon Ostermann
Proceedings of the 11th Workshop on Argument Mining (ArgMining 2024)

This paper presents the dfki-mlst submission for the DialAM shared task (Ruiz-Dolz et al., 2024) on identification of argumentative and illocutionary relations in dialogue. Our model achieves best results in the global setting: 48.25 F1 at the focused level when looking only at the related arguments/locutions and 67.05 F1 at the general level when evaluating the complete argument maps. We describe our implementation of the data pre-processing, relation encoding and classification, evaluating 11 different base models and performing experiments with, e.g., node text combination and data augmentation. Our source code is publicly available.

Common European Language Data Space
Georg Rehm | Stelios Piperidis | Khalid Choukri | Andrejs Vasiļjevs | Katrin Marheinecke | Victoria Arranz | Aivars Bērziņš | Miltos Deligiannis | Dimitris Galanis | Maria Giagkou | Katerina Gkirtzou | Dimitris Gkoumas | Annika Grützner-Zahn | Athanasia Kolovou | Penny Labropoulou | Andis Lagzdiņš | Elena Leitner | Valérie Mapelli | Hélène Mazo | Simon Ostermann | Stefania Racioppa | Mickaël Rigault | Leon Voukoutis
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The Common European Language Data Space (LDS) is an integral part of the EU data strategy, which aims at developing a single market for data. Its decentralised technical infrastructure and governance scheme are currently being developed by the LDS project, which also has dedicated tasks for proof-of-concept prototypes, handling legal aspects, raising awareness and promoting the LDS through events and social media channels. The LDS is part of a broader vision for establishing all necessary components to develop European large language models.

2023

Find-2-Find: Multitask Learning for Anaphora Resolution and Object Localization
Cennet Oguz | Pascal Denis | Emmanuel Vincent | Simon Ostermann | Josef van Genabith
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

In multimodal understanding tasks, visual and linguistic ambiguities can arise. Visual ambiguity can occur when visual objects require a model to ground a referring expression in a video without strong supervision, while linguistic ambiguity can occur from changes in entities in action flows. As an example from the cooking domain, “oil” mixed with “salt” and “pepper” could later be referred to as a “mixture”. Without a clear visual-linguistic alignment, we cannot know which among several objects shown is referred to by the language expression “mixture”, and without resolved antecedents, we cannot pinpoint what the mixture is. We define this chicken-and-egg problem as Visual-linguistic Ambiguity. In this paper, we present Find2Find, a joint anaphora resolution and object localization dataset targeting the problem of visual-linguistic ambiguity, consisting of 500 anaphora-annotated recipes with corresponding videos. We present experimental results of a novel end-to-end joint multitask learning framework for Find2Find that fuses visual and textual information and shows improvements both for anaphora resolution and object localization with one joint model in multitask learning, as compared to a strong single-task baseline.

Investigating the Encoding of Words in BERT’s Neurons Using Feature Textualization
Tanja Baeumel | Soniya Vijayakumar | Josef van Genabith | Guenter Neumann | Simon Ostermann
Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

Pretrained language models (PLMs) form the basis of most state-of-the-art NLP technologies. Nevertheless, they are essentially black boxes: humans do not have a clear understanding of what knowledge is encoded in different parts of the models, especially in individual neurons. By contrast, in computer vision, feature visualization provides a decompositional interpretability technique for the neurons of vision models: activation maximization is used to synthesize inherently interpretable visual representations of the information encoded in individual neurons. Our work is inspired by this but presents a cautionary tale on the interpretability of single neurons, based on the first large-scale attempt to adapt activation maximization to NLP, and, more specifically, to large PLMs. We propose feature textualization, a technique to produce dense representations of neurons in the PLM word embedding space. We apply feature textualization to the BERT model to investigate whether the knowledge encoded in individual neurons can be interpreted and symbolized. We find that the produced representations can provide insights into the knowledge encoded in individual neurons, but that individual neurons do not represent clear-cut symbolic units of language such as words. Additionally, we use feature textualization to investigate how many neurons are needed to encode words in BERT.

2019

MCScript2.0: A Machine Comprehension Corpus Focused on Script Events and Participants
Simon Ostermann | Michael Roth | Manfred Pinkal
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

We introduce MCScript2.0, a machine comprehension corpus for the end-to-end evaluation of script knowledge. MCScript2.0 contains approx. 20,000 questions on approx. 3,500 texts, crowdsourced based on a new collection process that results in challenging questions. Half of the questions cannot be answered from the reading texts, but require the use of commonsense and, in particular, script knowledge. We give a thorough analysis of our corpus and show that while the task is not challenging to humans, existing machine comprehension models fail to perform well on the data, even if they make use of a commonsense knowledge base. The dataset is available at http://www.sfb1102.uni-saarland.de/?page_id=2582

Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing
Simon Ostermann | Sheng Zhang | Michael Roth | Peter Clark
Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing

Commonsense Inference in Natural Language Processing (COIN) - Shared Task Report
Simon Ostermann | Sheng Zhang | Michael Roth | Peter Clark
Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing

This paper reports on the results of the shared tasks of the COIN workshop at EMNLP-IJCNLP 2019. The tasks consisted of two machine comprehension evaluations, each of which tested a system’s ability to answer questions/queries about a text. Both evaluations were designed such that systems need to exploit commonsense knowledge, for example, in the form of inferences over information that is available in the common ground but not necessarily mentioned in the text. A total of five participating teams submitted systems for the shared tasks, with the best submitted system achieving 90.6% accuracy and 83.7% F1-score on task 1 and task 2, respectively.

2018

Mapping Texts to Scripts: An Entailment Study
Simon Ostermann | Hannah Seitz | Stefan Thater | Manfred Pinkal
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge
Simon Ostermann | Ashutosh Modi | Michael Roth | Stefan Thater | Manfred Pinkal
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge
Simon Ostermann | Michael Roth | Ashutosh Modi | Stefan Thater | Manfred Pinkal
Proceedings of the 12th International Workshop on Semantic Evaluation

This report summarizes the results of the SemEval 2018 task on machine comprehension using commonsense knowledge. For this machine comprehension task, we created a new corpus, MCScript, which contains a high number of questions that require commonsense knowledge to find the correct answer. Eleven teams from four different countries participated in this shared task, most of them using neural approaches. The best-performing system achieves an accuracy of 83.95%, outperforming the baselines by a large margin but still falling far short of the human upper bound of 98%.

2017

Aligning Script Events with Narrative Texts
Simon Ostermann | Michael Roth | Stefan Thater | Manfred Pinkal
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Script knowledge plays a central role in text understanding and is relevant for a variety of downstream tasks. In this paper, we consider two recent datasets which provide a rich and general representation of script events in terms of paraphrase sets. We introduce the task of mapping event mentions in narrative texts to such script event types, and present a model for this task that exploits rich linguistic representations as well as information on temporal ordering. The results of our experiments demonstrate that this complex task is indeed feasible.

2016

InScript: Narrative texts annotated with script information
Ashutosh Modi | Tatjana Anikina | Simon Ostermann | Manfred Pinkal
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents the InScript corpus (Narrative Texts Instantiating Script structure). InScript is a corpus of 1,000 stories centered around 10 different scenarios. Verbs and noun phrases are annotated with event and participant types, respectively. Additionally, the text is annotated with coreference information. The corpus shows rich lexical variation and will serve as a unique resource for the study of the role of script knowledge in natural language processing.

2015

Annotating Entailment Relations for Shortanswer Questions
Simon Ostermann | Andrea Horbach | Manfred Pinkal
Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications

2014

Paraphrase Detection for Short Answer Scoring
Nikolina Koleva | Andrea Horbach | Alexis Palmer | Simon Ostermann | Manfred Pinkal
Proceedings of the third workshop on NLP for computer-assisted language learning