Basil Ell


2024

Pointing Out the Shortcomings of Relation Extraction Models with Semantically Motivated Adversarials
Gennaro Nolano | Moritz Blum | Basil Ell | Philipp Cimiano
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In recent years, large language models have achieved state-of-the-art performance across various NLP tasks. However, investigations have shown that these models tend to rely on shortcut features, leading to inaccurate predictions and making them unreliable when generalizing to out-of-distribution (OOD) samples. For instance, in the context of relation extraction (RE), we would expect a model to identify the same relation independently of the entities involved in it. Consider the sentence “Leonardo da Vinci painted the Mona Lisa”, which expresses the created(Leonardo_da_Vinci, Mona_Lisa) relation. If we substitute “Leonardo da Vinci” with “Barack Obama”, the sentence still expresses the created relation, and a robust model should detect it in both cases. In this work, we describe several semantically motivated strategies to generate adversarial examples by replacing entity mentions and investigate how state-of-the-art RE models perform under pressure. Our analyses show that the performance of these models deteriorates significantly on the modified datasets (avg. of -48.5% in F1), which indicates that these models rely to a great extent on shortcuts, such as surface forms (or patterns therein) of entities, without making full use of the information present in the sentences.
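As a rough illustration of the entity-substitution idea described in the abstract, the Python sketch below replaces a head entity mention with another entity of a compatible type while keeping the relation label fixed. The sentences, entity spans, and replacement pool are invented for illustration and are not taken from the paper's datasets or code.

```python
# Minimal sketch of adversarial example generation for relation extraction
# by entity-mention replacement. All data below is hypothetical.

import random

# Each example: a sentence, its head/tail mentions, and the relation expressed.
examples = [
    {
        "sentence": "Leonardo da Vinci painted the Mona Lisa.",
        "head": "Leonardo da Vinci",
        "tail": "Mona Lisa",
        "relation": "created",
    },
]

# Hypothetical replacement pool: other entities of type PERSON.
person_pool = ["Barack Obama", "Marie Curie", "Alan Turing"]


def perturb_head(example, pool, rng=random):
    """Return a copy of the example with the head mention replaced.

    The relation label is kept unchanged: a robust RE model should still
    predict `created` even if the new (head, tail) pair is implausible.
    """
    new_head = rng.choice([e for e in pool if e != example["head"]])
    return {
        **example,
        "sentence": example["sentence"].replace(example["head"], new_head),
        "head": new_head,
    }


if __name__ == "__main__":
    for ex in examples:
        adversarial = perturb_head(ex, person_pool)
        print(adversarial["sentence"], "->", adversarial["relation"])
```

A model that keeps predicting the same relation on such perturbed sentences is less likely to be relying on the surface forms of the original entities.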

2023

Reading between the Lines: Information Extraction from Industry Requirements
Ole Magnus Holter | Basil Ell
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Industry requirements describe the qualities that a project or a service must provide. Most requirements are, however, only available in natural language and embedded in textual documents. To be machine-understandable, a requirement needs to be represented in a logical format. We consider a requirement to consist of a scope, which is the requirement’s subject matter, a condition, which is any condition that must be fulfilled for the requirement to be relevant, and a demand, which is what is required. We introduce a novel task, the identification of the semantic components scope, condition, and demand in a requirement sentence, and establish baselines using sequence labelling and few-shot learning. One major challenge with this task is the implicit nature of the scope, which is often not stated in the sentence itself. By including document context information, we improved the average performance for scope detection. Our study provides insights into the difficulty of machine understanding of industry requirements and suggests strategies for addressing this challenge.
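The semantic-component task can be framed as BIO sequence labelling; the sketch below shows one hypothetical way to tag a requirement sentence and recover scope, condition, and demand spans. The example sentence, tag set, and label names are assumptions for illustration, not the paper's actual annotation scheme.

```python
# Minimal sketch: scope/condition/demand identification as BIO tagging.
# The sentence and tags are invented for demonstration purposes.

tokens = ["Pressure", "vessels", "shall", "be", "tested",
          "if", "the", "design", "pressure", "exceeds", "10", "bar", "."]

tags = ["B-SCOPE", "I-SCOPE", "B-DEMAND", "I-DEMAND", "I-DEMAND",
        "B-COND", "I-COND", "I-COND", "I-COND", "I-COND",
        "I-COND", "I-COND", "O"]


def spans_from_bio(tokens, tags):
    """Group BIO-tagged tokens into labelled spans (scope, condition, demand)."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((label, " ".join(current)))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans


print(spans_from_bio(tokens, tags))
# [('SCOPE', 'Pressure vessels'), ('DEMAND', 'shall be tested'),
#  ('COND', 'if the design pressure exceeds 10 bar')]
```

In this framing, an implicit scope would simply yield no SCOPE span in the sentence, which is why document-level context can help recover it.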

Human-Machine Collaborative Annotation: A Case Study with GPT-3
Ole Magnus Holter | Basil Ell
Proceedings of the 4th Conference on Language, Data and Knowledge

LexExMachinaQA: A framework for the automatic induction of ontology lexica for Question Answering over Linked Data
Mohammad Fazleh Elahi | Basil Ell | Philipp Cimiano
Proceedings of the 4th Conference on Language, Data and Knowledge

2014

A language-independent method for the extraction of RDF verbalization templates
Basil Ell | Andreas Harth
Proceedings of the 8th International Natural Language Generation Conference (INLG)