2024
FICODE at GermEval 2024 GerMS-Detect closed ST1 & ST2: Ensemble- and Transformer-Based Detection of Sexism and Misogyny in German Texts
Maoro Falk | Michaela Geierhos
Proceedings of GermEval 2024 Task 1 GerMS-Detect Workshop on Sexism Detection in German Online News Fora (GerMS-Detect 2024)
Enhancing Conceptual Understanding in Multimodal Contrastive Learning through Hard Negative Samples
Philipp J. Rösch | Norbert Oswald | Michaela Geierhos | Jindřich Libovický
Proceedings of the 3rd Workshop on Advances in Language and Vision Research (ALVR)
Current vision-language models leveraging contrastive learning often face limitations in developing fine-grained conceptual understanding. This is due to random negative samples during pretraining, causing almost exclusively very dissimilar concepts to be compared in the loss function. Consequently, the models struggle with fine-grained semantic differences. To address this problem, we introduce a novel pretraining method incorporating synthetic hard negative text examples. The hard negatives replace terms corresponding to visual concepts, leading to a more fine-grained visual and textual concept alignment. Further, we introduce InpaintCOCO, a new challenging dataset for assessing the fine-grained alignment of colors, objects, and sizes in vision-language models. We created the dataset using generative inpainting from COCO images by changing the visual concepts so that the images no longer match their original captions. Our results show significant improvements in fine-grained concept understanding across various vision-language datasets, including our InpaintCOCO dataset.
ITER: Iterative Transformer-based Entity Recognition and Relation Extraction
Moritz Hennen | Florian Babl | Michaela Geierhos
Findings of the Association for Computational Linguistics: EMNLP 2024
When extracting structured information from text, recognizing entities and extracting relationships are essential. Recent advances in both tasks generate a structured representation of the information in an autoregressive manner, a time-consuming and computationally expensive approach. This naturally raises the question of whether autoregressive methods are necessary in order to achieve comparable results. In this work, we propose ITER, an efficient encoder-based relation extraction model, that performs the task in three parallelizable steps, greatly accelerating a recent language modeling approach: ITER achieves an inference throughput of over 600 samples per second for a large model on a single consumer-grade GPU. Furthermore, we achieve state-of-the-art results on the relation extraction datasets ADE and ACE05, and demonstrate competitive performance for both named entity recognition with GENIA and CoNLL03, and for relation extraction with SciERC and CoNLL04.
Curation of Benchmark Templates for Measuring Gender Bias in Named Entity Recognition Models
Ana Cimitan | Ana Alves Pinto | Michaela Geierhos
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Named Entity Recognition (NER) constitutes a popular machine learning technique that empowers several natural language processing applications. As with other machine learning applications, NER models have been shown to be susceptible to gender bias. The latter is often assessed using benchmark datasets, which in turn are curated specifically for a given Natural Language Processing (NLP) task. In this work, we investigate the robustness of benchmark templates to detect gender bias and propose a novel method to improve the curation of such datasets. The method, based on masked token prediction, aims to filter out benchmark templates with a higher probability of detecting gender bias in NER models. We tested the method for English and German, using the corresponding fine-tuned BERT base model (cased) as the NER model. The gender gaps detected with templates classified as appropriate by the method were statistically larger than those detected with inappropriate templates. The results were similar for both languages and support the use of the proposed method in the curation of templates designed to detect gender bias.
2021
Using Bloom’s Taxonomy to Classify Question Complexity
Sabine Ullrich | Michaela Geierhos
Proceedings of the 4th International Conference on Natural Language and Speech Processing (ICNLSP 2021)
2017
Annotation Challenges for Reconstructing the Structural Elaboration of Middle Low German
Nina Seemann | Marie-Luis Merten | Michaela Geierhos | Doris Tophinke | Eyke Hüllermeier
Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
In this paper, we present the annotation challenges we have encountered when working on a historical language that was undergoing elaboration processes. We especially focus on syntactic ambiguity and gradience in Middle Low German, which causes uncertainty to some extent. Since current annotation tools consider construction contexts and the dynamics of the grammaticalization only partially, we plan to extend CorA - a web-based annotation tool for historical and other non-standard language data - to capture elaboration phenomena and annotator unsureness. Moreover, we seek to interactively learn morphological as well as syntactic annotations.
2016
On- and Off-Topic Classification and Semantic Annotation of User-Generated Software Requirements
Markus Dollmann | Michaela Geierhos
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
2008
RELAX — Extraction de relations sémantiques dans les contextes biographiques [RELAX — Extraction of Semantic Relations in Biographical Contexts]
Michaela Geierhos | Olivier Blanc | Sandra Bsiri
Traitement Automatique des Langues, Volume 49, Numéro 1 : Varia [Varia]