Michael Gertz


pdf bib
Evaluating Factual Consistency of Texts with Semantic Role Labeling
Jing Fan | Dennis Aumiller | Michael Gertz
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)

Automated evaluation of text generation systems has recently seen increasing attention, particularly checking whether generated text stays truthful to input sources. Existing methods frequently rely on an evaluation using task-specific language models, which in turn allows for little interpretability of generated scores. We introduce SRLScore, a reference-free evaluation metric designed with text summarization in mind. Our approach generates fact tuples constructed from Semantic Role Labels, applied to both input and summary texts.A final factuality score is computed by an adjustable scoring mechanism, which allows for easy adaption of the method across domains. Correlation with human judgments on English summarization datasets shows that SRLScore is competitive with state-of-the-art methods and exhibits stable generalization across datasets without requiring further training or hyperparameter tuning. We experiment with an optional co-reference resolution step, but find that the performance boost is mostly outweighed by the additional compute required. Our metric is available online at: https://github.com/heyjing/SRLScore

pdf bib
CQE: A Comprehensive Quantity Extractor
Satya Almasian | Vivian Kazakova | Philipp Göldner | Michael Gertz
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Quantities are essential in documents to describe factual information. They are ubiquitous in application domains such as finance, business, medicine, and science in general. Compared to other information extraction approaches, interestingly only a few works exist that describe methods for a proper extraction and representation of quantities in text. In this paper, we present such a comprehensive quantity extraction framework from text data. It efficiently detects combinations of values and units, the behavior of a quantity (e.g., rising or falling), and the concept a quantity is associated with. Our framework makes use of dependency parsing and a dictionary of units, and it provides for a proper normalization and standardization of detected quantities. Using a novel dataset for evaluation, we show that our open source framework outperforms other systems and – to the best of our knowledge – is the first to detect concepts associated with identified quantities. The code and data underlying our framework are available at https://github.com/vivkaz/CQE.

pdf bib
WeLT: Improving Biomedical Fine-tuned Pre-trained Language Models with Cost-sensitive Learning
Ghadeer Mobasher | Wolfgang Müller | Olga Krebs | Michael Gertz
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

Fine-tuning biomedical pre-trained language models (BioPLMs) such as BioBERT has become a common practice dominating leaderboards across various natural language processing tasks. Despite their success and wide adoption, prevailing fine-tuning approaches for named entity recognition (NER) naively train BioPLMs on targeted datasets without considering class distributions. This is problematic especially when dealing with imbalanced biomedical gold-standard datasets for NER in which most biomedical entities are underrepresented. In this paper, we address the class imbalance problem and propose WeLT, a cost-sensitive fine-tuning approach based on new re-scaled class weights for the task of biomedical NER. We evaluate WeLT’s fine-tuning performance on mixed-domain and domain-specific BioPLMs using eight biomedical gold-standard datasets. We compare our approach against vanilla fine-tuning and three other existing re-weighting schemes. Our results show the positive impact of handling the class imbalance problem. WeLT outperforms all the vanilla fine-tuned models. Furthermore, our method demonstrates advantages over other existing weighting schemes in most experiments.


pdf bib
UniHD at TSAR-2022 Shared Task: Is Compute All We Need for Lexical Simplification?
Dennis Aumiller | Michael Gertz
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

Previous state-of-the-art models for lexical simplification consist of complex pipelines with several components, each of which requires deep technical knowledge and fine-tuned interaction to achieve its full potential. As an alternative, we describe a frustratingly simple pipeline based on prompted GPT-3 responses, beating competing approaches by a wide margin in settings with few training instances. Our best-performing submission to the English language track of the TSAR-2022 shared task consists of an “ensemble” of six different prompt templates with varying context levels. As a late-breaking result, we further detail a language transfer technique that allows simplification in languages other than English. Applied to the Spanish and Portuguese subset, we achieve state-of-the-art results with only minor modification to the original prompts. Aside from detailing the implementation and setup, we spend the remainder of this work discussing the particularities of prompting and implications for future work. Code for the experiments is available online at https://github.com/dennlinger/TSAR-2022-Shared-Task.

pdf bib
EUR-Lex-Sum: A Multi- and Cross-lingual Dataset for Long-form Summarization in the Legal Domain
Dennis Aumiller | Ashish Chouhan | Michael Gertz
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Existing summarization datasets come with two main drawbacks: (1) They tend to focus on overly exposed domains, such as news articles or wiki-like texts, and (2) are primarily monolingual, with few multilingual datasets. In this work, we propose a novel dataset, called EUR-Lex-Sum, based on manually curated document summaries of legal acts from the European Union law platform (EUR-Lex). Documents and their respective summaries exist as cross-lingual paragraph-aligned data in several of the 24 official European languages, enabling access to various cross-lingual and lower-resourced summarization setups. We obtain up to 1,500 document/summary pairs per language, including a subset of 375 cross-lingually aligned legal acts with texts available in *all* 24 languages. In this work, the data acquisition process is detailed and key characteristics of the resource are compared to existing summarization resources. In particular, we illustrate challenging sub-problems and open questions on the dataset that could help the facilitation of future research in the direction of domain-specific cross-lingual summarization. Limited by the extreme length and language diversity of samples, we further conduct experiments with suitable extractive monolingual and cross-lingual baselines for future work. Code for the extraction as well as access to our data and baselines is available online at: [https://github.com/achouhan93/eur-lex-sum](https://github.com/achouhan93/eur-lex-sum).

pdf bib
Three Real-World Datasets and Neural Computational Models for Classification Tasks in Patent Landscaping
Subhash Pujari | Jannik Strötgen | Mark Giereth | Michael Gertz | Annemarie Friedrich
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Patent Landscaping, one of the central tasks of intellectual property management, includes selecting and grouping patents according to user-defined technical or application-oriented criteria. While recent transformer-based models have been shown to be effective for classifying patents into taxonomies such as CPC or IPC, there is yet little research on how to support real-world Patent Landscape Studies (PLSs) using natural language processing methods. With this paper, we release three labeled datasets for PLS-oriented classification tasks covering two diverse domains. We provide a qualitative analysis and report detailed corpus statistics. Most research on neural models for patents has been restricted to leveraging titles and abstracts. We compare strong neural and non-neural baselines, proposing a novel model that takes into account textual information from the patents’ full texts as well as embeddings created based on the patents’ CPC labels. We find that for PLS-oriented classification tasks, going beyond title and abstract is crucial, CPC labels are an effective source of information, and combining all features yields the best results.

pdf bib
Klexikon: A German Dataset for Joint Summarization and Simplification
Dennis Aumiller | Michael Gertz
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Traditionally, Text Simplification is treated as a monolingual translation task where sentences between source texts and their simplified counterparts are aligned for training. However, especially for longer input documents, summarizing the text (or dropping less relevant content altogether) plays an important role in the simplification process, which is currently not reflected in existing datasets. Simultaneously, resources for non-English languages are scarce in general and prohibitive for training new solutions. To tackle this problem, we pose core requirements for a system that can jointly summarize and simplify long source documents. We further describe the creation of a new dataset for joint Text Simplification and Summarization based on German Wikipedia and the German children’s encyclopedia “Klexikon”, consisting of almost 2,900 documents. We release a document-aligned version that particularly highlights the summarization aspect, and provide statistical evidence that this resource is well suited to simplification as well. Code and data are available on Github: https://github.com/dennlinger/klexikon


pdf bib
UniHD@CL-SciSumm 2020: Citation Extraction as Search
Dennis Aumiller | Satya Almasian | Philip Hausner | Michael Gertz
Proceedings of the First Workshop on Scholarly Document Processing

This work presents the entry by the team from Heidelberg University in the CL-SciSumm 2020 shared task at the Scholarly Document Processing workshop at EMNLP 2020. As in its previous iterations, the task is to highlight relevant parts in a reference paper, depending on a citance text excerpt from a citing paper. We participated in tasks 1A (citation identification) and 1B (citation context classification). Contrary to most previous works, we frame Task 1A as a search relevance problem, and introduce a 2-step re-ranking approach, which consists of a preselection based on BM25 in addition to positional document features, and a top-k re-ranking with BERT. For Task 1B, we follow previous submissions in applying methods that deal well with low resources and imbalanced classes.


pdf bib
HeidelPlace: An Extensible Framework for Geoparsing
Ludwig Richter | Johanna Geiß | Andreas Spitz | Michael Gertz
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Geographic information extraction from textual data sources, called geoparsing, is a key task in text processing and central to subsequent spatial analysis approaches. Several geoparsers are available that support this task, each with its own (often limited or specialized) gazetteer and its own approaches to toponym detection and resolution. In this demonstration paper, we present HeidelPlace, an extensible framework in support of geoparsing. Key features of HeidelPlace include a generic gazetteer model that supports the integration of place information from different knowledge bases, and a pipeline approach that enables an effective combination of diverse modules tailored to specific geoparsing tasks. This makes HeidelPlace a valuable tool for testing and evaluating different gazetteer sources and geoparsing methods. In the demonstration, we show how to set up a geoparsing workflow with HeidelPlace and how it can be used to compare and consolidate the output of different geoparsing approaches.


pdf bib
A Baseline Temporal Tagger for all Languages
Jannik Strötgen | Michael Gertz
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
HeidelToul: A Baseline Approach for Cross-document Event Ordering
Bilel Moulahi | Jannik Strötgen | Michael Gertz | Lynda Tamine
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)


pdf bib
Chinese Temporal Tagging with HeidelTime
Hui Li | Jannik Strötgen | Julian Zell | Michael Gertz
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

pdf bib
Computational Narratology: Extracting Tense Clusters from Narrative Texts
Thomas Bögel | Jannik Strötgen | Michael Gertz
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Computational Narratology is an emerging field within the Digital Humanities. In this paper, we tackle the problem of extracting temporal information as a basis for event extraction and ordering, as well as further investigations of complex phenomena in narrative texts. While most existing systems focus on news texts and extract explicit temporal information exclusively, we show that this approach is not feasible for narratives. Based on tense information of verbs, we define temporal clusters as an annotation task and validate the annotation schema by showing that the task can be performed with high inter-annotator agreement. To alleviate and reduce the manual annotation effort, we propose a rule-based approach to robustly extract temporal clusters using a multi-layered and dynamic NLP pipeline that combines off-the-shelf components in a heuristic setting. Comparing our results against human judgements, our system is capable of predicting the tense of verbs and sentences with very high reliability: for the most prevalent tense in our corpus, more than 95% of all verbs are annotated correctly.

pdf bib
Extending HeidelTime for Temporal Expressions Referring to Historic Dates
Jannik Strötgen | Thomas Bögel | Julian Zell | Ayser Armiti | Tran Van Canh | Michael Gertz
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Research on temporal tagging has achieved a lot of attention during the last years. However, most of the work focuses on processing news-style documents. Thus, references to historic dates are often not well handled by temporal taggers although they frequently occur in narrative-style documents about history, e.g., in many Wikipedia articles. In this paper, we present the AncientTimes corpus containing documents about different historic time periods in eight languages, in which we manually annotated temporal expressions. Based on this corpus, we explain the challenges of temporal tagging documents about history. Furthermore, we use the corpus to extend our multilingual, cross-domain temporal tagger HeidelTime to extract and normalize temporal expressions referring to historic dates, and to demonstrate HeidelTime’s new capabilities. Both, the AncientTimes corpus as well as the new HeidelTime version are made publicly available.


pdf bib
HeidelTime: Tuning English and Developing Spanish Resources for TempEval-3
Jannik Strötgen | Julian Zell | Michael Gertz
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)


pdf bib
Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards
Jannik Strötgen | Michael Gertz
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In the last years, temporal tagging has received increasing attention in the area of natural language processing. However, most of the research so far concentrated on processing news documents. Only recently, two temporal annotated corpora of narrative-style documents were developed, and it was shown that a domain shift results in significant challenges for temporal tagging. Thus, a temporal tagger should be aware of the domain associated with documents that are to be processed and apply domain-specific strategies for extracting and normalizing temporal expressions. In this paper, we analyze the characteristics of temporal expressions in different domains. In addition to news- and narrative-style documents, we add two further document types, namely colloquial and scientific documents. After discussing the challenges of temporal tagging on the different domains, we describe some strategies to tackle these challenges and describe their integration into our publicly available temporal tagger HeidelTime. Our cross-domain evaluation validates the benefits of domain-sensitive temporal tagging. Furthermore, we make available two new temporally annotated corpora and a new version of HeidelTime, which now distinguishes between four document domain types.


pdf bib
HeidelTime: High Quality Rule-Based Extraction and Normalization of Temporal Expressions
Jannik Strötgen | Michael Gertz
Proceedings of the 5th International Workshop on Semantic Evaluation