Fritz Kliche


2023

An educational Gamebook on computational linguistic methods for the development of taxonomies
Fritz Kliche | Ulrich Heid | Ralf Knackstedt | Thomas Klupp
Proceedings of the 1st Workshop on Teaching for NLP

2021

Polarity in Translation: Differences between Novice and Experts across Registers
Ekaterina Lapshinova-Koltunski | Fritz Kliche | Anna Moskvina | Johannes Schäfer
Proceedings for the First Workshop on Modelling Translation: Translatology in the Digital Age

Definition Extraction from Mathematical Texts on Graph Theory in German and English
Theresa Kruse | Fritz Kliche
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)

2014

The eIdentity Text Exploration Workbench
Fritz Kliche | André Blessing | Ulrich Heid | Jonathan Sonntag
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We develop tools for exploring the text contents and metadata of newspaper articles provided by news archives. The tool components are being integrated into an “Exploration Workbench” for Digital Humanities researchers. Besides converting between data formats and character encodings, a prominent feature of the design is its “Wizard” function for corpus building: researchers import raw data and define patterns to extract text contents and metadata. The Workbench also comprises tools for data cleaning, which filter out off-topic articles, duplicates and near-duplicates, and corrupted or empty articles. We currently work with approximately 860,000 newspaper articles from several media archives, provided in different data formats. We index the data with state-of-the-art systems to allow for large-scale information retrieval. We extract metadata such as publishing dates, author names, and newspaper sections, and split articles into segments such as headlines, subtitles, and paragraphs. Once the data has been cleaned and a thematically homogeneous corpus compiled, the sample can be used for quantitative analyses that are not affected by noise. Users can retrieve sets of articles on particular topics, issues, or otherwise defined research questions (“subcorpora”) and quantitatively investigate the media attention these receive over time (“Issue Cycles”).

2013

Towards a Tool for Interactive Concept Building for Large Scale Analysis in the Humanities
Andre Blessing | Jonathan Sonntag | Fritz Kliche | Ulrich Heid | Jonas Kuhn | Manfred Stede
Proceedings of the 7th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities