2014
The eIdentity Text Exploration Workbench
Fritz Kliche | André Blessing | Ulrich Heid | Jonathan Sonntag
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
We work on tools to explore text contents and metadata of newspaper articles as provided by news archives. Our tool components are being integrated into an “Exploration Workbench” for Digital Humanities researchers. In addition to converting between data formats and character encodings, the design prominently features a “Wizard” function for corpus building: researchers import raw data and define patterns to extract text contents and metadata. The Workbench also comprises tools for data cleaning, which filter out off-topic articles, duplicates and near-duplicates, and corrupted and empty articles. We currently work on ca. 860,000 newspaper articles from different media archives, provided in different data formats. We index the data with state-of-the-art systems to allow for large-scale information retrieval. We extract metadata such as publishing dates, author names, and newspaper sections, and split articles into segments such as headlines, subtitles, and paragraphs. After cleaning the data and compiling a thematically homogeneous corpus, the sample can be used for quantitative analyses which are not affected by noise. Users can retrieve sets of articles on different topics, issues or otherwise defined research questions (“subcorpora”) and quantitatively investigate the media attention these receive over time (“Issue Cycles”).
GraPAT: a Tool for Graph Annotations
Jonathan Sonntag | Manfred Stede
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
We introduce GraPAT, a web-based annotation tool for building graph structures over text. Graphs have been demonstrated to be relevant in a variety of quite diverse annotation efforts and in different NLP applications, and they serve to model annotators' intuitions quite closely. In particular, in this paper we discuss the implementation of graph annotations for sentiment analysis, argumentation structure, and rhetorical text structures. All of these scenarios can create certain problems for existing annotation tools, and we show how GraPAT can help to overcome such difficulties.
Conceptual and Practical Steps in Event Coreference Analysis of Large-scale Data
Fatemeh Torabi Asr | Jonathan Sonntag | Yulia Grishina | Manfred Stede
Proceedings of the Second Workshop on EVENTS: Definition, Detection, Coreference, and Representation
2013
From newspaper to microblogging: What does it take to find opinions?
Wladimir Sidorenko | Jonathan Sonntag | Nina Krüger | Stefan Stieglitz | Manfred Stede
Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Towards a Tool for Interactive Concept Building for Large Scale Analysis in the Humanities
Andre Blessing | Jonathan Sonntag | Fritz Kliche | Ulrich Heid | Jonas Kuhn | Manfred Stede
Proceedings of the 7th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities