Thomas Bögel


2014

pdf bib
Computational Narratology: Extracting Tense Clusters from Narrative Texts
Thomas Bögel | Jannik Strötgen | Michael Gertz
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Computational Narratology is an emerging field within the Digital Humanities. In this paper, we tackle the problem of extracting temporal information as a basis for event extraction and ordering, as well as further investigations of complex phenomena in narrative texts. While most existing systems focus on news texts and extract explicit temporal information exclusively, we show that this approach is not feasible for narratives. Based on tense information of verbs, we define temporal clusters as an annotation task and validate the annotation schema by showing that the task can be performed with high inter-annotator agreement. To alleviate and reduce the manual annotation effort, we propose a rule-based approach to robustly extract temporal clusters using a multi-layered and dynamic NLP pipeline that combines off-the-shelf components in a heuristic setting. Comparing our results against human judgements, our system is capable of predicting the tense of verbs and sentences with very high reliability: for the most prevalent tense in our corpus, more than 95% of all verbs are annotated correctly.

pdf bib
Extending HeidelTime for Temporal Expressions Referring to Historic Dates
Jannik Strötgen | Thomas Bögel | Julian Zell | Ayser Armiti | Tran Van Canh | Michael Gertz
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Research on temporal tagging has achieved a lot of attention during the last years. However, most of the work focuses on processing news-style documents. Thus, references to historic dates are often not well handled by temporal taggers although they frequently occur in narrative-style documents about history, e.g., in many Wikipedia articles. In this paper, we present the AncientTimes corpus containing documents about different historic time periods in eight languages, in which we manually annotated temporal expressions. Based on this corpus, we explain the challenges of temporal tagging documents about history. Furthermore, we use the corpus to extend our multilingual, cross-domain temporal tagger HeidelTime to extract and normalize temporal expressions referring to historic dates, and to demonstrate HeidelTime’s new capabilities. Both, the AncientTimes corpus as well as the new HeidelTime version are made publicly available.