Sebastian Arnold


2019

SECTOR: A Neural Model for Coherent Topic Segmentation and Classification
Sebastian Arnold | Rudolf Schneider | Philippe Cudré-Mauroux | Felix A. Gers | Alexander Löser
Transactions of the Association for Computational Linguistics, Volume 7

When searching for information, a human reader first glances over a document, spots relevant sections, and then focuses on a few sentences to resolve her intention. However, the high variance of document structure complicates the identification of the salient topic of a given section at a glance. To tackle this challenge, we present SECTOR, a model to support machine reading systems by segmenting documents into coherent sections and assigning topic labels to each section. Our deep neural network architecture learns a latent topic embedding over the course of a document. This can be leveraged to classify local topics from plain text and segment a document at topic shifts. In addition, we contribute WikiSection, a publicly available data set with 242k labeled sections in English and German from two distinct domains: diseases and cities. From our extensive evaluation of 20 architectures, we report the highest score of 71.6% F1 for the segmentation and classification of 30 topics from the English city domain, achieved by our SECTOR long short-term memory model with Bloom filter embeddings and bidirectional segmentation. This is a significant improvement of 29.5 points F1 over state-of-the-art CNN classifiers with baseline segmentation.

2016

TASTY: Interactive Entity Linking As-You-Type
Sebastian Arnold | Robert Dziuba | Alexander Löser
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

We introduce TASTY (Tag-as-you-type), a novel text editor for interactive entity linking as part of the writing process. Tasty supports the author of a text with complementary information about the mentioned entities, shown in a ‘live’ exploration view. The system is automatically triggered by keystrokes, recognizes mention boundaries and disambiguates the mentioned entities to Wikipedia articles. The author can use seven operators to interact with the editor and refine the results according to his specific intention while writing. Our implementation captures syntactic and semantic context using a robust end-to-end LSTM sequence learner and word embeddings. We demonstrate the applicability of our system in English and German for encyclopedic and medical text. Tasty is currently being tested in interactive applications for text production, such as scientific research, news editing, medical anamnesis, help desks and product reviews.

2014

Nerdle: Topic-Specific Question Answering Using Wikia Seeds
Umar Maqsud | Sebastian Arnold | Michael Hülfenhaus | Alan Akbik
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations