Sebastian Brarda

2018

We introduce a novel type of text representation that preserves the 2D layout of a document. This is achieved by encoding each document page as a two-dimensional grid of characters. Based on this representation, we present a generic document understanding pipeline for structured documents. This pipeline makes use of a fully convolutional encoder-decoder network that predicts a segmentation mask and bounding boxes. We demonstrate its capabilities on an information extraction task from invoices and show that it significantly outperforms approaches based on sequential text or document images.
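The grid encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token format `(text, column, row)`, the function names `build_char_grid` and `render_grid`, and the use of Unicode code points as character indices (with 0 as background) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical input format: (text, start column, row), already measured in grid cells.
Token = Tuple[str, int, int]


def build_char_grid(tokens: List[Token], width: int, height: int) -> List[List[int]]:
    """Place each character at its 2D position; 0 marks an empty (background) cell."""
    grid = [[0] * width for _ in range(height)]
    for text, col, row in tokens:
        if not 0 <= row < height:
            continue  # skip tokens that fall outside the page
        for offset, ch in enumerate(text):
            x = col + offset
            if 0 <= x < width:
                # Assumption: the Unicode code point serves as the character index.
                grid[row][x] = ord(ch)
    return grid


def render_grid(grid: List[List[int]]) -> List[str]:
    """Turn the integer grid back into text rows, using '.' for background cells."""
    return ["".join(chr(v) if v else "." for v in row) for row in grid]


tokens = [("Invoice", 0, 0), ("42", 5, 2)]
grid = build_char_grid(tokens, width=8, height=3)
for line in render_grid(grid):
    print(line)
```

A grid of this shape keeps the spatial relationship between fields, which is the property a convolutional network over the page can exploit; a sequential text encoding would lose it.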

2017


Sequential Attention: A Context-Aware Alignment Function for Machine Reading
Sebastian Brarda | Philip Yeres | Samuel Bowman
Proceedings of the 2nd Workshop on Representation Learning for NLP

In this paper we propose a neural network model with a novel Sequential Attention layer that extends soft attention by assigning weights to words in an input sequence in a way that takes into account not just how well that word matches a query, but how well surrounding words match. We evaluate this approach on the task of reading comprehension (on the Who did What and CNN datasets) and show that it dramatically improves a strong baseline—the Stanford Reader—and is competitive with the state of the art.

Co-authors

Anoop R Katti 1

Christian Reisswig 1

Philip Yeres 1

Venues

EMNLP 1
RepL4NLP 1
