Daniel Voinea


2022

Large Sequence Representation Learning via Multi-Stage Latent Transformers
Ionut-Catalin Sandu | Daniel Voinea | Alin-Ionut Popa
Proceedings of the 29th International Conference on Computational Linguistics

We present LANTERN, a multi-stage transformer architecture for named-entity recognition (NER) designed to operate on arbitrarily long text sequences (i.e., more than 512 elements). Given an image of a form with structured text, our method uses language and spatial features to predict the entity tag of each text element. It sidesteps the quadratic computational cost of the attention mechanism by operating over a learned latent-space representation that encodes the input sequence via cross-attention, while the multi-stage encoding component acts as a refinement over the NER predictions. As a proxy task, we propose RADAR, an LSTM classifier operating at the character level, which predicts the relevance of a word with respect to the entity-recognition task. Additionally, we formulate a challenging novel NER use case: nutritional information extraction from food product labels. We created a dataset of 11,926 images depicting food product labels, entitled TREAT, with fully detailed annotations. Our method achieves superior performance against two competitive models designed for long sequences on the proposed TREAT dataset.
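
The abstract describes encoding a long input sequence into a learned latent representation via cross-attention, with multi-stage refinement. The following is a minimal sketch of that general idea (Perceiver-style cross-attention into a fixed-size latent array), not the authors' implementation; the class name, dimensions, and number of stages are assumptions for illustration.

```python
# Hypothetical sketch (not the paper's code): a fixed-size learned latent array
# attends to an arbitrarily long input sequence via cross-attention, so the cost
# scales as O(num_latents * seq_len) instead of O(seq_len^2) self-attention.
import torch
import torch.nn as nn

class LatentCrossEncoder(nn.Module):
    def __init__(self, dim=256, num_latents=64, num_heads=4, num_stages=2):
        super().__init__()
        # Learned latent array: the "query" side has a fixed length regardless of input size.
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.stages = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(num_stages)
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):                      # tokens: (batch, seq_len, dim)
        batch = tokens.shape[0]
        z = self.latents.unsqueeze(0).expand(batch, -1, -1)
        for attn in self.stages:                    # multi-stage refinement
            upd, _ = attn(query=z, key=tokens, value=tokens)
            z = self.norm(z + upd)                  # residual + norm
        return z                                    # (batch, num_latents, dim)

# Usage: a sequence of 2,048 token embeddings is compressed into 64 latents.
x = torch.randn(1, 2048, 256)
print(LatentCrossEncoder()(x).shape)                # torch.Size([1, 64, 256])
```

Similarly, RADAR is described as a character-level LSTM that predicts word relevance for the entity-recognition task. Below is a hedged sketch of such a classifier in spirit only; the class name, vocabulary handling, and sizes are assumptions, not details from the paper.

```python
# Hypothetical character-level LSTM word-relevance classifier: each word is read
# character by character and scored as relevant / not relevant to the NER task.
import torch
import torch.nn as nn

class CharRelevanceClassifier(nn.Module):
    def __init__(self, vocab_size=128, char_dim=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, char_dim)
        self.lstm = nn.LSTM(char_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, char_ids):                 # char_ids: (batch, word_len)
        _, (h, _) = self.lstm(self.embed(char_ids))
        return torch.sigmoid(self.head(h[-1]))   # relevance probability per word

word = torch.tensor([[ord(c) for c in "protein"]])   # naive ASCII character encoding
print(CharRelevanceClassifier()(word).shape)          # torch.Size([1, 1])
```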