Extending standoff annotation

Maik Stührenberg


Abstract
Textual information is sometimes accompanied by additional encodings (such as visuals). These multimodal documents may be interesting objects of investigation for linguistics. Another class of complex documents are pre-annotated documents. Classic XML inline annotation often fails for both document classes because of overlapping markup. However, standoff annotation, that is the separation of primary data and markup, is a valuable and common mechanism to annotate multiple hierarchies and/or read-only primary data. We demonstrate an extended version of the XStandoff meta markup language, that allows the definition of segments in spatial and pre-annotated primary data. Together with the ability to import already established (linguistic) serialization formats as annotation levels and layers in an XStandoff instance, we are able to annotate a variety of primary data files, including text, audio, still and moving images. Application scenarios that may benefit from using XStandoff are the analyzation of multimodal documents such as instruction manuals, or sports match analysis, or the less destructive cleaning of web pages.
Anthology ID:
L14-1274
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
169–174
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/308_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Maik Stührenberg. 2014. Extending standoff annotation. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 169–174, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Extending standoff annotation (Stührenberg, LREC 2014)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/308_Paper.pdf