Corpus Analysis based on Structural Phenomena in Texts: Exploiting TEI Encoding for Linguistic Research

Susanne Haaf

Abstract

This paper asks how corpus-based linguistic research can be enriched by exploiting conceptual text structures and layout as captured by TEI annotation. Examples of possible research areas and usage scenarios are drawn from the German historical corpus of the Deutsches Textarchiv (DTA) project, which has been consistently tagged in accordance with the TEI Guidelines, more specifically with the DTA ›Base Format‹ (DTABf). The paper shows that when TEI-XML structuring is included in corpus-based analyses, significant patterns can be observed for different linguistic phenomena, e.g. the development of conceptual text structures themselves, the syntactic embedding of terms in certain conceptual text structures, and phenomena of language change that become apparent through the layout of a text. The exemplary study carried out here demonstrates some of the potential of TEI annotation for linguistic research, which is worth keeping in mind when making design decisions for new corpora as well as when working with existing TEI corpora.
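To illustrate the kind of structure-aware query the abstract describes, the following is a minimal sketch and not the paper's own method. It counts whitespace-separated tokens inside selected TEI elements and collects passages marked with a given layout attribute. The sample document is invented for illustration, the helper names `tokens_by_element` and `texts_with_rendition` are hypothetical, and the `rendition="#aq"` value is an assumption made for this sketch rather than something stated in the abstract.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented miniature TEI document, for illustration only (not DTA data).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div>
        <head>Von den Kometen</head>
        <p>Die Kometen sind <hi rendition="#aq">corpora</hi> des Himmels.</p>
        <note place="foot">Siehe auch Kepler.</note>
      </div>
    </body>
  </text>
</TEI>"""


def tokens_by_element(xml_string, element_names):
    """Count whitespace-separated tokens per TEI element type (hypothetical helper)."""
    root = ET.fromstring(xml_string)
    counts = Counter()
    for name in element_names:
        for el in root.iterfind(f".//tei:{name}", TEI_NS):
            # itertext() also yields the text of nested elements such as <hi>.
            text = "".join(el.itertext())
            counts[name] += len(text.split())
    return counts


def texts_with_rendition(xml_string, rendition):
    """Return the text of <hi> elements carrying the given rendition value (hypothetical helper)."""
    root = ET.fromstring(xml_string)
    return [
        "".join(el.itertext())
        for el in root.iterfind(".//tei:hi", TEI_NS)
        if el.get("rendition") == rendition  # assumed layout marker
    ]


counts = tokens_by_element(SAMPLE, ["head", "p", "note"])
antiqua = texts_with_rendition(SAMPLE, "#aq")
print(dict(counts))
print(antiqua)
```

A real study would replace the whitespace split with proper tokenization and run the query over the full corpus; the sketch only shows how element names and attributes can restrict a linguistic query to particular text structures or layout features.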

Anthology ID: L16-1692
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 4365–4372
URL: https://aclanthology.org/L16-1692/
Cite (ACL): Susanne Haaf. 2016. Corpus Analysis based on Structural Phenomena in Texts: Exploiting TEI Encoding for Linguistic Research. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4365–4372, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): Corpus Analysis based on Structural Phenomena in Texts: Exploiting TEI Encoding for Linguistic Research (Haaf, LREC 2016)
PDF: https://aclanthology.org/L16-1692.pdf
