Modeling intra-textual variation with entropy and surprisal: topical vs. stylistic patterns

Stefania Degaetano-Ortlieb, Elke Teich


Abstract
We present a data-driven approach to investigate intra-textual variation by combining entropy and surprisal. With this approach, we detect linguistic variation based on phrasal lexico-grammatical patterns across sections of research articles. Entropy is used to detect patterns typical of specific sections. Surprisal is used to differentiate between more and less informationally loaded patterns as well as the type of information they carry (topical vs. stylistic). While we focus here on research articles in biology/genetics, the methodology is especially interesting for digital humanities scholars, as it can be applied to any text type or domain and combined with additional variables (e.g., time, author, or social group).
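As a rough illustration of the two quantities named in the abstract, the following minimal sketch computes Shannon entropy and surprisal from raw counts. It is not the authors' implementation: the function names, the relative-frequency probability estimates, and the toy counts are all assumptions made for this example, and the paper's actual formulation may differ.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def surprisal(count, total):
    """Surprisal in bits of an event with relative frequency count/total."""
    return -math.log2(count / total)

# Hypothetical counts of one pattern across four article sections
# (made-up toy data, not taken from the paper).
concentrated = [0, 0, 40, 0]   # pattern occurs in a single section
even = [10, 10, 10, 10]        # pattern spread evenly over sections

print(entropy(concentrated))   # low entropy: typical of one section
print(entropy(even))           # high entropy: not section-specific
print(surprisal(1, 8))         # a rarer pattern carries more bits
```

Under these assumptions, a pattern whose occurrences are concentrated in one section has low entropy over sections, while a pattern with low relative frequency has high surprisal.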
Anthology ID:
W17-2209
Volume:
Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Beatrice Alex, Stefania Degaetano-Ortlieb, Anna Feldman, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
Venue:
LaTeCH
SIG:
SIGHUM
Publisher:
Association for Computational Linguistics
Pages:
68–77
URL:
https://aclanthology.org/W17-2209
DOI:
10.18653/v1/W17-2209
Cite (ACL):
Stefania Degaetano-Ortlieb and Elke Teich. 2017. Modeling intra-textual variation with entropy and surprisal: topical vs. stylistic patterns. In Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 68–77, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Modeling intra-textual variation with entropy and surprisal: topical vs. stylistic patterns (Degaetano-Ortlieb & Teich, LaTeCH 2017)
PDF:
https://aclanthology.org/W17-2209.pdf