Approximating Style by N-gram-based Annotation

Melanie Andresen, Heike Zinsmeister


Abstract
The concept of style is much debated in theoretical as well as empirical terms. From an empirical perspective, the key question is how to operationalize style and thus make it accessible for annotation and quantification. In authorship attribution, many different approaches have successfully resolved this issue at the cost of linguistic interpretability: the resulting algorithms may be able to distinguish one language variety from another, but they tell us little about the distinctive linguistic properties of these varieties. We approach the issue of interpreting stylistic features by extracting linear and syntactic n-grams that are distinctive for a language variety. We present a study that exemplifies this process by comparing the German academic language of linguistics with that of literary studies. Overall, our findings show that distinctive n-grams can be related to linguistic categories. The results suggest that the style of German literary studies is characterized by nominal structures and the style of linguistics by verbal ones.
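As a rough illustration of what extracting distinctive linear n-grams can look like, the sketch below ranks part-of-speech bigrams by a smoothed log ratio of their relative frequencies in two corpora. This is a minimal sketch under stated assumptions, not the authors' method: the abstract does not name the distinctiveness measure, so the log ratio is a stand-in, and the function names, the smoothing constant, and the toy STTS-style tag sequences are all hypothetical. Syntactic n-grams, which follow dependency paths rather than surface order, are not covered here.

```python
from collections import Counter
from math import log2


def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinctive_ngrams(corpus_a, corpus_b, n=2, smoothing=0.5):
    """Rank n-grams by the smoothed log2 ratio of relative frequencies.

    Positive scores mark n-grams that are more typical of corpus_a,
    negative scores those more typical of corpus_b. The measure and the
    smoothing value are illustrative assumptions.
    """
    counts_a = Counter(g for sent in corpus_a for g in ngrams(sent, n))
    counts_b = Counter(g for sent in corpus_b for g in ngrams(sent, n))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)

    scores = {}
    for gram in vocab:
        # Additive smoothing keeps the ratio finite for unseen n-grams.
        rel_a = (counts_a[gram] + smoothing) / (total_a + smoothing * len(vocab))
        rel_b = (counts_b[gram] + smoothing) / (total_b + smoothing * len(vocab))
        scores[gram] = log2(rel_a / rel_b)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy input: sentences given as sequences of POS tags (hypothetical data).
nominal_corpus = [["ART", "NN", "ART", "NN"], ["ART", "NN", "APPR", "NN"]]
verbal_corpus = [["PPER", "VVFIN", "ART", "NN"], ["PPER", "VVFIN", "ADV"]]

ranked = distinctive_ngrams(nominal_corpus, verbal_corpus, n=2)
for gram, score in ranked:
    print(" ".join(gram), round(score, 2))
```

Working on part-of-speech tags rather than word forms is one way to keep the resulting n-grams interpretable in terms of linguistic categories, which is the kind of interpretation the abstract describes.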
Anthology ID:
W17-4913
Volume:
Proceedings of the Workshop on Stylistic Variation
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Julian Brooke, Thamar Solorio, Moshe Koppel
Venue:
Style-Var
Publisher:
Association for Computational Linguistics
Pages:
105–115
URL:
https://aclanthology.org/W17-4913
DOI:
10.18653/v1/W17-4913
Cite (ACL):
Melanie Andresen and Heike Zinsmeister. 2017. Approximating Style by N-gram-based Annotation. In Proceedings of the Workshop on Stylistic Variation, pages 105–115, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Approximating Style by N-gram-based Annotation (Andresen & Zinsmeister, Style-Var 2017)
PDF:
https://aclanthology.org/W17-4913.pdf