Modeling Diachronic Change in Scientific Writing with Information Density

Raphael Rubino, Stefania Degaetano-Ortlieb, Elke Teich, Josef van Genabith


Abstract
Previous linguistic research on scientific writing has shown that language use in the scientific domain varies considerably in register and style over time. In this paper we investigate the introduction of information-theory-inspired features to study long-term diachronic change on three levels: lexis, part-of-speech and syntax. Our approach is based on distinguishing between sentences from 19th and 20th century scientific abstracts using supervised classification models. To the best of our knowledge, the introduction of information-theoretic features to this task is novel. We show that these features outperform more traditional features, such as token or character n-grams, while leading to more compact models. We present a detailed analysis of feature informativeness in order to gain a better understanding of diachronic change on different linguistic levels.
Anthology ID:
C16-1072
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yuji Matsumoto, Rashmi Prasad
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
750–761
URL:
https://aclanthology.org/C16-1072
Cite (ACL):
Raphael Rubino, Stefania Degaetano-Ortlieb, Elke Teich, and Josef van Genabith. 2016. Modeling Diachronic Change in Scientific Writing with Information Density. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 750–761, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Modeling Diachronic Change in Scientific Writing with Information Density (Rubino et al., COLING 2016)
PDF:
https://aclanthology.org/C16-1072.pdf