Differences in type-token ratio and part-of-speech frequencies in male and female Russian written texts
Tatiana Litvinova | Pavel Seredin | Olga Litvinova | Olga Zagorovskaya
Proceedings of the Workshop on Stylistic Variation
The differences in the frequencies of some parts of speech (POS), particularly function words, and lexical diversity in male and female speech have been pointed out in a number of papers. The classifiers using exclusively context-independent parameters have proved to be highly effective. However, there are still issues that have to be addressed as a lot of studies are performed for English and the genre and topic of texts is sometimes neglected. The aim of this paper is to investigate the association between context-independent parameters of Russian written texts and the gender of their authors and to design predictive re-gression models. A number of correlations were found. The obtained data is in good agreement with the results obtained for other languages. The model based on 5 parameters with the highest correlation coefficients was designed.