2018
Dynamics of an idiostyle of a Russian suicidal blogger
Tatiana Litvinova | Olga Litvinova | Pavel Seredin
Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic
Over 800,000 people die by suicide each year. It is estimated that by the year 2020 this figure will have increased to 1.5 million. Suicide is considered to be one of the major causes of mortality during adolescence. Thus there is a growing need for methods of identifying suicidal individuals. Language analysis is known to be a valuable psychodiagnostic tool; however, the material for such an analysis is not easy to obtain. As Internet communication develops, there is an opportunity to study texts written by suicidal individuals. Such an analysis can provide useful insight into the peculiarities of suicidal thinking, which can be used to further develop methods for diagnosing the risk of suicidal behavior. The paper analyzes the dynamics of a number of linguistic parameters of the idiostyle of a Russian-language blogger who died by suicide. This is the first time such an analysis has been conducted on Russian online texts. The LIWC program is used for text processing. A correlation analysis was performed to identify the relationship between LIWC variables and the number of days prior to suicide. Data visualization and a comparison with the results of related studies were also performed.
2017
Deception detection in Russian texts
Olga Litvinova | Pavel Seredin | Tatiana Litvinova | John Lyell
Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics
Humans are known to detect deception in speech at about chance level, and it is therefore important to develop tools that enable them to detect deception. The problem of deception detection has been studied for a long time; however, only in the last 10-15 years have methods of computational linguistics been employed. Texts are processed using different NLP tools and then classified as deceptive or truthful using machine learning methods. While most research has been performed for English, Slavic languages have never been a focus of deception detection studies. The paper deals with deception detection in Russian narratives. It employs a specially designed corpus of truthful and deceptive texts on the same topic from each respondent, N = 113. The texts were processed using the Linguistic Inquiry and Word Count software, which is used in most studies of text-based deception detection. The list of parameters computed by the software was expanded by means of specially designed user dictionaries. A variety of text classification methods was employed. The accuracy of the model was found to depend on the author’s gender and text type (deceptive/truthful).
Differences in type-token ratio and part-of-speech frequencies in male and female Russian written texts
Tatiana Litvinova | Pavel Seredin | Olga Litvinova | Olga Zagorovskaya
Proceedings of the Workshop on Stylistic Variation
Differences in the frequencies of some parts of speech (POS), particularly function words, and in lexical diversity between male and female speech have been pointed out in a number of papers. Classifiers using exclusively context-independent parameters have proved to be highly effective. However, there are still issues to be addressed, as many studies are performed for English and the genre and topic of texts are sometimes neglected. The aim of this paper is to investigate the association between context-independent parameters of Russian written texts and the gender of their authors and to design predictive regression models. A number of correlations were found. The data obtained are in good agreement with the results reported for other languages. A model based on the 5 parameters with the highest correlation coefficients was designed.
Deception Detection for the Russian Language: Lexical and Syntactic Parameters
Dina Pisarevskaya | Tatiana Litvinova | Olga Litvinova
Proceedings of the 1st Workshop on Natural Language Processing and Information Retrieval associated with RANLP 2017
The field of automated deception detection in written texts is methodologically challenging. Different linguistic levels (lexis, syntax and semantics) are generally used for different types of English texts to reveal whether they are truthful or deceptive. Such parameters as POS tags and POS tag n-grams, punctuation marks, sentiment polarity of words, psycholinguistic features, and fragments of syntactic structures are taken into consideration. The importance of different types of parameters has not previously been compared for the Russian language and should be investigated before moving to complex models and higher levels of linguistic processing. Using the Russian Deception Bank Corpus as an example, we estimate the impact of three groups of features (POS features including bigrams, sentiment and psycholinguistic features, syntax and readability features) on successful deception detection and find that POS features can be used for binary text classification, but the results should be double-checked and, if possible, improved.