Language Technologies for Suicide Prevention in Social Media

At present, the suicide phenomenon is raising, having a relevant impact on our society. Each year about one million people die as a result of suicidal behavior becoming an economic, social and human problem. On the other hand, the use of Social Media as a means of communication is becoming extremely popular, through which their emotional states and impressions are exchanged. Therefore, it is no surprise that more and more people with depression publish their suicide notes in these communication channels. In this context, Information Technologies and Communications and, more specifically, Language Technologies play an important role in the early detection of the depression, their causes and their terrible consequences. Based on these considerations, it is mandatory to provide societal, environmentally approaches and solutions to tackle these societal challenges. This work pretends to be an exhaustive survey of the different researches in this scope, in order to explain which methodologies, technologies and resources are used in the detection of mental problems by means of the Social Media analysis as well as to re-veal their deficiencies.


Introduction
In Europe, suicide has become the leading cause of violent death (WHO, 2014). Each year 804.000 people die in the world as a result of suicidal behavior and the number of attempts is about 20 times higher (WHO, 2012;WHO, 2014). It is estimated that in 2020, about 1.53 million people will die as a result of suicidal acts. Preventing suicide is one of the five areas of priority of the European Pact for Mental Health and Well-Being 1 , which was launched by the European Commission in 2008. Suicide is the third leading cause of violent death among people aged 15 to 44, followed by accidents and homicides (Holmes et al., 2007), and it would be the second reason that would explain the deaths in the group of people aged 15 to 19 years (WHO, 2014). Suicidal behaviors can be defined as a complex process that can range from suicidal ideation (communicated through verbal or nonverbal means) to planning of suicide, attempting suicide, and in the worst case, the suicide itself. These behaviors are influenced by interacting biological, genetic, psychological, social, environmental and situational factors (Wasserman et al., 2004). Suicide has also been strong linked to inequity, social exclusion and socio-economic deprivation (Berk and Dodd, 2006). It is an enormous problem that causing unnecessary human suffering and immeasurable costs for society. According to Josee Van Remoortel, advisor to the European organization Mental Health Europe 2 (MHE), the financial crisis is affecting "all areas of life", not just economies, and its impact on mental health is creating a "deep chasm in our society". -------------

Internet and Social Media Penetration
In the other hand, studies reveal that between 90.1% and 97.8% of young people between 10 and 15 years, access the Internet 3 ; and, around 88.5% of youth aged 16 to 24 choose social networks as a way to communicate. Therefore, the use of this type of technology can be up to 90.2%, in the case of students (García-Rabagó et al., 2010). Forums, chats, social networks, blogs, microblogs or e-mails are virtual spaces where Internet users can interact freely and even fantasize, using anonymous identities. This implies that people with suicidal tendencies, tend to express their thoughts, desires and intentions in pro-suicide forums and share with other people, feelings and intentions (Moreno Gea and Blanco Sanchez, 2012). They also warn their suicidal intentions through the Web in real time, before and while committing the act (Sarno, 2008). The study conducted in (Mingote et al., 2004) also proves that the younger are one of the population segments where prevention is particularly necessary, finding that 20% of suicides occur among adolescents and young adults.

Suicide Prevention in Social Media
It is important to raise awareness on that most self-inflicted deaths are potentially preventable. Well-known studies concerning research on suicide (Owen et al., 2012;Isometsa, 2001;Cantor, 2000;Rudestam, 1971) show that a high number of people who decide to end with their lives. Through suicide had no prior contact with mental health services, but had communicated their suicidal plans or thoughts directly or indirectly through different means to members of their family, friends, colleagues, or through their social networks. Improve the staff skills in early recognition of suicide warning signs, is an essential issue to prevent suicidal mortality.
There is an increasing tendency (Ruder et al., 2011) where suicide notes are posted on the social media (e.g., Facebook, Twitter), where Internet users (and not necessarily teenagers) announce their suicidal thoughts before committing suicide. This poses new challenges for human language technologies since, traditional existing automatic tools are not able to process the new language employed in the social media (abbreviations, slang, smiles and, more generally, a low unstructured and highly informal language).
The Internet, and specifically the Web 2.0, is an important source of information for learning about suicidal behaviors (Dunlop et al., 2011). The way individuals respond to help request from people at high risk of suicide or interact with them can lead to the fact that the potential suicidal may reconsider his/her final decision, or, on the contrary, encourage and accelerate the process of ending with his/her life (Wasserman et al., 2004).
During the recent years, some popular social networks, such as Facebook have become the most important means of social communication, with nearly 1,230 million of registered users world-wide 4 , 70% of whom are young people who make frequent use of this type of media, through which the emotional states and impressions are exchanged (Dunlop et al., 2011;Lenhart et al., 2010). These social networks are also a way to find comfort and welfare and its usage promotes contact and positive support among young people, especially among those with mental disorders (Ellison et al., 2007). In addition, the Web offers possibilities for early detection on suicidal behaviors and it may constitute a cost-effective means of intervention based on a first step care approach. Their use may be of help for identifying pro-suicide messages, detecting group patterns, analyzing exposure to the warning signs and intervening in a personalized manner with people at risk who are willing to accept professional help. In order to raise awareness of the ways people could get help when showing a suicidal behavior, Facebook and the Samaritans 5 association developed a joint initiative consisting in adding a new feature to Facebook, where anyone worried about a friend could fill out a form, related to suicide prevention. With the development of Web 2.0 new forms of communication arise that allow to interactively disseminate information through forums, blogs, micro-blogs, mobile apps, etc.
These technologies provide new opportunities to define and develop suicide prevention strategies. The use of e-health technologies has many beneficial applications for society. Every day millions of people access the Web to find, provide and share information about opinions, feelings and even plans and intentions. Recognizing suicidal warning signs will be the first necessary step to help and offer support to these people. A study that examines the warning signs of suicide on the Internet (Mandrusiak et al., 2006) found that the searches with the terms "warning signals" and "suicide" produced approximately 183,000 outcomes. Warning signs could be categorized in terms of cognitive content, behavioral, situational or other indicators concerning psychological characteristics or interpersonal problems. They could identify suicidal groups that need urgent intervention. Internet searches for suicide may provide a faster way of monitoring possible trends in suicide.
Some clinical studies observed that depressed patients frequently speech slow, uniform, monotonous and with a low voice (Kuny and Stassen, 1993) or to have psychomotor symptoms and this is reflected in the speech (Sobin and Sackeim, 1997). Moreover, emotions and mood can influence the speaking behavior of a person and the characteristics of the sound in speech (Kuny and Stassen, 1993;Bachorowski and Owren, 1995;Sobin and Alpert, 1999;Scherer, 2003;Goudbeek and Scherer, 2010). The speech of depressed patients is characterized by a longer pause duration, that is, an increased amount of time between speech utterances as well as by a reduced variability in mean vocal pitch (Lamers et al., 2014). For these reasons, the acoustic speech features can be used to build models and algorithms for automated depression detection in clinical scenarios.
But these acoustic features are not the only ones that can be used, Internet usage itself (Katikala-pudi et al., 2012), social networking behaviors (Moreno et al., 2011;Choudhury et al., 2012) or location sharing (Park et al., 2013) can vary as a function of being depressed. (Quercia et al., 2012) found correlations between sentiment and levels of popularity, influence and general well-being using the network relations among users and (O'Connor et al., 2010) used a measure of public opinion. All these methods can be applied to analyze emotion in suicide notes (Liakata et al., 2012).

Human Language Technologies and Suicide Prevention
In order to resolve this social issue, Language Technologies (LT) could help with the early identification of "suicide warning signs" that will be useful to detect individuals with suicidal ideation, as well as virtual environments where pro-suicide information is being shared or suicidal attempts are being encouraged. In particular, LT can analyze language structures and their meaning (Navigli, 2009) on different textual genres. Tasks such as information retrieval (Salton and McGill, 1986), information extraction (Cowie and Lehnert, 1996), text classification and clustering (Sebastiani, 2002), or sentiment analysis (Pang and Lee, 2008) are basic pillars of these technologies that allow the construction of more complex automatic processes for discovering knowledge from oral and/or written text. Recent research in LT has been proved great potential in the area of healthcare. From the development of applications to assist medical practitioners in the access and management of information about patients, e.g. (Iakovidis and Smailis, 2012;Vest, 2012), to the creation of computer programs to support and/or facilitate reading comprehension for language-impaired children during communication (Dietz et al., 2011;Wang and Paul, 2011). So far, and to the best of our knowledge, very little effort has been made to apply LT for the benefit of suicide prevention.
The linguistic analysis of suicide notes has a long history and already started as early as 1956 with the work of (Shneidman and Farberow, 1956), followed by several others (Osgood and Walker, 1959;Gleser et al., 1961;Edelman and Renshaw, 1982). The basis of most of this research was a corpus of 66 suicides notes, half genuine and half simulated, collected by Schneidman and the task was to identify those textual features which could differentiate between genuine and fake notes. Whereas the earlier work mostly focused on the manual analysis and detection of these differentiating features, e.g. by relying on techniques from discourse analysis (Shneidman and Farberow, 1956) or by focusing on shallow text characteristics such as the usage of modals and auxiliaries (Osgood and Walker, 1959), the choice of verbs and adverbs (Gleser et al., 1961), etc. We can observe a recent tendency to also rely on automatic corpus analysis techniques for the automatic detection of suicide messages. (Shapero, 2011), for example, studied two corpora of suicide notes in an attempt to define the typical suicide note. For doing so, she automatically calculated word usage and semantic concepts in the notes. (Pennebaker and Chung, 2011) used the frequency of verbal elements in a narrative that express a certain mood or sentiment which show that there is also ample evidence that text mining techniques based on the frequency of certain terms can be applied to narratives from patients in order to monitor changes in mood. As far as we know, (Pestian et al., 2010) were the first to experiment with the use machine learning techniques for the automatic classification of suicide notes. In experiments on the earlier described data set of 66 notes, they investigated whether a machine learning system was able to classify suicide notes with a higher accuracy than mental health professionals. They showed that the best machine learners were indeed able to outperform the human experts. More recent studies confirm this fact (Janssen et al., 2013), underlining once again the added value of automated speech analysis. (Howes et al., 2014) present an initial investigation into the application of computational linguistic techniques, such as topic and sentiment modelling, to online therapy for depression and anxiety using Latent Dirichlet Allocation (Blei et al., 2003). However, early works tried to detect specific emotions such as anger, surprise, fear, etc. using dictionary-based or machine-learning-based approaches (Chuang and Wu, 2004;Seol et al., 2008) and more recently (Purver and Battersby, 2012;Choudhury et al., 2012;Neuman et al., 2012;Howes et al., 2014).
Although interesting research was conducted on the Schneidman data set, the focus should not be on distinguishing between genuine and elicited suicide notes. Instead, it is of key importance to determine what exactly makes a note a real suicide note, independently of the features of the elicited notes or the distinguishing characteristics between both types of notes. Such a suicide note corpus of positive-only data, annotated with fine-grained emotions, was released in the framework of the 2011 i2b2 Natural Language Processing Challenge (Pestian et al., 2012) on emotion classification in suicide notes. Although the scope of the challenge (differentiating between emotions in positiveonly data) was different, it led to the creation of a permanently available resource facilitating future research in emotion detection in suicide notes. The corpus contains the notes written by 1319 people, before they committed suicide. The notes were collected between 1950 and 2011. Spelling and grammar errors were kept in the data. All notes were anonymized by replacing all names with other values and by randomly shifting dates within the same year. The data set of the challenge consisted of a training set of 600 suicide notes, and a test set of 300 notes. The challenge itself revealed that not only shallow lexical, but also semantic features contributed to classification performance. However, many challenges remain to be investigated: the sensitivity of the current systems to spelling and other errors -especially in online data-, the lack of deep understanding of the data through the use of mainly shallow features, etc. The release of this data set has made it possible to accurately detect and differentiate between different emotions, which might be indicative of suicidal behavior.
For the automatic detection and classification of emotions in suicidal content, we can rely on the recent advances in the domain of LT (Jurafsky and Martin, 2008) and machine learning (Mitchell, 1997). Whereas the international LT research community until recently mainly focused on the "factual" aspects of content analysis, we can observe an additional growing interest in the analysis of attitude and affect in textual sources, especially in online content such as blogs, tweets, social network data, etc. The extraction of affective contents does not only imply the detection of opinions, evaluations, beliefs and speculations in text (topics which have a high application potential in customer intelligence applications and the like), but also the identification of certain emotions. For example, how do people express their intent to commit suicide? The use of machine learning techniques and sentiment analysis techniques for the automatic analysis of suicide notes is not new. (Huang et al., 2007), for example, experimented with lexicon-based sentiment analysis for the automatic detection of suicidal blogs. (Pestian et al., 2010) combined shallow text characteristics, such as part-of-speech information, readability scores and parse information with the machine learning software as available in the Weka package.
Until we know, the most complete research about suicide prevention in the social networks, specifically Facebook, is the work of (Schwartz et al., 2014). However, instead of trying to detect suicide notes or to differentiate people with or without mental disorders, they measure the changes across time of the degree of depression.

Conclusions
The magnitude of the suicide in the EU member states and the rest of the world make suicide prevention not exclusively a problem of Mental Health. This is a problem that must be addressed from a multidisciplinary perspective, involving different areas. Internet Technologies and Communication and, more specifically, the Human Language Technologies can help to resolve part of these problems through the early detection of suicidal thoughts and/or behavior expressed through the Social Media. The words and the way people use to communicate in their blogs, social networks, etc. provide information about the psychological state and personality of individuals. The processing and analysis of natural language texts shared via Internet helps record and detect changes in cognitive and emotional state of the people. Unfortunately, although there are available resources and tools for sentiment analysis and opinion mining, even in the field of the depression detection and using different approaches and features, there is neither system nor platform that deal with the full process of suicide prevention.