PHASE: Learning Emotional Phase-aware Representations for Suicide Ideation Detection on Social Media

Recent psychological studies indicate that individuals exhibiting suicidal ideation increasingly turn to social media rather than mental health practitioners. Contextualizing the build-up of such ideation is critical for the identification of users at risk. In this work, we focus on identifying suicidal intent in tweets by augmenting linguistic models with emotional phases modeled from users' historical context. We propose PHASE, a time- and phase-aware framework that adaptively learns features from a user's historical emotional spectrum on Twitter for preliminary screening of suicidal risk. Building on clinical studies, PHASE learns phase-like progressions in users' historical Plutchik-wheel-based emotions to contextualize suicidal intent. PHASE outperforms state-of-the-art methods, and we show the utility of temporal and phase-based emotional contextual cues for suicide ideation detection. We further discuss practical and ethical considerations.


Introduction
Every 10.9 minutes, a person dies of suicide (Drapeau and McIntosh, 2020). Suicide ranks as the second leading cause of death for 14-35 year-olds (Hedegaard et al., 2020) in the US. Extending appropriate clinical and psychological care to suicidal people relies on identifying those at risk. Unfortunately, 80% of patients do not undergo clinical treatment, and about 60% of those who died of suicide denied having any suicidal thoughts to mental health practitioners (McHugh et al., 2019; Franklin et al., 2017). In contrast, people exhibiting suicidal ideation often use social media to express their feelings (Coppersmith et al., 2014, 2016, 2018; Robinson et al., 2016; Reger et al., 2020), with eight out of ten people disclosing their suicidal thoughts and plans on social media (Golden et al., 2009).

Figure 1: We study a user's tweeting history and emotional progression. Note that while the user's most recent tweet (blue) shows a subtle indication of suicidal intent, it is not sufficient to ascertain suicide risk. Grouping the build-up of negative emotions (red) in the user's historical tweets into phase-like emotional progressions, by utilizing the elapsed time between tweets, can contextualize the user's state and provide a more accurate and interpretable risk assessment. All examples in this paper have been anonymized and paraphrased as per a moderate disguise scheme (Bruckman, 2002) to protect user privacy (Chancellor et al., 2019b).

* Equal contribution
¹ Code is available at https://github.com/midas-research/phase-eacl/
Natural Language Processing (NLP) presents an encouraging prospect to complement social science in identifying risk markers in user behavior (De Choudhury et al., 2013) to aid suicide risk assessment (Shing et al., 2018, 2020). However, suicide ideation is complex, and individual posts are often insufficient to assess a user's suicide risk, even for humans (Sisask et al., 2008). Figure 1 illustrates how features such as historical posts (Matero et al., 2019) can add context for analyzing a user's online behavior over time (Van Heeringen and Marušic, 2003) to better ascertain suicide risk. Despite the success of user-centric contextual models (Flek, 2020) for suicide ideation detection, they have two major limitations.
First, recurrent neural networks, particularly LSTMs, which are natural choices for learning patterns from a sequence of a user's historical tweets (Cao et al., 2019; Zeng et al., 2019, 2020), assume uniform time gaps between successive tweets. However, tweets can be posted at irregular time intervals (Lei et al., 2018), and varying time gaps can influence the assessment of a user's suicidality progression (Chen et al., 2018), as shown in Figure 1.
Second, these methods implicitly assume that a user's mental and emotional state progression is smooth in time, with an ever-increasing tendency. However, studies show that emotional (Larsen et al., 2015) and suicidality progression can vary significantly (Bryan and Rudd, 2016; Bryan, 2020) and show fluctuating phase-like patterns (Kiosses et al., 2014; Palmier-Claus et al., 2012). Analyzing such phase-wise emotion progressions and build-up, as illustrated in Figure 1, can be instrumental in contextualizing suicidal risk and in aiding clinical psychologists through increased interpretability in human-in-the-loop systems.²
Building on these limitations, and motivated by psychological studies (Neacsiu et al., 2018; Domínguez and Fernández, 2018) of emotional state progression, we propose PHASE: PHase-Aware Suicidality identification Emotion progression model. With PHASE, we present the first neural framework to identify suicide ideation on social media (§3.1) that explicitly models the inherent phase-aware progressions in users' emotional spectrums in a contextual time-aware manner.
We present the following key contributions:
(i) Building on the success of large-scale pretraining in NLP, we utilize Plutchik Transformer, a transformer that learns linguistic and Plutchik-based (Plutchik, 1980) emotional cues from tweets (§3.2).
(ii) We propose Time-Sensitive Emotion LSTM (TSE-LSTM) to learn the historical emotional progression of a user's mental states from their learned emotional spectrum in a time-aware manner (§3.3).
(iii) Based on psychological studies, we propose a novel method to learn users' emotional phase progressions by leveraging the amount of historical emotional context used to update the TSE-LSTM's cell state. PHASE identifies the onset of new emotional phases and learns a temporal phase-aware emotional user representation (§3.4) that is then used to identify suicide ideation in their recent tweets (§3.5), increasing the system's transparency.
(iv) Through a series of experiments (§4.2), we show that PHASE significantly (p < 0.005) outperforms competitive methods, which do not take users' emotional phases into account (§5).
(v) We analyze the contributions of PHASE's individual components to suicide ideation detection (§5.2, §5.3, §5.4), assess its transparency and limitations through qualitative analysis (§5.5), and conclude by discussing the ethical implications and practical applicability of this study (§6).

² Similar to the post-screening on Facebook (Card, 2018).

Related Work
Traditional Methods: Researchers have devised various psychoclinical methods to assess suicidal risk (Pestian et al., 2016), such as the Suicide Probability Scale (Bagge and Osman, 1998), the Suicide Ideation Questionnaire (wa Fu et al., 2007), and the Suicidal Affect-Behavior-Cognition Scale (Harris et al., 2015). While these methods are professional and effective, they require participants to answer questionnaires (Venek et al., 2017) or engage in interviews (Scherer et al., 2013), and hence do not reach people who cannot access these resources or have low motivation to seek professional help (Zachrisson et al., 2006; Essau, 2005). Harris and Goh (2016) show that such assessments can negatively impact people showing depressive symptoms.
NLP Methods: Recently, social media has shown promise in providing insights into users' mental states (Paul and Dredze, 2011). Jashinsky et al. (2014) reported that Twitter is a viable tool for real-time monitoring (Braithwaite et al., 2016) of suicide risk. Early efforts in utilizing social media leverage user features such as age, gender, and social network connectivity (Masuda et al., 2013) and online suicide notes (Pestian et al., 2010). Since then, the focus has been on using psycholinguistic lexicons such as LIWC and textual features such as n-grams and POS tags for classification (De Choudhury et al., 2016; Sawhney et al., 2018b). Shared tasks such as CLPsych (Zirikly et al., 2019) and CLEF eRISK (Losada et al., 2020) have seen a rise in neural networks such as CNNs (Yates et al., 2017; Du et al., 2018; Naderi et al., 2019; Gaur et al., 2019) and LSTMs (Ji et al., 2018; Tadesse et al., 2020) to predict suicide risk. While these methods capture post semantics in isolation, they leverage no user context, which limits insight into the user's mental state and hence predictive power (Venek et al., 2017; Flek, 2020). User context includes the user's emotions (Ren et al., 2016; Guntuku et al., 2017), social networks, and historical posts (Mathur et al., 2020).
Contextual Methods: The best performing model at CLPsych 2019 for suicide risk estimation, DualContextBERT (Matero et al., 2019), exemplifies the utility of temporal context. DualContextBERT models post embeddings sequentially via an RNN. Such RNN-based approaches assume that users' historical posts are equally spaced in time, hindering their ability to learn the relative importance of posts in a time-aware manner. Recently, time-aware modeling of well-defined stages in numerical time series data has shown promising results in clinical tasks such as patient subtyping (Baytas et al., 2017) and disease progression (Gao et al., 2020). However, the time-sensitive phase extraction of user-generated posts on social media, and phase-aware modeling of textual data, is underexplored and complex, as it involves noisy, unstructured, and ambiguous inputs across irregular time intervals.

Notations and Problem Formulation
We formulate suicidal intent detection as a binary classification task to predict suicidal intent $y_i$ for a tweet $t_i$, where $y_i \in \{\text{suicidal intent present}, \text{suicidal intent absent}\}$. We denote the tweet to be assessed for the presence of suicidal intent as $t_i \in T = \{t_1, t_2, \cdots, t_N\}$, authored by a user $u_j \in U = \{u_1, u_2, \cdots, u_M\}$ and posted at time $\tau^i_{curr}$. Each tweet $t_i$ is associated with a history $H_i = [(h^i_1, \tau^i_1), (h^i_2, \tau^i_2), \cdots, (h^i_L, \tau^i_L)]$, where $h^i_k$ is a historic tweet authored by user $u_j$ and posted at time $\tau^i_k$, with $\tau^i_1 < \tau^i_2 < \cdots < \tau^i_L < \tau^i_{curr}$.
As shown in Figure 2, PHASE first obtains a user's emotion spectrum from their historical tweets and the tweet to be assessed using a fine-tuned BERT model, Plutchik Transformer. We feed the historical tweet representations to our proposed Time-Sensitive Emotion LSTM to learn the temporal progression of a user's emotions. We then identify phases in a user's emotions from their learned historical emotional progression, and extract temporal features for a user from these phases using Phase-Adaptive convolutions. Finally, PHASE jointly learns the semantics of user tweets and their historical emotional context in a temporal phase-aware manner for suicide ideation detection in a tweet.
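To make the notation concrete, the following minimal sketch represents one assessment instance as a tweet with its timestamped history and derives the elapsed times between consecutive historical tweets that a time-aware model would consume. The class name `Instance`, its fields, and the convention that the first historical tweet has a gap of zero are assumptions made for illustration; they are not taken from the released code.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass
class Instance:
    """One tweet to be assessed (t_i) with its history (hypothetical container)."""
    text: str                # tweet to be assessed, t_i
    posted_at: datetime      # posting time of t_i
    label: int               # y_i: 1 = SI present, 0 = SI absent
    history: List[Tuple[str, datetime]] = field(default_factory=list)  # (h_k, tau_k)

    def elapsed_days(self) -> List[float]:
        """Gap in days between each historical tweet and the one before it."""
        times = sorted(ts for _, ts in self.history)
        if times and times[-1] >= self.posted_at:
            raise ValueError("history must precede the tweet to be assessed")
        gaps = [0.0]  # assumption: the first historical tweet has no predecessor
        for prev, cur in zip(times, times[1:]):
            gaps.append((cur - prev).total_seconds() / 86400.0)
        return gaps[: len(times)]
```

Keeping the raw timestamps alongside the text, rather than only the tweet order, is what allows irregular posting intervals to be used as a model input.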

Plutchik Transformer: Encoding Tweets
Studies show that emotions expressed in suicidal tweets are correlated with suicidal behavior (Sueki, 2015; Spates et al., 2018; Zhang et al., 2017). As a building block, we utilize Plutchik's wheel of emotions (Plutchik, 1980) to capture the emotions expressed by a user in their tweets. Plutchik's wheel outlines eight primary emotions arranged as four pairs of opposing dualities: Joy-Sadness, Surprise-Anticipation, Anger-Fear, and Trust-Disgust. We utilize Plutchik Transformer, a BERT model fine-tuned on Emonet (Abdul-Mageed and Ungar, 2017), a dataset of 790,059 tweets labeled across the 8 primary emotions of Plutchik's wheel. Owing to the success of pre-training language models in NLP, Plutchik Transformer jointly learns textual and emotion features for representation learning of user tweets for subsequent suicidal intent detection. We extract a 768-dimensional encoding from the [CLS] token of the penultimate transformer layer, which is densely connected with an 8-dimensional output layer representing the primary emotions.
Tweet to be assessed: We encode each tweet to be assessed $t_i$ as:
$$T_i = \text{PlutchikTransformer}(t_i)$$
where $T_i \in \mathbb{R}^{768}$ is then linearly transformed using a dense layer to $T_i \in \mathbb{R}^{d}$ with dimension $d$.
Historical Tweet Encoding: A holistic representation of users' emotional states can be indicative of variations in risk markers over time (Aragón et al., 2019; Tarrier et al., 2007; Links et al., 2008). To this end, we utilize Plutchik Transformer to encode each historical tweet $h^i_k$ to an emotion representation $e^i_k \in \mathbb{R}^{768}$ defined as:
$$e^i_k = \text{PlutchikTransformer}(h^i_k)$$
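As a small illustration of the four opposing dualities, the sketch below converts an 8-dimensional emotion distribution into one signed score per pair. The index order of the emotions and the function name `duality_scores` are assumptions chosen for this example; the text does not specify how the output layer is ordered.

```python
# The four opposing pairs listed in the text. The index order of the
# 8-dimensional output is an assumption made for this example.
EMOTIONS = ["joy", "sadness", "surprise", "anticipation",
            "anger", "fear", "trust", "disgust"]
DUALITIES = [("joy", "sadness"), ("surprise", "anticipation"),
             ("anger", "fear"), ("trust", "disgust")]


def duality_scores(emotion_probs):
    """Map 8 emotion probabilities to 4 signed scores (first minus second emotion)."""
    if len(emotion_probs) != len(EMOTIONS):
        raise ValueError("expected one probability per primary emotion")
    by_name = dict(zip(EMOTIONS, emotion_probs))
    return {f"{a}-{b}": by_name[a] - by_name[b] for a, b in DUALITIES}
```

A positive score means the first emotion of the pair dominates, a negative score the second.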

Temporal Modeling of Historical Tweets
Given the natural irregularities in posting times of historical tweets (Wojcik and Hughes, 2019), we propose the use of ON-LSTM to encode the sequence of a user's historical tweet emotion representations $e^i_k$ and capture the variation in their mental and emotional states over time, forming a Time-Sensitive Emotion LSTM (TSE-LSTM). In our TSE-LSTM, we introduce a time-sensitive long-term gate $\tilde{f}_k$, which contains older historic emotional context. Additionally, we propose a short-term gate $\tilde{i}_k$ that encodes recent historic tweets, as shown in Figure 3. We then feed the elapsed time $\Delta_k$ since the previous tweet and the historical emotional representation $e^i_k$ of each tweet $h^i_k$ to a TSE-LSTM cell. This design helps TSE-LSTM learn two probability distributions $p_{\tilde{f}_k}$ and $p_{\tilde{i}_k}$ corresponding to the long-term and short-term gates, respectively. Psychological studies show that a user's recent emotions can be more indicative of their current mental state (Fawcett et al., 1990; Homan et al., 2014). To this end, we set the update frequency of the short-term gate higher than that of the long-term gate to increase the influence of the user's more recent emotional context. To impose this natural ordering of update frequencies, we apply a cumulative sum (cumsum) operation to the probability distributions $p_{\tilde{f}_k}$ and $p_{\tilde{i}_k}$:
$$p_{\tilde{f}_k} = \sigma\left(W_{\tilde{f}}(e^i_k \oplus \Delta_k) + U_{\tilde{f}}\tilde{H}^i_{k-1} + b_{\tilde{f}}\right), \quad \tilde{f}_k = \overrightarrow{\mathrm{cumsum}}(p_{\tilde{f}_k})$$
$$p_{\tilde{i}_k} = \sigma\left(W_{\tilde{i}}(e^i_k \oplus \Delta_k) + U_{\tilde{i}}\tilde{H}^i_{k-1} + b_{\tilde{i}}\right), \quad \tilde{i}_k = \overleftarrow{\mathrm{cumsum}}(p_{\tilde{i}_k})$$
where $\sigma$ represents softmax, $\oplus$ denotes concatenation, and $\tilde{H}^i_{k-1}$ is the previous hidden state. The arrow above cumsum indicates its direction. $W_{\tilde{f}}$, $W_{\tilde{i}}$, $U_{\tilde{f}}$, $U_{\tilde{i}}$, $b_{\tilde{f}}$, and $b_{\tilde{i}}$ are learnable parameters.
Following cumsum's properties, the values in the long-term gate $\tilde{f}_k$ are monotonically increasing from 0 to 1, and those in the short-term gate $\tilde{i}_k$ are monotonically decreasing from 1 to 0. For each historic tweet $h^i_k$, the long-term gate $\tilde{f}_k$ controls the historic emotional context to be discarded, and the short-term gate $\tilde{i}_k$ controls the importance of recent historic emotions. To obtain complete contextual information of the overlapping context in $\tilde{f}_k$ and $\tilde{i}_k$, we introduce a historic overlap vector $w_k$ that uses the standard forget and input gates, $f_k$ and $i_k$, respectively. We define the new update function for TSE-LSTM's cell state $c_k$ as:
$$w_k = \tilde{f}_k \odot \tilde{i}_k$$
$$c_k = \left(f_k \odot w_k + (\tilde{f}_k - w_k)\right) \odot c_{k-1} + \left(i_k \odot w_k + (\tilde{i}_k - w_k)\right) \odot \hat{c}_k$$
where $\odot$ is the element-wise product, the computations of the intermediate cell state $\hat{c}_k$, the output gate $o_k$, and the hidden state $\tilde{H}^i_k$ are the same as in the standard LSTM, and $W_c$, $U_c$, $b_c$ are network parameters. The hidden state $\tilde{H}^i_k$ represents the learned emotional context of the user.
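The gate construction can be sketched in a few lines of plain Python. The sketch assumes the ON-LSTM-style formulation: the long-term gate is a left-to-right cumulative sum of a softmax, the short-term gate is a right-to-left cumulative sum, and the standard forget and input gates act only inside the overlap of the two. The function names and the use of precomputed pre-activation scores (instead of learned weight matrices) are simplifications for illustration.

```python
import math


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cumsum(p):
    out, running = [], 0.0
    for v in p:
        running += v
        out.append(running)
    return out


def master_gates(logits_f, logits_i):
    """Return (long-term gate, short-term gate, overlap) from pre-activation scores."""
    p_f, p_i = softmax(logits_f), softmax(logits_i)
    f_tilde = cumsum(p_f)              # left-to-right: rises towards 1
    i_tilde = cumsum(p_i[::-1])[::-1]  # right-to-left: falls from 1
    overlap = [f * i for f, i in zip(f_tilde, i_tilde)]
    return f_tilde, i_tilde, overlap


def update_cell(c_prev, c_hat, f, i, f_tilde, i_tilde, overlap):
    """ON-LSTM-style update: standard gates f, i act only inside the overlap."""
    f_hat = [fk * w + (ft - w) for fk, w, ft in zip(f, overlap, f_tilde)]
    i_hat = [ik * w + (it - w) for ik, w, it in zip(i, overlap, i_tilde)]
    return [fh * cp + ih * ch for fh, cp, ih, ch in zip(f_hat, c_prev, i_hat, c_hat)]
```

When the long-term gate is all ones and the short-term gate all zeros, the overlap vanishes and the previous cell state is copied unchanged, which corresponds to fully retaining historic context.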

Learning Emotional Phase Progression
We now describe how we use the emotional context learned by the TSE-LSTM to capture emotional phase progression patterns for a user over time. We then describe PHASE's Phase Adaptive Convolutions (PACs) that capture user features closely related to the user's current state through convolutions over these learned emotional phases. The PACs thus extract a phase-aware emotional user representation for suicide ideation detection.
Emotion Phase Variation: We leverage the historical emotional context $\tilde{H}^i_k$ from the TSE-LSTM to extract temporal variations in a user's emotional state for a macroscopic view of the progression of emotional phases. Building on the work of Gao et al. (2020), we capture the onset of a new emotional phase by observing the proportion of historic context discarded to update the cell state $c_k$. When almost no historical emotional context is used to update the cell state $c_k$, we say that a new emotional phase of the user has begun. Formally, we use a phase split point $s_k$ that represents the time before which all the emotional historic context is discounted, as $s_k = \arg\max(p_{\tilde{f}_k})$, as shown in Figure 4 (Gao et al., 2020). Intuitively, a large value of $s_k$ means little historic context is used to update the cell state $c_k$, indicating the onset of a new emotional phase, whereas a smaller value of $s_k$ suggests a long-term dependency of the emotions expressed in the tweet $h^i_k$ on historic emotions. Since argmax is non-differentiable, we estimate the phase split point $s_k$ by its expectation:
$$s_k = \sum_{i=0}^{N_h - 1} i \cdot p_{\tilde{f}_k}(i) = N_h - \sum_{i=0}^{N_h - 1} \tilde{f}_k(i)$$
where $N_h$ is the dimension of $\tilde{H}^i_k$, and $\tilde{f}_k(i)$ and $p_{\tilde{f}_k}(i)$ are the $i$-th values in the long-term gate $\tilde{f}_k$ and in $p_{\tilde{f}_k}$, respectively.
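The relation between the hard split point and a differentiable estimate can be checked numerically. The sketch below assumes the surrogate is the expected position under the long-term gate's distribution, as in ON-LSTM, with positions indexed from 0; `hard_split` and `soft_split` are illustrative names.

```python
def hard_split(p_f):
    """Non-differentiable split point: index of the largest probability."""
    return max(range(len(p_f)), key=lambda i: p_f[i])


def soft_split(p_f):
    """Expected split position, computed as N_h minus the sum of the cumulative gate.

    With 0-indexed positions this equals sum(i * p_f[i]).
    """
    running, gate_sum = 0.0, 0.0
    for v in p_f:
        running += v         # running is the gate value at this position
        gate_sum += running
    return len(p_f) - gate_sum
```

For a distribution concentrated on late positions the estimate is large, matching the intuition that little historic context is retained at the onset of a new phase.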
We then compute the elapsed time between two consecutive phases by measuring the difference between the proportion of historic context discarded at each timestep. For each emotional phase of a user within an observation window of length $L_w$, we define a phase variation time $\Delta s$ from these differences.
Phase Adaptive Convolution (PAC): We now extract features from the emotional phase build-up leading towards the tweet to be assessed. The PAC extracts features from the learned phase-wise progression of a user's temporal emotional context in the most recent emotional phase, as shown in Figure 4. We feed the concatenated historical hidden states $\tilde{H}^i_{k-L_w:k} = [\tilde{H}^i_{k-L_w}, \cdots, \tilde{H}^i_k]$ in the observation window $L_w$ as input to a weighted temporal convolution. Naturally, emotions corresponding to more recent phases of a user are more indicative of their current mental state and should be more influential (Larsen et al., 2009). Hence, we weigh the importance of the learned historical emotional context through the phase variation time $\Delta s$ (Gao et al., 2020). We perform a convolution ($*$) with the $p$-th 1-dimensional learnable kernel $m^j_p$ of size $L_w$ for each $j$-th hidden state in the observation window, weighted by $\Delta s$, where $u_p$ is the output of the $p$-th kernel. We concatenate all extracted features as $u = [u_1, \cdots, u_{N_h}] \in \mathbb{R}^{N_h}$ to obtain a user's phase-aware emotion representation.
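One possible reading of the weighted temporal convolution is sketched below. It assumes that the phase variation times are normalised into weights with a softmax and that each hidden dimension has its own kernel spanning the whole window, so that every kernel yields a single feature. Both choices, and the names `phase_weights` and `phase_adaptive_conv`, are assumptions made for illustration and may differ from the authors' formulation.

```python
import math


def phase_weights(delta_s):
    """Softmax-normalise phase variation times into importance weights (assumption)."""
    m = max(delta_s)
    exps = [math.exp(v - m) for v in delta_s]
    total = sum(exps)
    return [e / total for e in exps]


def phase_adaptive_conv(hidden_window, delta_s, kernels):
    """Weighted temporal convolution over one observation window.

    hidden_window: L_w rows, each a hidden state of size N_h
    delta_s:       L_w phase variation times, one per timestep
    kernels:       N_h kernels, each of size L_w (one per hidden dimension)
    Returns a list of N_h features, one per kernel.
    """
    weights = phase_weights(delta_s)
    n_steps, n_hidden = len(hidden_window), len(hidden_window[0])
    features = []
    for p in range(n_hidden):
        u_p = sum(kernels[p][j] * weights[j] * hidden_window[j][p]
                  for j in range(n_steps))
        features.append(u_p)
    return features
```

Because the kernel size equals the window length, the output has exactly one value per hidden dimension, which matches the stated size of the phase-aware representation.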

PHASE Joint Network Optimization
Finally, we concatenate the encoded representations of the tweet to be assessed $T_i$ and the historic emotional context $u$, followed by softmax ($\sigma$) over a dense layer with a Rectified Linear Unit (ReLU):
$$\hat{y}_i = \sigma\left(\mathrm{ReLU}\left(W_y (T_i \oplus u) + b_y\right)\right)$$
where $\hat{y}_i$ is the final suicide risk assessment and $\{W_y, b_y\}$ are learnable network parameters. Tweets with SI present form a very small proportion of the data (Ji et al., 2019). To address this class imbalance (which is much greater in the real world), we train PHASE using Class-Balanced Focal Loss (Cui et al., 2019). This loss function re-weights the loss inversely with the effective number of samples per class, thereby yielding a class-balanced loss $\mathcal{L}$ as:
$$\mathcal{L} = \mathrm{CB}_{focal}(\hat{y}_i, y_i)$$
where $\mathrm{CB}_{focal}$ is the class-balanced focal loss, $\hat{y}_i$ is the predicted label, and $y_i$ is the label of the tweet to be assessed. $\beta$ and $\gamma$ are hyperparameters.
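For intuition on how $\beta$ and $\gamma$ act, the following sketch computes a class-balanced focal loss for a single example. It follows the effective-number weighting of Cui et al. (2019), $(1-\beta)/(1-\beta^{n})$ for a class with $n$ samples, applied to a focal term on the probability assigned to the true class. Using the softmax probability of the true class is a simplification, and the function names are illustrative.

```python
import math


def class_balanced_weight(n_samples, beta):
    """Inverse of the effective number of samples, (1 - beta^n) / (1 - beta)."""
    return (1.0 - beta) / (1.0 - beta ** n_samples)


def cb_focal_loss(p_true, n_samples_in_class, beta=0.9999, gamma=2.0):
    """Class-balanced focal loss for one example.

    p_true: predicted probability of the example's true class (must be > 0).
    """
    focal = -((1.0 - p_true) ** gamma) * math.log(p_true)
    return class_balanced_weight(n_samples_in_class, beta) * focal
```

With 3984 SI-present tweets out of 34,306, the minority class receives the larger weight, and a larger $\gamma$ further down-weights examples that are already classified confidently.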

Data and Preprocessing
We build on an existing, previously curated Twitter dataset. The data includes 34,306 tweets authored by 32,558 unique users. These tweets were identified based on a lexicon of 143 suicidal phrases (e.g., "wanting to die", "last day"). Two students of Psychology annotated the data under the supervision of a professional clinical psychologist, achieving a Cohen's Kappa score of 0.72, under the following guidelines (Sawhney et al., 2018b):
Suicidal Intent (SI) Present: Tweets where suicide ideation or previous attempts are discussed in a somber and non-flippant tone.
Suicidal Intent (SI) Absent: Tweets with no evidence for risk of suicide, e.g., song lyrics, condolence messages, awareness, news.
The resulting dataset contains 3,984 suicidal tweets. The Twitter timeline was collected for each user, spanning ten years from 2009 to 2019. The number of historical tweets (748 ± 789) and the time difference between consecutive tweets (2 ± 24 days) indicate large variations across users. 4,070 users were found to have no historical tweets. We perform a stratified 70:10:20 split, such that the train, validation, and test sets consist of 24,014, 3,431, and 6,861 tweets, respectively, and ensure that there is no overlap between users in these sets.
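The requirement that no user appears in more than one set can be met by assigning users, rather than tweets, to splits. The sketch below shows only this user-disjoint property and deliberately omits label stratification; the function name, the seed, and the split keys are assumptions for illustration.

```python
import random


def user_disjoint_split(tweet_users, ratios=(0.7, 0.1, 0.2), seed=0):
    """Assign tweets to train/val/test so that each user falls in exactly one split.

    tweet_users: list with the author id of each tweet, in tweet order.
    Returns a dict mapping split name to a list of tweet indices.
    Note: this sketch does not stratify by label.
    """
    users = sorted(set(tweet_users))
    random.Random(seed).shuffle(users)
    n_train = int(ratios[0] * len(users))
    n_val = int(ratios[1] * len(users))
    split_of = {}
    for rank, user in enumerate(users):
        if rank < n_train:
            split_of[user] = "train"
        elif rank < n_train + n_val:
            split_of[user] = "val"
        else:
            split_of[user] = "test"
    splits = {"train": [], "val": [], "test": []}
    for idx, user in enumerate(tweet_users):
        splits[split_of[user]].append(idx)
    return splits
```

Splitting at the user level prevents a model from scoring well on the test set merely by recognizing the writing style of users it saw during training.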

Baselines and Training Setup
We evaluate PHASE on macro F1 and recall (SI present) against both tweet- and user-level methods.

Tweet-level Non-contextual Baselines
RF + TF (Sawhney et al., 2018b): Extracts features including statistical features, LIWC features, n-grams (up to 4), and POS counts from the tweet to be assessed and feeds them to a Random Forest (RF) classifier.
C-LSTM (Sawhney et al., 2018a): A deep neural network with a CNN followed by an LSTM to extract short- and long-range features in a tweet.

User-level Contextual Baselines
C-CNN (Gaur et al., 2019): A model that is fed GloVe-encoded tweets as a concatenated bag of tweets, non-sequentially, to a contextual CNN (Shin et al., 2018) with max pooling (Shing et al., 2018).
Suicide Detection Model (SDM) (Cao et al., 2019): Historical tweets encoded using fine-tuned FastText embeddings are fed to a regular LSTM followed by a tweet-level attention mechanism.
DualContextBERT (Matero et al., 2019): Best performing model at CLPsych 2019. BERT embeddings of each historical tweet are sequentially fed to a regular RNN followed by tweet-level attention.
Exponential Decay (Sinha et al., 2019): A deep neural network that encodes each historical tweet using GloVe embeddings followed by a BiLSTM with attention. The historical embeddings are then aggregated using an exponential decay.
Setup: We set hyperparameters for all models based on the validation macro F1 score. We use grid search to explore: $N_h \in \{128, 256, 512\}$, $\delta \in \{0.0, 0.1, \cdots, 0.5\}$, $\beta \in \{0.99, 0.999, 0.9999\}$, $\gamma \in \{1.0, 1.5, 2.0\}$, initial learning rate $I_{lr} \in \{0.01, 0.001, 0.0001\}$, and $L_w \in \{1, 2, \cdots, 16\}$. We found the optimal hyperparameters to be $N_h = 512$, $\delta = 0.5$, $\beta = 0.9999$, $\gamma = 2$, $I_{lr} = 0.0001$, and $L_w = 5$. We implement all methods with PyTorch 1.6, and optimize PHASE using AdamW with a batch size of 128 for 30 epochs in 167 mins on a Tesla K80 GPU. We use the cosine scheduler (Gotmare et al., 2018) with a warmup step of 5.
We observe from Table 1 that PHASE significantly (p < 0.005) outperforms all baselines. We note that contextual models outperform the non-contextual RF+TF and C-LSTM, as they learn a holistic representation of a user's mental state. Amongst contextual models, those that factor in the temporal sequence of historical tweets outperform the non-temporal C-CNN, which models tweets as a bag-of-tweets, thereby validating the utility of temporal context for suicide ideation detection. PHASE significantly outperforms state-of-the-art contextual models. We attribute this to PHASE's ability to capture irregularities in tweeting patterns and to learn emotional phase progressions, unlike DualContextBERT and SDM, which ignore the time-sensitive, phase-sensitive, and emotional aspects of historical context. PHASE outperforms Exponential Decay, as PHASE adaptively learns progressions of emotional phases rather than assuming that a user's behavior follows a specific trajectory that might not generalize well across users. This observation is in line with psychological research (Joiner Jr, 2002; Giletta et al., 2015) showing that the progressive build-up to suicidality varies across individuals, which PHASE captures better than competitive models.
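The two reported metrics can be computed as in the following sketch, which treats label 1 as SI present and label 0 as SI absent. The label encoding and function names are assumptions for illustration; in practice an established library implementation would typically be used.

```python
def per_class_prf(y_true, y_pred, positive):
    """Precision, recall, and F1 when `positive` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of per-class F1, so the minority class counts equally."""
    return sum(per_class_prf(y_true, y_pred, c)[2] for c in classes) / len(classes)
```

Macro averaging matters here because SI-present tweets are a small minority: a classifier that always predicts SI absent would have high accuracy but a recall of zero on the class of interest.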

PHASE Components Ablation Study
We perform an ablation study to probe the effectiveness of each component of PHASE, as shown in Table 2, starting from the base (Current) model that does not use historical tweets. On modeling the temporal dependencies in historical tweets with a standard LSTM along with the current tweet, we note drastic improvements, revalidating the importance of user-level context for inferring the suicidality of a user. On factoring in time-sensitivity through the TSE-LSTM, we observe a significant (p < 0.005) improvement in the macro F1 score, but no gain in recall. We believe that, even though the model gains additional user context by factoring in the time irregularities between tweets, it does not improve drastically because it still assumes a continuous, smooth progression of the user's emotions in time. This assumption hinders the model's ability to capture the macroscopic context acquired by analyzing the phase-like progressions of a user's emotional states (Homan et al., 2014). On adding the PAC, which learns phase-aware user representations by extracting emotional progression patterns from the historical emotional context (TSE-LSTM), we observe significant (p < 0.005) improvements. We attribute this improvement to the PAC adaptively learning and capturing a user's emotional phase-wise build-up towards the tweet to be assessed, which correctly contextualizes suicidal intent and validates the utility of phase-aware modeling.
We now analyze PHASE's performance using different encoders to learn representations for tweets. Overall, we observe that transformers outperform previously used static word embeddings (FastText). Additionally, we observe that Plutchik Transformer, based on Plutchik's wheel of emotions, significantly improves PHASE's performance over the pre-trained BERT used by Matero et al. (2019). This observation revalidates the importance of specific emotional context, as opposed to the more general language features learned by BERT alone.

How much Historical Context is useful?
We explore PHASE's performance variation with the amount of historical lookback in terms of number of days in Figure 6. We observe that PHASE's performance improves as we factor in more historical tweets, going back up to a few months, likely because PHASE gains more context of users' emotional progressions. As we further increase historical lookback beyond several (> 3) months, we observe that PHASE's performance saturates. This observation is in line with psychological studies (Selby et al.; Kaplow et al., 2014; Glenn et al., 2020) that highlight the diminishing importance of a user's emotions over longer time periods in assessing their current mental state and associated suicide risk.

Figure 7: We study three users with their tweet to be assessed, historic tweets (chronologically ordered), and timestamps, showing how PHASE can aid human moderators and clinical psychologists with explainable predictions. We visualize self-attention (averaged over all 12 Plutchik Transformer heads) per token, where darker intensity denotes higher attention. The graphs show the phase split value $s_k$ for each user over time. We also show emotional phase progression for further interpretability, where a peak represents the onset of a new phase. Further, we show detailed phase variation by visualizing the Plutchik-based emotion (learned weights) duality for historical tweets.
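Restricting the history to a fixed lookback, as in this experiment, amounts to a simple timestamp filter. The sketch below is a minimal illustration under the assumption that the history is a list of (text, timestamp) pairs; the function name `within_lookback` is hypothetical.

```python
from datetime import datetime, timedelta


def within_lookback(history, current_time, days):
    """Keep historical tweets posted at most `days` days before the assessed tweet."""
    cutoff = current_time - timedelta(days=days)
    return [(text, ts) for text, ts in history if cutoff <= ts < current_time]
```

Sweeping `days` over increasing values and re-evaluating the model on the filtered histories would reproduce the kind of lookback curve described above.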

PHASE Analysis and Interpretation
We now analyze PHASE's preliminary assessment in Figure 7 to elucidate PHASE's interpretability in aiding subsequent human-in-the-loop risk assessment. First, for User A, we see no apparent signs of suicidal intent in the tweet to be assessed, which, analyzed in isolation, is not sufficient to ascertain risk. However, User A's historical tweets add context for models (e.g., PHASE) that leverage temporal emotional cues to identify suicidal intent correctly. Next, we analyze a complex case, User B, where we observe phase-like progressions in their emotions over time. Although User B historically showed negative emotions, User B has recently shown more positive behavior, akin to the onset of a new emotional phase characterized by joy and trust. PHASE's design enables it to learn User B's emotional progression adaptively, and it correctly predicts User B's tweet to be assessed as having no suicidal intent, unlike other models that incorrectly assess it as having suicidal intent. Lastly, we present the complicated case of User C, where all models fail, to illustrate the challenges that remain in online data-driven suicide ideation detection. Specifically, we find that all models are unable to accurately ascertain suicidal intent when there is little historical context and it consists of fluctuating emotions, highlighting the challenges associated with new or alternate accounts of users, amongst other complexities (Shea, 1999; O'Connor and Portzky, 2018).

Ethical and Practical Considerations
Emphasizing the sensitive nature of this work, we acknowledge the trade-off between privacy and effectiveness (Eskisabel-Azpiazu et al., 2017), and utilize publicly available Twitter data in a purely observational (Norval and Henderson, 2017; Broer, 2020) and non-intrusive manner. We separate user data from all other data on protected servers linked only through anonymous IDs, and we perform automatic de-identification of the dataset using named entity recognition (Benton et al., 2017a,b). All examples shown in this work have been paraphrased to protect user privacy (Fiesler and Proferes, 2018; Chancellor et al., 2019a,b). We ensure that this analysis is shared selectively and subject to IRB approval (Zimmer, 2009, 2010) to avoid misuse such as Samaritan's Radar (Hsin et al., 2016). We acknowledge that suicidality is subjective (Keilp et al., 2012) and that the interpretation of this analysis may vary across individuals (Puschman, 2017). We further acknowledge that suicide risk exists on a diverse spectrum (Bryan and Rudd, 2006), rather than at a binary level, and that the studied data may be susceptible to demographic, annotator, and medium-specific biases (Hovy and Spruit, 2016). Finally, our work does not make any diagnostic claims related to suicide. PHASE should form part of a distributed human-in-the-loop (de Andrade et al., 2018) system for finer interpretation of risk.

Conclusion
Motivated by the rising exhibition of suicide ideation on social media, we present PHASE. Building on psychological studies analyzing the emotional spectrum and mental health of users, PHASE adaptively learns emotional phase-aware user representations through historical tweeting activity for suicidal ideation detection. We propose multiple modeling innovations in PHASE components: contextualized historical emotion representations (Plutchik Transformer), time-sensitive emotion LSTM (TSE-LSTM), and a phase-adaptive convolution (PAC). We demonstrate that modeling user phases explicitly increases the predictive power in assessing suicidality in tweets. In a qualitative analysis, we show how PHASE can aid social media moderators and clinical psychologists in subsequent assessment by displaying its predictions together with the learned emotional phases. Through PHASE, we hope to form a future component in a larger human-in-the-loop infrastructure for suicide prevention.