What to Fuse and How to Fuse: Exploring Emotion and Personality Fusion Strategies for Explainable Mental Disorder Detection



Introduction
Mental health disorders (MHD) are increasingly prevalent worldwide and constitute one of the greatest challenges facing our healthcare systems and modern societies in general. In response to this societal challenge, there has been a surge in digital mental health research geared towards the development of new techniques for unobtrusive and efficient automatic detection of MHD. Within this area of research, natural language processing techniques are playing an increasingly important role, showing promising detection results from a variety of textual data. Recently, there has been a growing interest in improving mental illness detection from textual data by leveraging emotions: 'Emotion fusion' refers to the process of integrating emotion information with general textual information to obtain enhanced information for decision-making. However, while the available research has shown that MHD prediction can be improved through a variety of different fusion strategies, previous work has been confined to a particular fusion strategy applied to a specific dataset, and so is limited by the lack of meaningful comparability.
As a result, the clinical community is increasingly seeking new approaches to the early detection and monitoring of mental health problems that can greatly improve the effectiveness of interventions, reduce their cost, and prevent them from becoming chronic. In this context, Natural Language Processing (NLP) is recognized as having transformative potential to support healthcare professionals and stakeholders in the early detection, treatment and prevention of mental disorders (for comprehensive reviews, see Calvo et al., 2017; Zhang et al., 2022; Zhou et al., 2022). Data from social media are particularly appealing to the NLP research community due to their scope and their deep embeddedness in contemporary culture (Perrin). Research utilizing NLP techniques in combination with social media has yielded new insights into population mental health and shown promise for incorporating data-driven analytics into the treatment of psychiatric disorders (Chancellor and De Choudhury, 2020; Garg, 2023).
Recently, this line of research has developed a growing interest in improving NLP approaches to mental illness detection by leveraging information from related domains, in particular emotion (see Zhang et al., 2023, for a comprehensive review). Behavioral and psychological research has long established links between emotions and mental disorders: For example, individuals with depressive symptoms have difficulty regulating their emotions, resulting in lower emotional complexity (Joormann and Gotlib, 2010; Compare et al., 2014). Disrupted emotion regulation has also been implicated in anxiety (Young et al., 2019). In the light of such links, information about emotions is useful in diagnosing mental disorders. 'Emotion fusion' refers to the process of "integrating emotion information with general textual information to obtain enhanced information for decision-making" (Zhang et al., 2023, p. 232). By the same rationale, information fusion approaches are likely to benefit from the inclusion of additional individual characteristics known to be associated with mental disorders, such as personality traits. Like emotion, personality has been linked to a diverse set of mental disorders based on genetic and behavioral evidence: For example, genome-wide association studies have demonstrated that genetic risk factors for depression are largely shared with the neuroticism personality trait (Adams et al., 2019). Correlational studies comparing subjects diagnosed with Major Depressive Disorder (MDD) and healthy control subjects found that vulnerability to depression was associated with several personality dimensions, such that MDD subjects were characterized by high neuroticism and low extraversion, accompanied by low scores on openness and conscientiousness (Nikolic et al., 2020). Analyses of the language use of Twitter users with self-disclosed depression and PTSD revealed that text-derived personality played an important role in predicting these mental disorders (Preoţiuc-Pietro et al., 2015).
In addition to the question of 'what to fuse', information fusion approaches also raise the algorithmic question of 'how to fuse' the auxiliary information effectively. The available research has shown that MHD prediction can be improved through a variety of different fusion strategies. However, previous work has typically focused on a specific fusion strategy applied to a specific dataset, which limits the comparability of the reported results.
In this work, we integrate and extend research on information fusion for mental disorder detection by conducting extensive experiments with three types of deep learning-based fusion strategies: (i) feature-level fusion, where a pre-trained masked language model for mental health detection (MentalRoBERTa, htt) was infused with a comprehensive set of engineered features, (ii) model fusion, where the MentalRoBERTa model was infused with hidden representations of other language models and (iii) task fusion, where a multi-task framework was leveraged to learn the features for auxiliary tasks. In addition to exploring the role of different fusion strategies, we expand on previous work by broadening the information infusion to include a second domain related to mental health, i.e. personality. We evaluate our models on data from two benchmark datasets, encompassing five mental health conditions: attention deficit hyperactivity disorder, anxiety, bipolar disorder, depression and psychological stress.
The remainder of the paper is structured as follows: Section 2 presents a concise discussion of related work applying each of the three information fusion strategies. Section 3 introduces the datasets used to perform the mental health detection experiments. In Section 4, we describe our three mental status detection models that instantiate the three fusion strategies. Section 5 details the experimental setup, including the specification of the fine-tuned MentalRoBERTa baseline model. Section 6 presents and discusses the results of our experiments. Finally, we conclude with possible directions for future work in Section 7.

Related work
In this section we provide a concise discussion of selected works for each of the three fusion strategies. A comprehensive overview of work on information fusion for mental illness detection from social media data has recently been provided by Zhang et al. (2023). One strand of recent work in the feature-level fusion approach is characterized by the integration of information from several groups of features extracted using NLP tools: Song et al. (2018) utilized a feature attention network (FAN) to combine indicators of mental disorders from four groups: (1) word-level features related to depressive symptoms taken from the Diagnostic and Statistical Manual of Mental Disorders (DSM-5, APA, 2013), (2) word-level sentiment scores obtained from the SentiWordNet dictionary (Baccianella et al., 2010), (3) features related to ruminative thinking, expressed as the amount of repetition of topics in a social media post (Nolen-Hoeksema et al., 2008) and (4) writing style features, measured in terms of the sequencing of parts of speech in a social media post. The FAN consists of four feature networks, one for each feature group, fed into a post-level attention layer. The authors evaluated the performance of their approach on the Reddit Self-reported Depression Diagnosis dataset (RSDD; Yates et al., 2017), a large-scale general forum dataset containing data from 9,210 users with an average of 969 posts per user. Their model was competitive with a convolutional neural network baseline model, despite using a much smaller number of posts in the training data (only 500 posts per user). A second strand of feature-fusion approaches combines emotion features extracted using NLP tools with textual embeddings from pretrained language models, before feeding these into a CNN/LSTM structure to construct the MHC classification model. For example, Uban et al. (2021) used a hierarchical attention network with LSTM post-level and user-level encoders that combined multi-dimensional representations of texts. Specifically, their approach combined (i) content features, captured through word sequences encoded as 300-dimensional embeddings based on pre-trained GloVe vectors (Pennington et al., 2014), (ii) style features, expressed by numerical vectors representing stopword frequencies as bag-of-words, normalized by text lengths, and usage of pronouns or other parts of speech, and (iii) emotion and sentiment features, represented by numerical vectors of word category ratios from two emotion- and sentiment-related lexicons, LIWC (Pennebaker et al., 2001) and NRC emotion (Mohammad and Turney, 2013). They evaluated the model on the eRisk Reddit datasets on depression, anorexia and self-harm (Losada et al., 2019), reaching competitive results across all three mental disorders and outperforming a strong RoBERTa baseline model in the detection of two of them (self-harm and depression).
Turning to the model fusion approach, Sawhney et al. (2020) presented a time-aware transformer-based model for the screening of suicidal risk on social media. Their model, called STATENet, uses a dual transformer-based architecture to learn the linguistic and emotional cues in tweets. STATENet combines the 768-dimensional encoding obtained from Sentence BERT, capturing the language cues of the tweet to be assessed, with an aggregate representation of the emotional spectrum, obtained from a pre-trained BERT model fine-tuned on the Emonet dataset (Abdul-Mageed and Ungar, 2017). This second model, referred to as the Plutchik Transformer, tokenizes each post and adds the [CLS] token at the beginning of each post. The authors then express the aggregate representation of the emotional spectrum as the final hidden state corresponding to this [CLS] token (a 768-dimensional encoding). They evaluated the STATENet models on the task of tweet-level prediction of suicide ideation on the Twitter timeline dataset (Sinha et al., 2019), which contained 32,558 tweets. STATENet significantly outperformed competitive baseline models for suicidal risk assessment, demonstrating the utility of combining contextual linguistic and emotional cues for this task.
Recently, Turcan et al. (2021) explored the use of multi-task learning and emotion-infused language model finetuning for psychological stress detection. In this work, the authors introduced an innovative task fusion approach that utilized a multi-task learning setup to perform stress detection and emotion detection at the same time on the same input data. As currently available datasets for stress detection are not labeled for emotion, they first separately trained BERT models on different versions of the GoEmotions dataset (Demszky et al., 2020) and employed these to derive emotion labels for the stress detection dataset used in their experiments (Dreaddit, Turcan and McKeown, 2019). The authors then used these emotion labels as 'silver data' and trained on them alongside stress in a multi-task learning setting with hard parameter sharing (Caruana, 1997). Their models achieved comparable performance to a state-of-the-art fine-tuned BERT baseline. Importantly, based on analyses designed to probe their models and discover what information they learn to use, the authors demonstrated that their task fusion approach improved the explainability of deep learning-based mental health prediction models. Specifically, by performing correlational analyses of the models' predictions on each task, they were able to explore the usefulness of the emotion prediction layers in explaining stress classifications.
As can be seen from this overview, with the exception of Turcan et al. (2021), previous studies have focused on specific fusion strategies applied to a variety of mental health conditions. By applying different fusion strategies to four mental disorders (ADHD, anxiety, bipolar disorder, depression) and related symptomatology (psychological stress), we aim to facilitate the evaluation of current approaches to information fusion.

Data
Four datasets were used in the present work. The data used in the task of mental health detection were obtained from two publicly available social media datasets: (1) the Self-Reported Mental Health Diagnoses (SMHD) dataset (Cohan et al., 2018) and (2) the Dreaddit dataset (Turcan and McKeown, 2019). Both SMHD and Dreaddit were constructed from data from Reddit, a social media platform consisting of individual topic communities called subreddits, including those relevant to MHC detection. The statistics of these datasets are provided in Table 1. SMHD is a large dataset of social media posts from users with nine mental health conditions (MHC) corresponding to branches in the DSM-5, an authoritative taxonomy for psychiatric diagnoses (APA, 2013). User-level MHC labels were obtained through carefully designed distantly supervised labeling processes based on diagnosis pattern matching. The pattern matching leveraged a seed list of diagnosis keywords collected from the corresponding DSM-5 headings and extended by synonym mappings. To prevent target labels from being easily inferred from the presence of MHC-indicating words and phrases in the posts, all posts made to mental health-related subreddits or containing keywords related to a mental health condition were removed from the diagnosed users' data.
Dreaddit is a dataset of social media posts from subreddits in five domains that include stressful and non-stressful text.For a subset of 3.5k users employed in this paper, binary labels (+/-stressful) were obtained from crowdsourced annotations aggregated as the majority vote from five annotators for each data point.
As the SMHD and Dreaddit datasets are labeled only with mental health status, two additional datasets were used to provide auxiliary information about personality and emotion. Following the approach used in Turcan et al. (2021), we first separately trained RoBERTa models on the GoEmotions dataset (Demszky et al., 2020) and the Kaggle MBTI dataset (Li et al., 2018) and used these models to predict emotion and personality labels for SMHD and Dreaddit. A table with dataset statistics for these resources is provided in the appendix. GoEmotions is the largest available manually annotated dataset for emotion prediction. It consists of 58 thousand Reddit comments, labeled by 80 human raters for 27 emotion categories plus a neutral category. The authors provided a mapping of these 27 categories to Ekman's six basic emotions (anger, disgust, fear, joy, sadness, and surprise), which are assumed to be physiologically distinct (Ekman, 1992, 1999). Drawing on the results of experiments with different emotion mappings reported in Turcan et al. (2021), these six basic emotions are used in the present work.
The Kaggle MBTI dataset was collected through the PersonalityCafe forum and thus provides a diverse sample of people interacting in an informal online social environment. It consists of samples of social media interactions from 8675 users, all of whom indicated their Myers-Briggs Type Indicator (MBTI) personality type (Meyers et al., 1990). The MBTI is a widely administered questionnaire that describes personality in terms of 16 types that result from combining binary categories from four dimensions: (a) Extraversion/Introversion (E/I): preference for how people direct and receive their energy, based on the external or internal world, (b) Sensing/Intuition (S/N): preference for how people take in information, through the five senses or through interpretation and meanings, (c) Thinking/Feeling (T/F): preference for how people make decisions, relying on logic or emotion over people and particular circumstances, and (d) Judgment/Perception (J/P): how people deal with the world, by ordering it or remaining open to new information.
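The decomposition of the 16 MBTI types into four binary dimensions can be illustrated with a short sketch. The function name `mbti_to_binary` and the convention that 1 marks the first pole of each pair (E, S, T, J) are assumptions made for illustration; the paper does not specify how the labels were encoded.

```python
from typing import Dict

# The four MBTI dimensions in the order in which their letters appear in a type.
MBTI_DIMENSIONS = (("E", "I"), ("S", "N"), ("T", "F"), ("J", "P"))


def mbti_to_binary(mbti_type: str) -> Dict[str, int]:
    """Split a four-letter MBTI type into four binary labels.

    Assumed convention: 1 = first pole of the pair (E, S, T, J), 0 = second pole.
    """
    letters = mbti_type.strip().upper()
    if len(letters) != 4:
        raise ValueError(f"expected a four-letter MBTI type, got {mbti_type!r}")
    labels = {}
    for letter, (first, second) in zip(letters, MBTI_DIMENSIONS):
        if letter not in (first, second):
            raise ValueError(f"unexpected letter {letter!r} in {mbti_type!r}")
        labels[f"{first}/{second}"] = int(letter == first)
    return labels
```

Treating each dimension as its own binary target yields four personality labels per text, which matches the four personality categories used in the correlation analysis later in the paper.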

Data preprocessing
For the SMHD dataset, we removed all posts with a length greater than 512 words, as these posts could not be processed by large pre-trained models like RoBERTa and its variants. We then randomly sampled one post from each user and focused our analysis on the four most frequently attested mental health conditions. Furthermore, all datasets were subjected to various standard pre-processing steps, including removal of HTML, URLs, extra spaces and emojis in the text, and the correction of inconsistent punctuation.
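The preprocessing steps described above could look roughly as follows. This is a minimal sketch, not the pipeline used in the paper: the regular expressions, the emoji code point ranges, the fixed seed and the function names are all assumptions, and the punctuation correction step is omitted because the paper does not describe it in detail.

```python
import html
import random
import re

MAX_WORDS = 512  # length cut-off stated in the text

_TAG = re.compile(r"<[^>]+>")                      # HTML tags
_URL = re.compile(r"https?://\S+|www\.\S+")        # URLs
_EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")  # assumed emoji ranges
_SPACE = re.compile(r"\s+")


def clean_post(text):
    """Remove HTML, URLs and emojis, then collapse extra whitespace."""
    text = html.unescape(text)
    text = _TAG.sub(" ", text)
    text = _URL.sub(" ", text)
    text = _EMOJI.sub(" ", text)
    return _SPACE.sub(" ", text).strip()


def sample_one_post_per_user(posts_by_user, seed=0):
    """Drop posts longer than MAX_WORDS, then sample one cleaned post per user."""
    rng = random.Random(seed)
    sampled = {}
    for user in sorted(posts_by_user):
        short = [p for p in posts_by_user[user] if len(p.split()) <= MAX_WORDS]
        if short:  # users with only over-long posts are skipped
            sampled[user] = clean_post(rng.choice(short))
    return sampled
```

Note that the 512-word filter operates on whitespace-separated words, as stated in the text; a subword tokenizer may still produce more than 512 tokens for some of the remaining posts.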

Models
We experiment with seven information-infusion models that differ (i) in the type of information to be infused (personality, emotion, both) and (ii) in the fusion strategy applied to incorporate that information into the mental health detection models. The architectures of these models are shown in Figure 1.

Feature-level fusion
Our feature fusion model combines a MentalRoBERTa model (Ji et al., 2022) with a bidirectional long short-term memory (BiLSTM) network trained on 544 psycholinguistic features that fall into six broad categories: (1) features of morpho-syntactic complexity (N=19), (2) features of lexical richness, diversity and sophistication (N=52), (3) stylistic features, incl. register-based n-gram frequency features (N=57), (4) readability features (N=14), (5) lexicon features designed to detect sentiment, emotion and/or affect (N=325), and (6) cohesion and coherence features (N=77). All measurements of these features were obtained using an automated text analysis system that employs a sliding window technique to compute sentence-level measurements. These measurements capture the within-text distributions of scores for a given psycholinguistic feature, referred to here as 'text contours' (for recent applications, see e.g. Wiechmann et al. (2022) for predicting eye-movement patterns during reading and Kerz et al. (2022) for detection of Big Five personality traits and Myers-Briggs types). Tokenization, sentence splitting, part-of-speech tagging, lemmatization and syntactic PCFG parsing were performed using Stanford CoreNLP (Manning et al., 2014). The given text is fed to a pre-trained language model and its output is passed through a 2-layer BiLSTM with a hidden size of 512. The second part of the model is the PsyLin model, a 3-layer BiLSTM with a hidden size of 1024 whose output is passed through a fully connected layer to obtain a 256-dimensional vector. The input to this model is the set of handcrafted psycholinguistic features described above. We constructed the feature-level fusion models by (1) obtaining a 256-dimensional vector from the BiLSTM network and then (2) concatenating this vector with the output from the MentalRoBERTa model component. The concatenated representation is then fed into a 2-layer feedforward classifier. To obtain the soft labels (probabilities that a text belongs to the corresponding emotion label), a sigmoid was applied to each dimension of the output vector.
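The sliding-window computation of 'text contours' mentioned in this section can be sketched as follows. The window size, the toy feature (mean word length) and the function names are assumptions chosen for illustration; the actual text analysis system and its 544 features are not reproduced here.

```python
from statistics import mean


def mean_word_length(sentences):
    """Toy stand-in for a psycholinguistic feature: mean word length in a window."""
    words = [w for s in sentences for w in s.split()]
    return mean(len(w) for w in words) if words else 0.0


def text_contour(sentences, feature, window_size=2):
    """Slide a window of `window_size` sentences over the text.

    Returns one feature value per window position, i.e. the within-text
    distribution of scores for that feature.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    if len(sentences) <= window_size:
        return [feature(sentences)]
    last_start = len(sentences) - window_size
    return [feature(sentences[i:i + window_size]) for i in range(last_start + 1)]
```

Computing one such contour per feature gives a sequence of feature vectors per text, which is the kind of sequential input a BiLSTM component can consume.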

Model fusion
In our model fusion approach, the MentalRoBERTa model was infused with hidden features of a fine-tuned RoBERTa emotion model and a fine-tuned RoBERTa personality model (see also Section 3). Both of these models are fine-tuned 'roberta-base' models with a linear classification layer on top. We use the output values obtained from this layer to provide the infused model with information on emotion and/or personality. Specifically, we pass the output obtained from MentalRoBERTa through a sequential layer consisting of two linear layers and concatenate the resulting features with the outputs of the auxiliary models. We finally pass this concatenation through a linear layer to obtain the soft predictions for the respective MHC. As with the previous model type, we train separate models for all five MHCs. For each MHC, we created three different binary classification models: one with just emotions (MentalRoBERTa + Emotion), one with just personality (MentalRoBERTa + Personality), and one with 'full infusion' (MentalRoBERTa + Emotion + Personality).
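The data flow of the model fusion head can be made concrete with a small, framework-free sketch. It only mirrors the order of operations described above (two linear layers, concatenation with auxiliary outputs, final linear layer, sigmoid). The toy dimensions, the random initialization, the ReLU between the two linear layers and all function names are assumptions; an actual implementation would use a deep learning library and trained weights.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Randomly initialised dense layer: (rows of weights, biases)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(vec, layer):
    weights, bias = layer
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def build_layers(text_dim, hidden_dim, aux_dim, seed=0):
    rng = random.Random(seed)
    return [
        init_layer(text_dim, hidden_dim, rng),     # first linear layer
        init_layer(hidden_dim, hidden_dim, rng),   # second linear layer
        init_layer(hidden_dim + aux_dim, 1, rng),  # final classification layer
    ]


def fuse(text_vec, aux_outputs, layers):
    """Return the soft prediction for one MHC from text and auxiliary outputs."""
    first, second, out = layers
    hidden = [max(0.0, v) for v in linear(text_vec, first)]  # ReLU is an assumption
    hidden = linear(hidden, second)
    fused = hidden + [v for aux in aux_outputs for v in aux]  # concatenation
    if len(fused) != len(out[0][0]):
        raise ValueError("fused vector does not match the final layer's input size")
    logit = linear(fused, out)[0]
    return 1.0 / (1.0 + math.exp(-logit))
```

Passing only the emotion outputs, only the personality outputs, or both in `aux_outputs` corresponds to the three model variants listed above, with `aux_dim` set to 6, 4 or 10 respectively.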

Task fusion
Our task fusion approach is an extended version of the multi-task learning setup used in Turcan et al. (2021). Within this setup, we perform multiple tasks at the same time using the same input data. As the SMHD data is labeled only with MHC categories and Dreaddit only has labels for stress, we followed the approach described in Turcan et al. (2021) to derive emotion and personality labels for the two datasets. To this end, we first separately trained RoBERTa models on the GoEmotions and Kaggle MBTI datasets and used them to generate 'silver labels' for emotion and personality. The performance of these models is presented in Table 2.
We then trained the model in a multi-task setup on two tasks (mental health detection and emotion recognition or personality detection) or on all three tasks. In each task fusion model, the loss is the weighted sum of the loss from the MHC part and the loss from the secondary task part, where the weights are tunable. Separate binary classification models were constructed for each of four self-reported diagnosed mental health conditions (MHC) from the SMHD dataset (ADHD, anxiety, depression, bipolar) and for stress from the Dreaddit dataset. For each MHC we constructed an emotion-infused model (MentalRoBERTa + Emotion), a personality-infused model (MentalRoBERTa + Personality), and a 'full-infusion' model (MentalRoBERTa + Emotion + Personality).
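The weighted multi-task objective can be written down in a few lines. The sketch below assumes binary cross-entropy for both tasks and averages the secondary loss over the auxiliary categories; the averaging and the function names are assumptions, since the paper only states that the total loss is a weighted sum of the primary and secondary losses.

```python
import math


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one predicted probability and one (soft) label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def multitask_loss(mhc_prob, mhc_label, aux_probs, aux_labels,
                   w_primary=0.5, w_secondary=0.5):
    """Weighted sum of the MHC loss and the mean loss over auxiliary categories."""
    if len(aux_probs) != len(aux_labels) or not aux_probs:
        raise ValueError("auxiliary predictions and labels must be non-empty and aligned")
    primary = bce(mhc_prob, mhc_label)
    secondary = sum(bce(p, y) for p, y in zip(aux_probs, aux_labels)) / len(aux_probs)
    return w_primary * primary + w_secondary * secondary
```

The default weights of 0.5 and 0.5 correspond to one of the weight pairs explored in the hyperparameter search; the silver labels for emotion and personality would be passed as `aux_labels`.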

Baseline
We compared our models against a fine-tuned MentalRoBERTa model. We used the pretrained 'MentalRoBERTa-base' models from the Hugging Face Transformers library (Wolf et al., 2019). The models consist of 12 Transformer layers with hidden size 768 and 12 attention heads. We ran experiments with (1) a linear fully-connected layer for classification as well as with (2) an intermediate bidirectional LSTM layer with 256 hidden units. The following hyperparameters were used for fine-tuning: a fixed learning rate of 2 × 10⁻⁵ and L2 regularization of 1 × 10⁻⁶. All models were trained for 8 epochs, with a batch size of 4, a maximum sequence length of 512 and a dropout of 0.2. We report the results from the best performing models.

Training details
We trained all the models using binary cross-entropy loss and the Adam optimizer (AdamW). We set the learning rate to 2e-5 and the weight decay to 1e-5. Batch sizes differed between models: the BiLSTM network component of the feature fusion model was trained with a batch size of 128, and all other models with a batch size of 32. We trained the BiLSTM component for 200 epochs and all other models for 8 epochs, and saved the best performing models on the validation set. We evaluated these models on the test set and report the performance in terms of macro-F1 scores.
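For reference, the macro-F1 score used as the evaluation metric is the unweighted mean of the per-class F1 scores. A minimal sketch in plain Python (function names are illustrative; in practice a library implementation would typically be used):

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score when `cls` is treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)
```

Because each class contributes equally, macro-F1 penalizes a binary classifier that performs well on only one of the two classes (diagnosed vs. control).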
We selected the hyperparameters based on the macro-F1 score obtained on the development set. We used grid search to find the optimal values for the following: (1) for the task fusion models: loss weights for the primary and secondary tasks, (0.5, 0.5), (0.6, 0.4), (0.7, 0.3), with the best F1 scores attained at equal weights for both tasks; (2) for the feature fusion model: hidden size (128, 256, 512, 1024), number of LSTM layers (1, 2, 3, 4) and dropout (0.2, 0.4), where we found the best performance with a hidden size of 512, 3 layers and a dropout of 0.2.
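The grid search over the stated value ranges can be sketched as follows. The `evaluate` callable is hypothetical: it stands for a routine that trains a model with the given configuration and returns its macro-F1 score on the development set, which is not shown here.

```python
from itertools import product

# Search spaces as listed in the text.
TASK_FUSION_GRID = {"loss_weights": [(0.5, 0.5), (0.6, 0.4), (0.7, 0.3)]}
FEATURE_FUSION_GRID = {
    "hidden_size": [128, 256, 512, 1024],
    "num_layers": [1, 2, 3, 4],
    "dropout": [0.2, 0.4],
}


def grid_search(grid, evaluate):
    """Try every combination in `grid` and keep the best-scoring configuration.

    `evaluate` is a hypothetical callable mapping a config dict to a
    development-set score (higher is better).
    """
    names = list(grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

With the ranges above, the feature fusion grid contains 4 × 4 × 2 = 32 configurations and the task fusion grid contains 3.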

Results and discussion
Table 3 provides a concise overview of the performance in detecting five mental disorders (ADHD, anxiety, bipolar disorder, depression, and stress) for three fusion strategies (feature-level fusion, model fusion, and task fusion) in comparison to the baseline MentalRoBERTa model. In general, our fusion models outperformed the MentalRoBERTa baseline for three of the five mental health conditions (ADHD, anxiety, bipolar disorder), and performed similarly to the baseline model for depression and stress. For the ADHD condition, the best performing model, the 'Task Fusion - emotion' model, achieved an improvement of 4% F1 over the MentalRoBERTa baseline model. For anxiety and bipolar disorder, the best performance was achieved by the 'Task Fusion - personality' model, an improvement over the baseline of 2% F1. Overall, these results indicate that task fusion is the most effective fusion strategy for detecting these three mental health conditions. Task fusion models were able to learn the features for the auxiliary tasks (emotion classification and personality detection) and thereby improve the performance of the primary task (mental health detection) for three conditions. The results also suggest that both emotions and personality are important in the detection of specific mental health disorders: We observed that detection of ADHD benefited most from infusion of emotion information, whereas detection of anxiety and bipolar disorder benefited most from infusion of personality information. The finding that the fusion models performed similarly to the MentalRoBERTa baseline model for stress is consistent with the findings reported in Turcan et al. (2021): Their emotion fusion models constructed for the task of binary stress prediction achieved comparable performance to a fine-tuned BERT baseline model (F1 = 78.88 for BERT; F1 = 80.24 for the emotion fusion model with Ekman GoEmotions relabeling). The F1 score of our baseline MentalRoBERTa model was 3.3% higher than that of their baseline BERT model. For stress and depression, the best performance was obtained with the feature-level fusion approach, which yielded slight improvements over the MentalRoBERTa baseline. At the same time, we observed that infusing only information from the most informative source was more effective than full infusion, i.e. emotion and personality. A possible reason for this finding is noise or erroneous hidden features generated by the auxiliary models in the case of model fusion (see Zhang et al., 2023; Pan and Yang, 2010, for discussion). A potential reason for the lower performance of the full infusion models in the task fusion approach is competition among the auxiliary tasks with regard to providing evidence for the relevance of particular features (see Ruder, 2017, for a discussion of 'attention focusing' in multi-task learning). We intend to explore these issues in future research.
Building upon the approach described in Turcan et al. (2021), we go a step further to probe our full task fusion models and discover the exact nature of the information they learned to use, i.e. how the six basic emotion categories (anger, disgust, fear, joy, sadness, and surprise) and four personality dimensions (Extraversion/Introversion (E/I), Sensing/Intuition (S/N), Thinking/Feeling (T/F) and Judgment/Perception (J/P)) guided the prediction of mental health status. To this end, we calculated Pearson correlation coefficients between the predicted probabilities for each of the five mental health conditions and the probabilities for the four personality and six emotion categories. Table 4 presents an overview of the results of this analysis. A visualization of the results can be found in Figure 2 in the appendix. The results revealed that the full task fusion model learned moderate to strong correlations between specific mental health statuses and specific emotion and personality categories: More specifically, the ADHD condition was strongly associated with sadness and disgust and moderately associated with anger and anxiety, whereas it was strongly negatively correlated with joy. Anxiety was strongly linked to joy and moderately associated with sadness, while being strongly negatively correlated with disgust. Bipolar disorder was characterized by strong negative associations with fear and disgust, with tendencies towards anger and sadness. Depression was strongly linked to the negative emotions of fear, anger, disgust and sadness. In addition, like anxiety, it was positively related to joy, which is somewhat unexpected. Stress exhibited the weakest correlations with emotional categories, with moderate positive correlations with fear and negative ones with joy being the most salient. We note that the weaker correlations between stress and the emotional categories can explain the more modest gain in predictive accuracy of the fusion models compared to the fine-tuned transformer model in both the present study and in Turcan et al. (2021). Turning to personality, the task fusion model learned that all mental health conditions are associated with the MBTI-T dimension, such that individuals with a preference for relying on emotions in decision making are more likely to have an MHC diagnosis. Bipolar depression, ADHD and anxiety were also associated with the MBTI-J dimension, such that individuals who are less open to new information are more likely to exhibit any of these MHCs. Anxiety and bipolar disorder were correlated with the MBTI-E dimension, such that these conditions were more likely for individuals with a preference for focusing on the future with an emphasis on patterns and possibilities. Anxiety was also strongly negatively correlated with the MBTI-N dimension, meaning that the condition was much more prevalent in introverted individuals than in extraverted ones. At the same time, extraversion was associated with depression and, to a lesser extent, with bipolar disorder.
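The probing analysis rests on Pearson correlations between two columns of predicted probabilities, interpreted with the thresholds from Cohen (1988) given in the note to Table 4. A minimal sketch follows; the function names and the label 'weak' for coefficients at or below 0.3 are assumptions, as the paper only names the 'moderate' and 'strong' bands.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def strength(r):
    """Verbal label for |r|: > 0.5 'strong', > 0.3 'moderate', else 'weak' (assumed)."""
    magnitude = abs(r)
    if magnitude > 0.5:
        return "strong"
    if magnitude > 0.3:
        return "moderate"
    return "weak"
```

In the analysis described above, `xs` would hold the predicted probabilities for one mental health condition and `ys` the predicted probabilities for one emotion or personality category over the same set of texts.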
In line with results from experimental and genome-wide association studies of mental health and personality (Adams et al., 2019;Nikolic et al., 2020), these results suggest that personality dimensions are important in understanding vulnerability to mental health disorders.

Conclusion
In this work, we presented the first comprehensive experimental evaluation of current deep learning-based fusion strategies (feature-level fusion, model fusion, task fusion) for the detection of mental disorders. We went beyond previous work by applying these approaches to five mental health conditions. The results of our experiments showed that the task fusion strategy is most promising for the detection of three of the five conditions (ADHD, anxiety, and bipolar disorder), while feature-level fusion is most advantageous for the detection of psychological distress and depression. We demonstrated that the prediction of mental health from textual data benefits from the infusion of two information sources related to mental disorders, i.e. emotion and personality. Furthermore, we showed that information fusion models can improve the classification accuracy of strong transformer-based prediction models while enhancing their explainability.
In this paper, we focused on developing binary classifiers that aim to distinguish between individuals with a particular mental illness and control users. In future work, we intend to address the more complex problem of distinguishing between multiple mental health conditions, which is essential if we are to uncover the subtle differences among the statistical patterns of language use associated with particular disorders. We further intend to apply our approach to longitudinal data to gain valuable insights into the evolution of symptoms over time and to extend it to languages beyond English, specifically German.

Limitations
We note that the datasets used in this work solely represent social media interactions from Reddit, which is known to have a demographic bias toward young, white, American males. Furthermore, systematic, spurious differences between diagnosed and control users can prevent trained models from generalizing to other data. Future research on other social media and datasets is needed to determine to what extent the presented findings are generalizable to broader populations.

Figure 1: Information fusion architectures used in our experiments

Figure 2: Pearson correlations between predicted values on the primary task (mental health prediction) and each category of the secondary tasks (emotion prediction and personality recognition)

Table 1: Count of posts, tokens and characters along with average post length for diagnosed and control users. *Note: In all binary classification tasks, the control set consisted of a randomly drawn subset of control users that matched the size of the respective positive class.

Table 2: Performance of auxiliary models used to generate 'silver labels' for emotion and personality

Table 3: Results of information-fusion models in comparison to baseline models. F1 scores are averaged over two runs.

Table 4: Pearson correlations between predicted values on the primary task (mental health prediction) and each category of the secondary tasks (emotion prediction and personality recognition). Note: Following Cohen (1988), we consider correlation coefficients with absolute values greater than 0.3 to be 'moderate' and greater than 0.5 to be 'strong'.