Learning to Answer Psychological Questionnaire for Personality Detection

Existing text-based personality detection research mostly relies on data-driven approaches to implicitly capture personality cues in online posts, lacking the guidance of psychological knowledge. A psychological questionnaire, which contains a series of dedicated questions highly related to personality traits, plays a critical role in self-report personality assessment. We argue that the posts created by a user contain critical contents that could help answer the questions in a questionnaire, so that her/his personality can be assessed by linking the texts and the questionnaire. To this end, we propose a new model named Psychological Questionnaire enhanced Network (PQ-Net) to guide personality detection by tracking critical information in texts with a questionnaire. Specifically, PQ-Net contains two streams: a context stream to encode each piece of text into a contextual text representation, and a questionnaire stream to capture relevant information in the contextual text representations and generate a potential answer representation for each question in the questionnaire. The potential answer representations are used to enhance the contextual text representations and benefit personality prediction. Experimental results on two datasets demonstrate the superiority of PQ-Net in capturing useful cues from the posts for personality detection.


Introduction
As a psychological conception, personality aims to explain human behaviors in terms of a few stable and measurable individual characteristics (Vinciarelli and Mohammadi, 2014). The study of personality is fundamental to psychology, and personality detection (Xue et al., 2018) has benefited many applications such as dialogue systems (Zheng et al., 2019), recommendation systems (Yang and Huang, 2019), and suicide risk assessment (Matero et al., 2019). Canonical approaches to personality testing are generally based on questionnaires elaborately designed by psychologists, yet their cost and scalability issues make them less practical in cyberspace (Nie et al., 2014; Aung and Myint, 2019). Recent years have witnessed an increasing interest in automatically identifying one's personality traits based on her/his social media posts (Sorokowska et al., 2016; Imran et al., 2018; Tadesse et al., 2018b; Dandannavar et al., 2020). To encode the input posts and obtain their contextual representations, most of these methods employ deep learning models such as LSTMs (Tandera et al., 2017), CNNs (Xue et al., 2018) and pre-trained language models (PTMs) (Jiang et al., 2020; Yang et al., 2021a,b). They generally rely on the models to capture potential personality cues implicitly from the texts in a data-driven manner, without any guidance from psychological domain knowledge. As a result, the performance of these models is largely limited by the availability of training data and the learning capability of the models.

Figure 1: An example to show that certain online contents (highlighted) created by a user, e.g., "I have always been very reserved, and I do need time alone...", can be used to answer the questions of a questionnaire, here the MBTI question "Are you usually a good mixer with groups of people or rather quiet and reserved?" with the choices "Quiet and reserved." and "A good mixer.". The highlighted contents strongly indicate that the right choice is "Quiet and reserved".
We observe from real data that the posts created by a user contain some critical contents that could help answer the questions in a questionnaire. As the example in Figure 1 shows, there are a set of posts from a user and a question "Are you usually a good mixer with groups of people or rather quiet and reserved?" from an MBTI (Briggs and Myers, 1977) questionnaire. The question is associated with two choices, "Quiet and reserved." and "A good mixer.", which are intended to investigate whether the user's personality trait is introversive (the former) or extroversive (the latter). From the posts, we can see that the contents "always been very reserved", "need time alone" and "better to be single" strongly indicate that the user's personality trait is introversive. Therefore, we argue that it is possible to utilize the questionnaire, which contains questions highly related to personality traits, to guide a model to capture critical information in the posts for personality detection.

For this purpose, we propose a new model named Psychological Questionnaire enhanced Network (PQ-Net) for text-based personality detection. Specifically, PQ-Net consists of two streams: a context stream and a questionnaire stream. In the context stream, a PTM-based encoder encodes each post and creates its contextual representation. The questionnaire stream first encodes each question by a question encoder and each candidate answer by a choice encoder, and then employs a supervised cross-attention mechanism that enables the model to learn a potential answer representation for each question by choosing the correct answer based on the post representations. We then concatenate the post representations and the potential answer representations to predict the user's personality traits. Under the guidance of the questionnaire, PQ-Net is able to capture personality-related cues from the posts in an explicit manner rather than learning them implicitly.
Extensive experiments on the Kaggle and Pandora datasets show that PQ-Net consistently outperforms existing competitors. Further analyses demonstrate that the questionnaire and the two-stream structure both play a crucial role in PQ-Net, and that the user representations enhanced by PQ-Net are more inductive and distinguishable than those of the baselines. Lastly, we show that the cues obtained by PQ-Net are more interpretable for personality detection.
The contributions of this paper are threefold:
• This is the first work to introduce a traditional psychological questionnaire into automatic personality detection, offering a new perspective of utilizing psychological knowledge.
• We propose a novel model that tracks critical information in posts with a questionnaire and provides an explicit way of identifying relevant cues in posts for personality detection.
• We demonstrate on two datasets that PQ-Net can effectively capture personality-relevant cues in posts and yield superior performance.

Task Definition
The personality detection task studied in this paper can be formally defined as follows. Given a set of posts $P = \{p_1, p_2, \ldots, p_n\}$ created by a user, where $p_i = \left(w^{p}_{i,1}, w^{p}_{i,2}, \ldots, w^{p}_{i,l_p}\right)$ represents the $i$-th post with $l_p$ words. Consider an extra personality-related psychological questionnaire $Q = \left\{\left(q_j, \{c_{j,k}\}_{k=1}^{r}\right)\right\}_{j=1}^{m}$ with $m$ questions, where each question $q_j = \left(w^{q}_{j,1}, w^{q}_{j,2}, \ldots, w^{q}_{j,l_q}\right)$ is associated with $r$ choices $\{c_{j,k}\}_{k=1}^{r}$. We use $c_{j,k} = \left(w^{c}_{j,k,1}, w^{c}_{j,k,2}, \ldots, w^{c}_{j,k,l_c}\right)$ to represent the $k$-th choice for question $q_j$. The objective of this task is to predict the personality traits $Y = \left(y^{(1)}, y^{(2)}, \ldots, y^{(T)}\right)$ of the user along $T$ dimensions based on posts $P$ and questionnaire $Q$.
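For concreteness, the following sketch (plain Python; all names are illustrative, not the authors' released code) shows one way the inputs and outputs above could be organized:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Question:
    text: str           # question q_j
    choices: List[str]  # its r choices c_{j,1}, ..., c_{j,r}
    trait: int          # index t of the personality trait it probes

@dataclass
class UserInstance:
    posts: List[str]    # the user's n posts p_1, ..., p_n
    labels: List[int]   # gold traits y^(1), ..., y^(T)

# The questionnaire Q is fixed and shared across all users.
questionnaire: List[Question] = [
    Question(
        text=("Are you usually a good mixer with groups of people "
              "or rather quiet and reserved?"),
        choices=["Quiet and reserved.", "A good mixer."],
        trait=0,  # the I/E dimension
    ),
    # ... m = 26 questions in total (see Appendix A)
]
```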

Architecture
The overall architecture of our PQ-Net is illustrated in Figure 2 and mainly comprises two streams: a context stream and a questionnaire stream. The context stream encodes each post with a post encoder to obtain its contextual representation (i.e., implicit cues). The questionnaire stream first encodes each question with a question encoder and each choice with a choice encoder. Then, it performs cross attention to capture key information in the contextual representations that can help "answer" the questions of the questionnaire, resulting in a potential answer representation for each question (i.e., explicit cues). The potential answer representations of all the questions are split into different categories according to their correspondences with the personality traits. Finally, the averaged contextual representation and the averaged answer representation in each category are concatenated as the enhanced representation to predict each personality trait. In the following subsections, we introduce these two streams in detail.

Context Stream
As shown in the left part of Figure 2, each post $p_i$ is fed into a BERT-based post encoder to obtain its abstract representation:

$$h^{p}_{i} = \mathrm{BERT}_{p}\left(\left[\mathrm{CLS}, w^{p}_{i,1}, \ldots, w^{p}_{i,l_p}, \mathrm{SEP}\right]\right), \quad (1)$$

where CLS and SEP are special tokens that represent the start and end of an input sequence, respectively, and $\mathrm{BERT}_{p}(\cdot)$ denotes the final hidden state of the CLS token of BERT, which is commonly used as the abstract representation of a sequence. We apply the post encoder to each post and correspondingly obtain a set of contextual representations $h^{p} = \left[h^{p}_{1}, h^{p}_{2}, \cdots, h^{p}_{n}\right] \in \mathbb{R}^{n \times d}$, where $d$ is the dimension of each representation.
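A minimal sketch of such a post encoder using the HuggingFace `transformers` library (the CLS pooling follows the description above; the batching details are our own simplification):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert_p = BertModel.from_pretrained("bert-base-uncased")  # BERT_p of Eq. (1)

def encode_posts(posts, max_len=70):
    """Encode n posts into contextual representations h^p of shape (n, d)."""
    batch = tokenizer(posts, padding=True, truncation=True,
                      max_length=max_len, return_tensors="pt")
    # The tokenizer prepends [CLS] and appends [SEP] automatically.
    out = bert_p(**batch)
    return out.last_hidden_state[:, 0, :]  # final hidden state of [CLS]

h_p = encode_posts(["I have always been very reserved...",
                    "I rarely wear heels for two reasons..."])
```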

Questionnaire Stream
As shown in the right part of Figure 2, we first encode each question via a question encoder and its choices via a choice encoder to obtain their contextual representations. In this study, the question encoder and the choice encoder share the same pre-trained BERT parameters, considering their relatedness. Formally, similar to Eq. (1), we obtain the abstract representations of the $j$-th question $h^{q}_{j}$ and its corresponding $k$-th choice $h^{c}_{j,k}$ as:

$$h^{q}_{j} = \mathrm{BERT}_{q}\left(\left[\mathrm{CLS}, w^{q}_{j,1}, \ldots, w^{q}_{j,l_q}, \mathrm{SEP}\right]\right), \quad (2)$$

$$h^{c}_{j,k} = \mathrm{BERT}_{q}\left(\left[\mathrm{CLS}, w^{c}_{j,k,1}, \ldots, w^{c}_{j,k,l_c}, \mathrm{SEP}\right]\right). \quad (3)$$

We then apply a cross-attention mechanism (Vaswani et al., 2017) to capture critical information in the post representations by trying to "answer" the questions in the questionnaire. Specifically, the $j$-th question representation $h^{q}_{j}$ is used as the query and the post representations $h^{p}$ are used as the key and value. The question-aware post representation $z_j$ for the $j$-th question is then obtained by:

$$z_j = \Big\Vert_{s=1}^{S}\, \sigma\!\left(\frac{\left(h^{q}_{j} W^{Q}_{s}\right)\left(h^{p} W^{K}_{s}\right)^{\top}}{\sqrt{d_k}}\right)\left(h^{p} W^{V}_{s}\right), \quad (4)$$

where $S$ is the number of attention heads, $d_k = d/S$ is the hidden size of each head, and $\sigma$ is the softmax function. $W^{Q}_{s} \in \mathbb{R}^{d \times d_k}$, $W^{K}_{s} \in \mathbb{R}^{d \times d_k}$ and $W^{V}_{s} \in \mathbb{R}^{d \times d_k}$ are the $s$-th head's linear transformations for query, key and value, respectively. $\frac{1}{\sqrt{d_k}}$ is the scaling factor of the attention weights, and $\Vert$ denotes concatenation along the heads. Eq. (4) shows how we track question-relevant information from the posts one by one and aggregate it through the attention weights. Then, $z_j \in \mathbb{R}^{1 \times d}$ is used to predict the probability of each choice being the answer to the $j$-th question:

$$g_j = \sigma\left(z_j W^{G} + b^{G}\right), \quad (5)$$

where $W^{G} \in \mathbb{R}^{d \times r}$ and $b^{G} \in \mathbb{R}^{1 \times r}$ are the learnable parameters of an affine transformation that converts $z_j$ into $r$ dimensions. Note that each question in the questionnaire typically focuses on a certain personality trait, and thus its choices directly reflect the tendency of this personality trait. In other words, the preferred choice of each question can be inferred from the user's personality traits during training, which provides additional supervision signals for the model to predict the choices correctly:

$$J_q = -\sum_{j=1}^{m} \log g_{j,\hat{g}_j}, \quad (6)$$

where $\hat{g}_j$ is the preferred answer of the $j$-th question. Once we obtain the probability of each choice, we calculate the potential answer representation of the $j$-th question in a soft approach as follows:

$$h^{a}_{j} = \sum_{k=1}^{r} g_{j,k}\, h^{c}_{j,k}. \quad (7)$$

As a result, we obtain all the potential answer representations $h^{a} = \left[h^{a}_{1}, h^{a}_{2}, \ldots, h^{a}_{m}\right] \in \mathbb{R}^{m \times d}$.
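To make the questionnaire stream concrete, here is a sketch of Eqs. (4)-(7) in PyTorch, using `nn.MultiheadAttention` as a stand-in for the per-head cross attention written out above (module and variable names are ours, not the released implementation):

```python
import torch
import torch.nn as nn

class QuestionnaireStream(nn.Module):
    """Track post contents that help answer each questionnaire question."""
    def __init__(self, d=768, num_heads=8, r=2):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d, num_heads, batch_first=True)
        self.gate = nn.Linear(d, r)  # the affine map (W^G, b^G) of Eq. (5)

    def forward(self, h_q, h_c, h_p):
        # h_q: (m, d) question reps; h_c: (m, r, d) choice reps;
        # h_p: (n, d) post reps produced by the context stream.
        q = h_q.unsqueeze(0)                 # questions as queries
        kv = h_p.unsqueeze(0)                # posts as keys and values
        z, _ = self.cross_attn(q, kv, kv)    # Eq. (4): question-aware post reps
        z = z.squeeze(0)                     # (m, d)
        g = torch.softmax(self.gate(z), -1)  # Eq. (5): choice probs, (m, r)
        h_a = torch.einsum("mr,mrd->md", g, h_c)  # Eq. (7): soft answer reps
        return g, h_a
```

During training, `g` would additionally be supervised against the preferred answers $\hat{g}$ via Eq. (6).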

Classification & Objective
Since each question in the questionnaire focuses on a specific personality trait, we divide the potential answer representations $h^{a}$ into $T$ groups according to their correspondences with the personality traits. Formally, the $t$-th trait-specific answer representations are:

$$h^{a(t)} = \left[h^{a}_{j} \mid q_j \text{ focuses on the } t\text{-th trait}\right]. \quad (8)$$

For each personality trait, the averaged post representation and the averaged trait-specific answer representation are concatenated to produce the final representation $u^{(t)}$:

$$u^{(t)} = \left[\overline{h^{p}}\,;\ \overline{h^{a(t)}}\right]. \quad (9)$$

Then, $T$ softmax-normalized linear transformations are used to predict the probability on each personality trait, respectively. Concretely, for the $t$-th trait, we calculate:

$$\hat{y}^{(t)} = \mathrm{softmax}\left(u^{(t)} W^{(t)} + b^{(t)}\right), \quad (10)$$

where $W^{(t)}$ and $b^{(t)}$ are the weight and bias of the $t$-th transformation. The objective function of personality detection is defined as follows:

$$J_p = -\sum_{t=1}^{T} \log \hat{y}^{(t)}_{\,y^{(t)}}, \quad (11)$$

where $y^{(t)}$ denotes the true label of the $t$-th trait. Finally, the tasks of questionnaire answering and personality detection are jointly trained with their objective functions linearly combined:

$$J = \lambda J_p + (1 - \lambda) J_q, \quad (12)$$

where $\lambda \in (0, 1)$ is a tunable coefficient.
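A sketch of the classification step and the joint objective under our reading of Eqs. (8)-(12) (function and variable names are ours):

```python
import torch
import torch.nn.functional as F

def pqnet_loss(h_p, h_a, trait_of_q, heads, y, g, g_hat, lam=0.7):
    """h_p: (n, d) post reps; h_a: (m, d) potential answer reps;
    trait_of_q[j]: trait index of question j; heads: T linear classifiers;
    y: (T,) gold labels; g: (m, r) choice probs; g_hat: (m,) preferred choices."""
    h_p_avg = h_p.mean(dim=0)                       # averaged post representation
    logits = []
    for t, head in enumerate(heads):
        idx = [j for j, tq in enumerate(trait_of_q) if tq == t]
        h_a_t = h_a[idx].mean(dim=0)                # Eq. (8): trait-specific avg
        u_t = torch.cat([h_p_avg, h_a_t], dim=-1)   # Eq. (9): concatenation
        logits.append(head(u_t))                    # Eq. (10): per-trait logits
    J_p = F.cross_entropy(torch.stack(logits), y)   # Eq. (11)
    J_q = F.nll_loss(torch.log(g), g_hat)           # Eq. (6)
    return lam * J_p + (1 - lam) * J_q              # Eq. (12)
```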

Experiments
In this section, we first introduce the details of the personality benchmarks, questionnaire and baseline models adopted in our study, and then report and discuss our experimental results.

Datasets
Big Five and MBTI are two widely used personality frameworks in the fields of computational linguistics and natural language processing (Stajner and Yenikent, 2020). We conduct experiments on two public MBTI-based datasets, Kaggle and Pandora, whose statistics are summarized in Table 1. Since the distribution of labels is imbalanced, we use the Macro-F1 metric for a more accurate evaluation.

Questionnaire
As shown in Appendix A, 26 personality-related questions are defined in the questionnaire, each focusing on one of the four MBTI personality traits, with two choices reflecting the tendency of that trait. For example, the question "Are you usually a good mixer with groups of people or rather quiet and reserved?" focuses on the I/E trait, and the choices "Quiet and reserved." and "A good mixer." correspond to the I and E categories of this trait, respectively. Based on the ground-truth personality labels, we can easily infer the preferred answer to each question for a user, which is treated as an extra supervision signal for training PQ-Net in Eq. (6). The exact numbers of questions for the I/E, S/N, T/F and P/J traits are 8, 7, 3 and 8, respectively.
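Deriving the preferred answer $\hat{g}_j$ from the gold labels is mechanical; a sketch (with a hypothetical encoding in which each trait label and each choice tendency is 0 or 1):

```python
def preferred_answers(labels, trait_of_q, choice_tendency):
    """labels[t]: gold label of trait t (0 or 1);
    trait_of_q[j]: the trait probed by question j;
    choice_tendency[j][k]: tendency (0 or 1) reflected by choice k of question j.
    Returns g_hat[j]: the index of the preferred choice for every question."""
    return [choice_tendency[j].index(labels[t])
            for j, t in enumerate(trait_of_q)]

# Example: one I/E question whose choices reflect [Introversion, Extroversion].
# An introverted user (label 0 on trait 0) prefers choice 0: "Quiet and reserved."
print(preferred_answers(labels=[0, 1, 1, 0],
                        trait_of_q=[0],
                        choice_tendency=[[0, 1]]))  # -> [0]
```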

Baselines
To make a comprehensive evaluation of our model, we employ the following models as baselines:

Implementation Details
We use PyTorch (Paszke et al., 2019) to implement our PQ-Net on four 2080Ti GPU cards. Following the previous study (Hernandez and Knight, 2017), we set the maximum number of posts per user to 50 and the maximum length of each post to 70. For the questionnaire, we set the maximum length of each question/choice to 43/21. For BERT, we initialize with bert-base-uncased (Devlin et al., 2019). For training, we use the Adam (Kingma and Ba, 2014) optimizer with a mini-batch size of 32, a dropout rate of 0.2, and a learning rate of 2e-5/1e-3 for pre-trained/non-pre-trained modules. For the coefficient λ in Eq. (12), we search in the range of 0.1 to 0.9 with a step of 0.1 and eventually set it to 0.7. During training, we adopt an early-stopping strategy with a patience of 5 consecutive epochs on the validation set and report the final performance on the test set.
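The two learning rates can be realized with Adam parameter groups; a minimal sketch (the stub model is ours, standing in for the full PQ-Net):

```python
import torch
import torch.nn as nn
from transformers import BertModel

class PQNetStub(nn.Module):
    """Stand-in with one pre-trained module and one module trained from scratch."""
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.classifier = nn.Linear(2 * 768, 2)

model = PQNetStub()
pretrained = [p for n, p in model.named_parameters() if n.startswith("bert")]
scratch = [p for n, p in model.named_parameters() if not n.startswith("bert")]

optimizer = torch.optim.Adam([
    {"params": pretrained, "lr": 2e-5},  # pre-trained BERT parameters
    {"params": scratch, "lr": 1e-3},     # newly initialized modules
])
```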

Overall Results
The overall results are shown in Table 2, from which three observations can be noted. First, our PQ-Net consistently outperforms the other models on the two benchmarks. Particularly, on Kaggle, PQ-Net outperforms the latest state-of-the-art model (SN-Attn) and the basic pre-trained encoder (BERT fine-tune) it relies on by 3.93 and 5.08 in Macro-F1, respectively.

Ablation Study
The overall results above have demonstrated the effectiveness of our PQ-Net model as a whole. To further study the impact of each key module, we conduct an ablation study by removing the modules in turn from PQ-Net. The results, organized into three groups, are shown in Table 3. In the paragraphs below we provide a detailed analysis only on the Kaggle dataset, while similar conclusions can be drawn on the other dataset.

First, we investigate the contributions of the questions and choices in the questionnaire. When removing the question representation $h^{q}$ in Eq. (4) and replacing the cross attention with saliency attention (Vu et al., 2020) without query, the performance declines by 2.14, demonstrating that the questions are helpful for retrieving personality-related cues. On the other hand, when removing the trait-specific potential answer representation $h^{a(t)}$ in Eq. (9) and replacing it with $z$ in Eq. (4), the performance declines by 2.69, showing that explicitly exploiting the user-agnostic choices in the questionnaire is more helpful than using only the information retrieved from the posts.
Second, we investigate the contributions of the soft gate and its supervision. When replacing the soft gate with a hard gate, the performance declines by 2.34, most probably because the soft gate retains graded information from both choices, whereas the hard gate discards all but the top-scored one. Besides, when removing $J_q$ from Eq. (12), the performance declines by 2.58, showing that this extra supervision is worth considering when training PQ-Net.
Third, we investigate the contributions of the cross attention and the two streams. When replacing the cross attention with cosine similarity, the performance declines by 1.59, suggesting that the attention mechanism is more capable of aligning the heterogeneous spaces of the questionnaire and posts. Besides, when removing the questionnaire stream but keeping the additional supervision by directly using the post representations $h^{p}$ in Eq. (1) to predict the choices $g$ in Eq. (5), the performance drops by 2.91. When removing the context stream and using only the potential answer representation $h^{a(t)}$ as the final representation $u^{(t)}$ in Eq. (9), the performance drops by 1.28. This demonstrates that the two-stream structure of PQ-Net is more favorable for capturing personality cues. Finally, after making the two streams share one BERT encoder, the performance declines by only 0.67, showing the strong encoding capability of BERT.

Correlation Analysis
Intuitively, the performance of our model should be related to how accurately it "answers" the questions in the questionnaire. To verify this, we analyze the Macro-F1 scores of PQ-Net with respect to the number of correctly predicted questions. Specifically, we first group the users in Kaggle's test set according to the number of questions correctly answered by PQ-Net, and then record the Macro-F1 scores within each group. The results for the four personality dimensions are plotted in Figure 3, where the x-axis represents the number of correctly predicted questions and the y-axis is the Macro-F1 score. We can see that as the number of correctly predicted questions grows, the performance of PQ-Net increases accordingly. In particular, for all the personality traits, the performance of PQ-Net almost reaches 100% when all the questions are correctly predicted.
For comparison, we plot the results of two baselines for each group of users (BERT-part) and for all the users (BERT-all) respectively in Figure 3. One notable phenomenon is that PQ-Net suffers more than its counterparts in those groups where few questions are predicted correctly. In the most extreme case, the performance of PQ-Net drops to 0% when all the questions are incorrectly answered. The reason behind this could be that PQ-Net relies heavily on information from the questionnaire stream, so when the answers are incorrect, this information becomes misleading and deleterious.
Besides, we also plot the results of correlation analysis on the Pandora dataset in Figure 4, from which a similar trend can be observed as on Kaggle.
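The per-group scores behind Figures 3 and 4 can be computed as follows (a sketch with scikit-learn; function and variable names are ours):

```python
from collections import defaultdict
from sklearn.metrics import f1_score

def per_group_macro_f1(y_true, y_pred, n_correct):
    """Group users by the number of questionnaire questions PQ-Net answered
    correctly, then compute the Macro-F1 score within each group."""
    groups = defaultdict(lambda: ([], []))
    for yt, yp, k in zip(y_true, y_pred, n_correct):
        groups[k][0].append(yt)
        groups[k][1].append(yp)
    return {k: f1_score(yt, yp, average="macro")
            for k, (yt, yp) in sorted(groups.items())}
```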

Visualization Analysis
The experimental results above have demonstrated the numerical performance of PQ-Net in detail. To further examine whether the user representations enhanced by PQ-Net are inductive and distinguishable, we employ t-SNE (van der Maaten and Hinton, 2008) to reduce the learned representations in the T/F trait of Kaggle's test set to two dimensions and visualize them. As the results in Figure 5 show, PQ-Net visibly enforces a more compact clustering of examples in accordance with the personality labels than BERT, which benefits personality classification accordingly. This experiment vividly demonstrates the superiority of our model in personality detection.
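The visualization follows the standard t-SNE recipe; a sketch with scikit-learn and matplotlib (settings such as the random seed and markers are our assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_user_reps(user_reps, labels, title="T/F trait (Kaggle test set)"):
    """Project learned user representations to 2-D and color by gold label."""
    coords = TSNE(n_components=2, random_state=0).fit_transform(user_reps)
    labels = np.asarray(labels)
    for lab, marker in [(0, "o"), (1, "^")]:
        mask = labels == lab
        plt.scatter(coords[mask, 0], coords[mask, 1], marker=marker, s=8,
                    label=f"class {lab}")
    plt.title(title)
    plt.legend()
    plt.show()
```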

Case Study
In this subsection, we conduct a case study to further analyze our PQ-Net with a real example. As shown in Figure 6, we first record the probability of each choice of a question predicted by PQ-Net, and then plot the cross-attention weights from the question to the posts to show the clues the model has discovered in the posts to support its decision. This experiment demonstrates that PQ-Net is able to predict the choices of questions in the questionnaire by retrieving corresponding clues in the posts, providing interpretability for personality detection.

Figure 6: Results of the case study in the I/E trait of Kaggle's test set, where E is the ground-truth personality label. At the top, the question asks about friendship, with two choices corresponding to the Introversion and Extroversion personalities, respectively. According to PQ-Net's prediction, the probabilities of the user choosing "Deep friendship with very few people" and "Broad friendships with many different people" are 0.4 and 0.6, respectively. In the middle, the cross-attention weights from the question to the posts are shown. At the bottom, we show the contents of the 5 posts with the highest attention weights and highlight the interpretable and related clues, e.g., "I like to meet other people and dedicate most of my time for these new people" in Post-21 is evidently relevant to the questionnaire.

Related Work
In recent years, numerous efforts have been devoted to automatically detecting one's personality from her/his online texts (Adamopoulos et al., 2018; Tareaf et al., 2018; Guan et al., 2020). Early works rely on hand-crafted features (Yarkoni, 2010; Schwartz et al., 2013; Cui and Qi, 2017; Amirhosseini and Kazemian, 2020), including various psycholinguistic features extracted by LIWC and statistical features extracted by bag-of-words models (Zhang et al., 2010). Nevertheless, feature-engineering-based methods are limited in their capability of extracting many useful implicit features (Xue et al., 2018; Lynn et al., 2020). Meanwhile, deep neural networks have been applied to personality detection by implicitly extracting features from the texts (Pradhan et al., 2020).

Conclusion
In this paper, we proposed a psychological questionnaire enhanced network (PQ-Net) for personality detection. PQ-Net aims to track personality-related cues from online posts in an explicit manner by considering the connections between the posts and a psychological questionnaire. Specifically, PQ-Net comprises a context stream and a questionnaire stream. The former encodes each post to obtain its contextual representation, while the latter learns to capture critical information in the posts and produce a potential answer representation for each question in the questionnaire. Finally, the potential answer representations are used to enhance the contextual post representations for predicting the personality traits. Experimental results on two benchmarks show that PQ-Net outperforms the baselines significantly. Besides, further studies and analyses demonstrate that the representations enhanced by PQ-Net are more inductive and distinguishable, providing interpretability for the personality detection process.

Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 62176270) and the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (No. 2017ZT07X355).

Ethical Statement
This work aims to provide an innovative method to inspire research rather than to create a tool that violates privacy. The Kaggle and Pandora datasets used in our study are fully anonymized; we strictly abide by the usage specifications of the research data and have not illegally accessed any private user information. We recognize that the evaluation results in this paper may carry certain ethical risks and thus require that they be used in a strictly controlled manner, subject to the approval of an institutional review board. Using our work to secretly infer people's personality traits is strictly prohibited.

A Questionnaire
The detailed content of the questionnaire adopted in our study is shown in Figure 7, in which the questions and their choices are divided into four categories, each corresponding to one of the four MBTI personality traits. We also mark the personality tendency represented by each choice.
(a) Introversion or Extroversion

Are you usually a good mixer with groups of people or rather quiet and reserved?
• Quiet and reserved. (Introversion)
• A good mixer. (Extroversion)

(b) Sensing or iNtuition

If you were a teacher, would you rather teach facts-based courses or courses involving opinion or theory?
• Teach facts-based courses. (Sensing)
• Courses involving opinion or theory. (iNtuition)

In doing something that many other people do, would you rather invent a way of your own or do it in the accepted way?
• Do it in the accepted way. (Sensing)
• Invent a way of your own. (iNtuition)

Do you usually get along better with realistic people or imaginative people?
• Realistic people. (Sensing)
• Imaginative people. (iNtuition)

In reading for pleasure, do you enjoy odd or original ways of saying things, or like writers to say exactly what they mean?
• Like writers to say exactly what they mean. (Sensing)
• Enjoy odd or original ways of saying things. (iNtuition)

Would you rather have as a friend someone who is always coming up with new ideas or someone who has both feet on the ground?
• Someone who has both feet on the ground. (Sensing)
• Someone who is always coming up with new ideas. (iNtuition)

Do you admire more the people who are normal-acting to never make themselves the center of attention, or too original and individual to care whether they are the center of attention or not?
• Normal-acting to never make themselves the center of attention. (Sensing)
• Too original and individual to care whether they are the center of attention or not. (iNtuition)

Would you rather be considered a practical person or an out-of-the-box-thinking person?
• A practical person. (Sensing)
• An out-of-the-box-thinking person. (iNtuition)

(c) Thinking or Feeling

Do you more often let your heart rule your head or your head rule your heart?
• Your head rule your heart. (Thinking)
• Your heart rule your head. (Feeling)

Is it a higher compliment to be called a person of real feeling or a consistently reasonable person?
• A consistently reasonable person. (Thinking)
• A person of real feeling. (Feeling)

Do you usually value emotion more than logic or value logic more than feelings?
• Value logic more than feelings. (Thinking)
• Value emotion more than logic. (Feeling)

(d) Judging or Perception

When you go somewhere for the day, would you rather plan what you will do and when, or just go?
• Plan what you will do and when. (Judging)
• Just go! (Perception)

Does the idea of making a list of what you should get done over a weekend help you, or stress you, or positively depress you?
• Help you. (Judging)
• Stress you or positively depress you. (Perception)

When you have a special job to do, do you like to organize it carefully before you start or find out what is necessary as you go along?
• Organize it carefully before you start. (Judging)
• Find out what is necessary as you go along. (Perception)

Do you prefer to arrange picnics, parties etc. well in advance, or be free to do whatever looks like fun when the time comes?
• Arrange them well in advance. (Judging)
• Be free to do whatever looks like fun when the time comes. (Perception)

When it is settled well in advance that you will do a certain thing at a certain time, do you find it nice to be able to plan accordingly, or a little unpleasant to be tied down?
• Nice to be able to plan accordingly. (Judging)
• A little unpleasant to be tied down. (Perception)

Are you more successful at following a carefully worked out plan, or at dealing with the unexpected and seeing quickly what should be done?
• At following a carefully worked out plan. (Judging)
• At dealing with the unexpected and seeing quickly what should be done. (Perception)

Does following a schedule appeal to you, or cramp you?
• Appeal to you. (Judging)
• Cramp you. (Perception)

In your daily work, do you usually plan your work so you won't need to work under pressure, or rather enjoy an emergency that makes you work against time?
• Usually plan your work so you won't need to work under pressure, and hate to work under pressure. (Judging)
• Rather enjoy an emergency that makes you work against time. (Perception)