Prompt-based Pre-trained Model for Personality and Interpersonal Reactivity Prediction

This paper describes the LingJing team's submission to the Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA) 2022 shared task on Personality Prediction (PER) and Interpersonal Reactivity Index Prediction (IRI). We adopt a prompt-based method with a pre-trained language model for these tasks. Specifically, the prompt is designed to supply additional persona information to the pre-trained model. Data augmentation and model ensembling are adopted to obtain better results. Extensive experiments show the effectiveness of the proposed method. On the final submission, our system achieves Pearson correlation coefficients of 0.2301 and 0.2546 on Track 3 and Track 4, respectively. We ranked first on both sub-tasks.


Introduction
Personality can be defined as a set of characteristics (e.g., age, income, and hobby) that reflect individual differences in thinking, emotions, and behaviours (Vora et al., 2020). The power of personality is worth exploring and pervades human lives everywhere (Beck and Jackson, 2022). Personality prediction is an interdisciplinary field spanning psychology and computer science. People's personalities cannot be directly observed or measured, but they are expressed in activity patterns and can therefore be inferred from them. Humans tend to convey their personalities through language, which is the most prominent means of expression in today's Internet society, and written text is one of the most important forms of language.
Consequently, applying machine-learning-based methods to predict the personality of individuals is a natural step. Over the past 20 years, much progress has been made in natural language processing (NLP), which is undergoing a revolution. In particular, with the development of deep learning and transfer learning, predicting personality automatically and accurately has become an emerging topic in NLP. Both classical text representation methods and pre-trained word representation approaches have made personality prediction a more attractive and competitive research area. Even so, how to compute and predict someone's personality from text remains an open question that attracts a growing number of researchers.
Personality prediction has a long research history. As early as 2008, the Personae corpus (Luyckx and Daelemans, 2008) was proposed for predicting personality from text. The corpus is used to predict the writer's personality traits as reflected in writing style. Input representation is one of the most important components in NLP, and in recent years pre-trained word embeddings have become the dominant representation method. We therefore focus mainly on personality prediction approaches that use pre-trained word embeddings. Such methods were first based on Word2Vec (Mikolov et al., 2013b,a) and GloVe (Pennington et al., 2014). Poria et al. (2013) propose extracting common-sense knowledge and affective information to recognize personality from text, representing this information in a directed graph. Despite the success of early word embedding models (e.g., Word2Vec), practical problems remain. The first is that previously unseen words cause trouble for the model. The second is the overwhelmingly large number of parameters the model must learn. Liu et al. (2016) propose a recurrent and compositional deep-learning-based model to address these issues.
Very recently, researchers have started to explore large pre-trained models for NLP (Jawahar et al., 2019; He et al., 2020; Malkin et al., 2021). Kazameini et al. (2020) use the BERT language model (Devlin et al., 2018) to extract contextualized word embeddings and achieve state-of-the-art performance for personality detection. Similarly, Transformer-MD (Yang et al., 2021) is proposed to put multiple posts together to represent the personality of each user. The contextual embeddings learned by large pre-trained models can effectively improve performance and have theoretical advantages over traditional embedding methods. Therefore, we also adopt a pre-trained model, DeBERTa (He et al., 2020), to construct our personality prediction model. In this paper, we present our prompt-based pre-trained network for Track 3 and Track 4 of WASSA-2022.
Track 3: Personality Prediction (PER), which consists of predicting the personality of the essay writer, given all of his/her essays and the news articles to which they reacted.
Track 4: Interpersonal Reactivity Index Prediction (IRI), which consists of predicting the interpersonal reactivity index of the essay writer, given all of his/her essays and the news articles to which they reacted.

Main Method
In this section, we elaborate on our method in detail, including the model architecture, prompt design, and regression optimization.

Model Architecture
The overall architecture of our method is shown in Figure 1. The DeBERTa pre-trained language model (He et al., 2020) is adopted as our backbone for personality and interpersonal reactivity prediction. We first combine the input text with a manually designed prompt and tokenize the result with the DeBERTa tokenizer. The tokenized input is then encoded by the encoder through the self-attention mechanism (Vaswani et al., 2017). Finally, a linear layer produces the output for regression.

Prompt Design
Prompt learning is considered an effective way to provide a pre-trained model with extra knowledge (Liu et al., 2021). For this reason, we manually design a prompt to elicit relevant knowledge from the pre-trained model for personality prediction. The prompt is a fixed template, e.g., "A female, with fourth grade education, third race, 22 and income of 100000". Specifically, the persona information is mapped into tokens that provide additional semantic information for the subsequent regression task. We concatenate these fixed prompts with the original input to learn a joint representation in the pre-trained language model.
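As an illustration, the template above could be filled from the demographic fields roughly as follows. This is a minimal sketch, not our exact preprocessing: the function names, the ordinal mapping for coded fields, and the separator between prompt and essay are assumptions made for the example.

```python
# Hypothetical wording for coded categorical fields (assumption: education
# and race are stored as small integer codes in the dataset).
ORDINALS = {1: "first", 2: "second", 3: "third", 4: "fourth",
            5: "fifth", 6: "sixth", 7: "seventh"}


def build_prompt(gender: str, education: int, race: int, age: int, income: int) -> str:
    """Fill the fixed persona template quoted in the text."""
    return (
        f"A {gender}, with {ORDINALS[education]} grade education, "
        f"{ORDINALS[race]} race, {age} and income of {income}"
    )


def build_model_input(prompt: str, essay: str, sep: str = ". ") -> str:
    """Concatenate the persona prompt with the essay (order and separator assumed)."""
    return f"{prompt}{sep}{essay}"


prompt = build_prompt("female", education=4, race=3, age=22, income=100000)
text = build_model_input(prompt, "I felt terrible after reading this article.")
print(text)
```

The resulting string is what would be passed to the tokenizer, so the persona tokens and the essay tokens attend to each other inside the encoder.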

Regression Optimization
The personality and interpersonal reactivity prediction task is designed to regress the probable logits of different personality items.
Given the training samples $D = \{(C_1, X_1), (C_2, X_2), \ldots, (C_{|D|}, X_{|D|})\}$, $C_i$, $i \in \{1, \ldots, |D|\}$, represents the author persona information associated with the collection of stories $X_i$.
The author persona information contains several items $c$, e.g., education and race. We concatenate these texts with the input through prompt learning, which provides extra information for personality prediction. We optimize the mean squared error (MSE) loss, shown in (1):
$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{|D|} \sum_{j=1}^{|D|} \left(\mathrm{Logits}_j - y_j\right)^2 \qquad (1)$$
where $\mathrm{Logits}_j$ represents the logits output of prompt tuning from the pre-trained model for the $j$-th sample, and $y_j$ is the corresponding author's personality item, $j \in [1, |D|]$.
We implement the above function by minimizing it for each personality item, where $y_k$ represents the label of each personality item and $\theta$ denotes the parameters of the pre-trained model being optimized.
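The MSE objective described above can be checked on a small worked example. The sketch below is a plain-Python illustration with made-up predictions and labels; `mse_loss` is a hypothetical helper, not our training code.

```python
def mse_loss(predictions, targets):
    """Mean squared error between predicted scores and gold labels."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("predictions and targets must be non-empty and equally long")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


# Made-up scores for three authors on a single personality item.
loss = mse_loss([3.5, 2.0, 4.0], [3.0, 2.0, 5.0])
print(loss)  # (0.25 + 0.0 + 1.0) / 3
```

In training, the same quantity would be computed on the output of the linear regression head and back-propagated through the pre-trained model.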

Data Augmentation
Inspired by Karimi et al. (2021), we augment the data by inserting random punctuation marks drawn from the six marks {".", ";", "?", ":", "!", ","}. We ensure that at least one mark is inserted, so that each author yields additional data. At the same time, we avoid inserting too many punctuation marks, since too much noise might hurt the model, especially for personality prediction.
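A minimal sketch of this augmentation is given below. The upper bound on the number of insertions (one third of the word count, following the setting we recall from Karimi et al., 2021) and the function name are assumptions for illustration; the text above only fixes the six marks and the requirement of at least one insertion.

```python
import random

PUNCTUATION = [".", ";", "?", ":", "!", ","]


def insert_random_punctuation(text, max_ratio=1 / 3, rng=None):
    """Insert between 1 and max_ratio * n_words random punctuation marks."""
    rng = rng or random.Random()
    words = text.split()
    if not words:
        return text
    # At least one insertion; the upper bound is an assumed hyperparameter.
    n_insert = rng.randint(1, max(1, int(len(words) * max_ratio)))
    positions = set(rng.sample(range(len(words)), n_insert))
    augmented = []
    for index, word in enumerate(words):
        if index in positions:
            augmented.append(rng.choice(PUNCTUATION))  # mark goes before the word
        augmented.append(word)
    return " ".join(augmented)


original = "the article made me feel sad for the families involved"
print(insert_random_punctuation(original, rng=random.Random(0)))
```

Because only punctuation tokens are added, the original word order is preserved and the label of the essay can be reused unchanged.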

Model Ensemble
When several pre-trained models are available, a sensible way to improve the final results is to ensemble them (Zwanzig, 1960). We therefore adopt an ensemble method that averages the logits for the final prediction. Specifically, we implement the Bagging algorithm (Inoue and Kilian, 2008) for personality and interpersonal reactivity prediction, which can effectively reduce the variance of the predicted logits by averaging out the prediction errors of the different models.
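The averaging step can be sketched as follows. The helper name and the toy predictions are hypothetical; the sketch only shows how per-model outputs for the same samples would be combined into one prediction per sample.

```python
def ensemble_average(model_predictions):
    """Average per-sample predictions over several models.

    model_predictions: list of lists, one inner list of scores per model,
    all aligned on the same samples.
    """
    if not model_predictions:
        raise ValueError("at least one model is required")
    n_samples = len(model_predictions[0])
    if any(len(preds) != n_samples for preds in model_predictions):
        raise ValueError("all models must predict the same samples")
    n_models = len(model_predictions)
    return [sum(scores) / n_models for scores in zip(*model_predictions)]


# Toy outputs of three models on two essays.
averaged = ensemble_average([[3.0, 4.0], [4.0, 5.0], [5.0, 3.0]])
print(averaged)
```

Averaging leaves the expected prediction unchanged while damping the fluctuations of individual models, which is the variance reduction referred to above.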

Experimental Setting
This section presents the dataset, our experimental models, the experimental settings, and the ablation experiments.

Dataset
The shared task organizers supplied participants with an extended version of the dataset used by Buechel et al. (2018). The dataset includes essays between 300 and 800 characters in length, along with Batson empathy, Personal Distress Scale, and other additional demographic and personality information. The person-level demographic information mainly comprises age, gender, ethnicity, income, and education level. The dataset provided for the WASSA shared task contains 1860 training samples.

Implementation Details
For these tasks, our implementation is mainly based on the Hugging Face framework (Wolf et al., 2020). We add a randomly initialized linear layer after DeBERTa (He et al., 2021) to output a value of shape [1].
We use the AdamW optimizer (Loshchilov and Hutter, 2018) with a learning rate of 8e-6 and warm-up (He et al., 2016). The batch size is 12. We set the maximum sequence length to 512 and truncate any excess tokens. We apply linear learning-rate decay and gradient clipping of 1e-4. Dropout (Srivastava et al., 2014) of 0.1 is applied to prevent overfitting. We implemented the training and inference code in PyTorch (Paszke et al., 2019).
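The learning-rate schedule (warm-up followed by linear decay) can be sketched as a function of the training step. Only the peak rate of 8e-6 comes from the settings above; the number of warm-up steps and total steps used here are hypothetical values chosen for illustration.

```python
def lr_at_step(step, total_steps, warmup_steps, peak_lr=8e-6):
    """Linear warm-up to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run: 1000 optimizer steps, the first 100 used for warm-up.
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step, total_steps=1000, warmup_steps=100))
```

The rate rises linearly from zero to the peak during warm-up and then falls linearly back to zero at the last step.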

Result and Discussion
This section describes our experimental results on the Personality Prediction (PER) sub-task and the Interpersonal Reactivity Index Prediction (IRI) sub-task. Experiments were conducted with the development set as our test dataset, and the experimental results are shown in Table 1 and Table 2. The results of the final submissions are shown in Table 3.

Results for Track3
The results in Table 1 show the performance of our method with different components on the development dataset. From Table 1, we can find the following. The method without the ensemble component achieves 0.25788 AVG. Compared to the method without the ensemble component, the combination of random punctuation and ensemble gains a 0.0144 increase in AVG. Using the prompt and ensemble components together gains a 0.02824 increase in AVG over the variant without the prompt (W/o prompt). This shows that each component improves the performance of our method. When the three components are used together, our model achieves its best performance, with 0.31212 AVG. On the final submissions, our method achieves a Pearson correlation of 0.23006, which ranks first in the PER (Track 3) sub-task.
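For reference, the AVG score used in these comparisons is the mean of per-item Pearson correlations. The sketch below shows one way to compute it in plain Python; the trait names and values are made up, and the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_pearson(preds_by_item, golds_by_item):
    """Average the per-item Pearson correlations (the AVG column)."""
    scores = [pearson(preds_by_item[item], golds_by_item[item]) for item in golds_by_item]
    return sum(scores) / len(scores)


# Made-up predictions and labels for two hypothetical items.
preds = {"openness": [1.0, 2.0, 3.0], "stability": [1.0, 2.0, 3.0]}
golds = {"openness": [2.0, 4.0, 6.0], "stability": [3.0, 2.0, 1.0]}
print(average_pearson(preds, golds))
```

Each item is scored separately before averaging, so one well-predicted item cannot hide a poorly predicted one in the per-item columns.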

Results for Track4
The results in Table 2 show the impact of each component of our approach on the development dataset. From Table 2, we can find the following. Our method achieves 0.2791 AVG. Compared to the method without random punctuation, our method gains a 0.0137 increase in AVG. The method without the ensemble component achieves only 0.2408 AVG. These results demonstrate the benefit of introducing each component into our method. On the final submissions, our method achieves a Pearson correlation of 0.25460, which ranks first in the IRI (Track 4) sub-task.

Conclusion
In this paper, we describe our system for the Personality Prediction and Interpersonal Reactivity Index Prediction sub-tasks. We use the DeBERTa pre-trained language model as the backbone. The prompt is designed to provide persona information to the pre-trained model. Data augmentation with random punctuation and model ensembling are adopted for better results. In the evaluation phase, our method ranked first on both Track 3 and Track 4. In the future, we will focus on more effective prompt design for personality and interpersonal reactivity prediction.
All experiments were run on an NVIDIA A100 GPU. We select the best parameters on the validation set and then report the score of the best model (chosen on the validation set) on the test set. We use DeBERTa-v3-large (He et al., 2021) as the pre-trained model and fine-tune it. The DeBERTa-v3-large model has 24 layers and a hidden size of 1024. It uses a training framework similar to that of ELECTRA (Clark et al., 2020), with a generator and a discriminator. The pre-training task of the discriminator is replaced token detection (RTD), and the RTD-trained discriminator is the model we use. Compared with masked language modeling (MLM), RTD training is more efficient.

Table 1: Results of the ablation study for Personality Prediction (PER). AVG represents the average of the Pearson correlations over the personality values. The best results are in bold.

Table 2: Results of the ablation study for Interpersonal Reactivity Index Prediction (IRI). AVG represents the average of the Pearson correlations over the IRI values. The best results are in bold.

Table 3: Results of our method on Track 3 and Track 4, respectively.