From Cognitive to Computational Modeling: Text-based Risky Decision-Making Guided by Fuzzy Trace Theory

Understanding, modelling and predicting human risky decision-making is challenging due to intrinsic individual differences and irrationality. Fuzzy trace theory (FTT) is a powerful paradigm that explains human decision-making by incorporating gists, i.e., fuzzy representations of information which capture only its quintessential meaning. Inspired by Broniatowski and Reyna's FTT cognitive model, we propose a computational framework which combines the effects of the underlying semantics and sentiments on text-based decision-making. In particular, we introduce Category-2-Vector to learn categorical gists and categorical sentiments, and demonstrate how our computational model can be optimised to predict risky decision-making in groups and individuals.


Introduction
Imagine that your town is preparing for a viral outbreak which is projected to kill 600 people. Two alternative programs to combat the virus have been proposed. Assume that the exact scientific estimates of the consequences of the programs are Program A: "200 people will be saved"; and Program B: "1/3 probability that all 600 lives will be saved; 2/3 probability that no lives will be saved". Given these choices, which program would you choose? Alternatively, if choices were presented as follows, which program would you choose? Program C: "400 people will die"; and Program D: "1/3 probability that no one will die and a 2/3 probability that all 600 will die". This problem is a modified version of the Asian disease problem (ADP) (Tversky and Kahneman, 1981), a well-studied risky decision-making problem (RDMP) in psychology where decisions are made under risk or include probabilistic outcomes (Edwards, 1954). In this RDMP, programs A and B form the gain frame where choices are worded in a positive and optimistic manner, whereas programs C and D form the loss frame where choices are written in a negative and pessimistic manner. Studies have validated that in the gain frame, humans overwhelmingly prefer the safe choice A (72%), whereas in the loss frame, they overwhelmingly prefer the risky choice D (78%) even though the choices and outcomes in both frames are equivalent (Tversky and Kahneman, 1981). This phenomenon, known as the Allais paradox, implies that observed human choices are inconsistent with predictions based on expected utility alone, thereby confirming the influence of language, wording of choices, and sentiments on human decision-making.
Being able to understand, model and predict human decision-making leads to many real-world applications, from predicting election results (Hillygus and Shields, 2005) to improving user experience in recommender systems (Chen et al., 2013). However, the Allais paradox means that understanding the integral but complex cognitive process of decision-making, particularly in humans, is extremely challenging due to our diverse characteristics, beliefs and experiences. Furthermore, human decision-making is often fraught with irrationality even in the presence of overwhelming evidence against some choice or belief (Simon, 1993). This brings into question how human decision-making can be modelled with these complexities and nuances involved. The question is especially important when considering current approaches to decision-making, such as utility theory, which typically lack any behavioural basis and ignore human sentiments during decision-making (Lerner et al., 2015).
Our goal is to develop a model of automated human decision-making that bridges current decision-making techniques with fuzzy trace theory (FTT), an established cognitive theory, in order to predict group and individual decision-making as outlined in Section 2.2. Originally proposed by Brainerd and Reyna in the 1990s, FTT aims to explain cognitive phenomena in memory and reasoning (Brainerd and Reyna, 1990). In a nutshell, FTT posits that humans form two types of mental representations: verbatim representations, which are detailed, and gist representations, which are fuzzy and capture only the most quintessential meanings. People prefer to make decisions based on gist rather than verbatim representations.
In contrast with alternative cognitive and decision-making theories such as expected utility theory (Friedman and Savage, 1952) and prospect theory (Kahneman and Tversky, 1980), we adopt FTT for two reasons. Firstly, FTT is the most holistic cognitive model: it encompasses theories of how information is stored in memory and how memory plays an important role in our decision-making, rather than treating decision-making as an isolated process. Because of this, FTT provides us with an extensive set of tools to explain and evaluate decision-making. Secondly, FTT is well suited to computational modelling, as conceptual parallels can be drawn between representation learning, particularly in neural language modelling, and the process of creating gist representations by distilling the quintessential information. For example, popular embedding methods for words, sentences and documents in NLP aim to create fuzzy semantic representations through dimensionality reduction of language to semantic vectors, which can be viewed as gist representations of the original language (Liu et al., 2020).
Contributions: We investigate two levels of text-based risky decision prediction tasks, group-level and individual-level prediction, from a computational standpoint. Incorporating state-of-the-art methods in NLP, we further investigate:
• How do gist representations of choices give rise to decisions? We present a framework of decision-making based on gist representation learning.
• How can we computationally encode gist representations based on the language of choices? We outline how gist representations can be computationally encoded using techniques in NLP and propose Category-to-Vector (Cat2Vec), to learn and predict categorical embeddings of choices.
• How can we extract the underlying sentiments of gist representations? By extending Cat2Vec, we show how sentiments can be learnt at a categorical level; this differs from traditional approaches of sentiment analysis in NLP that examine sentiments at a text level.
• How can individual differences of individuals and groups be modelled, and what impact do these differences have on decision-making? We propose that individual differences are mechanisms that can encode errors at various points in the decision-making process and propose an optimisation method to infer these individual differences.
• Finally, we demonstrate in experiments that our proposed model achieves state-of-the-art performance in predicting group and individual-based risky decision-making compared to baselines.

Task Formulation and Related Work
Risky decision-making has been studied in many different contexts. Here we formulate the n-choice decision-making problem (nDMP): taking as input natural language descriptions of n possible choices/outcomes O, choose the most preferred outcome from O. We focus on a sub-problem known as the n-choice risky decision-making problem (nRDMP), which is an nDMP where there is some risk or probabilistic outcome associated with choices in O, e.g., programs B and D in the ADP. Specifically, we investigate the gain-loss framing problem, which comprises two nRDMPs. In $\mathrm{nRDMP}_{gain}$, choices are written as gain frames which accentuate the positive features of the text, e.g., programs A and B form a $\mathrm{2RDMP}_{gain}$ where 'saving people's lives' is the accentuated feature. Conversely, in $\mathrm{nRDMP}_{loss}$, choices are written as loss frames which accentuate the negative features of the text, e.g., programs C and D form a $\mathrm{2RDMP}_{loss}$ where 'people dying' is the accentuated feature. Additionally, choices have equivalent outcomes across both 2RDMPs.

Classical decision theory
Classical decision theory abstracts the outcomes using utilities, which are numerical values that reflect desirability. For example, expected utility theory (EUT) identifies the choice that maximises the expected utility assuming the axioms of rationality (Von Neumann and Morgenstern, 2007). However, in human decision-making these axioms are often violated, giving rise to, e.g., the Allais (Allais, 1953) and Ellsberg (Segal, 1987) paradoxes. Generalisations of EUT such as uncertain utility theory (Gul et al., 2008), cumulative prospect theory (CPT) (Tversky and Kahneman, 1992), and multiple-criteria decision-making (MCDM) (Zeleny, 2012) were proposed to resolve these discrepancies. However, these classical approaches not only fail to take into account the semantic information given by the wording of choices, which is contextually important for decision-making, but also ignore cognitive factors such as the sentiments of decision-makers.
Recent breakthroughs in NLP have led to a revolution in the breadth and robustness of problems involving natural language that can be solved, by successfully capturing the underlying semantics and relationships of language. For example, neural language models have found resounding success in representation learning (Mikolov et al., 2013; Devlin et al., 2018), the task of uncovering feature representations of language which are useful for downstream NLP tasks. One such downstream task, sentiment analysis, has benefited greatly from the application of language models such as XLNet (Yang et al., 2019) and ULMFiT (Howard and Ruder, 2018). These rapid advancements give hope for the development of sophisticated computational decision-making models.

Group/Individual-level Tasks
In this paper, we consider two specific 2RDMP tasks: group-level risky decision-making (GL-RDM), on which the majority of psychological studies focus, and the novel task of individual-level risky decision-making (IL-RDM), defined as follows.
GL-RDM: We are given a set of observed outcomes from human RDM experiments, each of which is described by a 5-tuple $(\mathrm{2RDMP}_{gain}, \mathrm{2RDMP}_{loss}, P_{gain}, P_{loss}, category)$, where $\mathrm{2RDMP}_{gain}$ is the gain frame of a 2RDMP, $P_{gain}$ is the proportion of individuals in the gain frame who chose the risky choice, and category is a grouping of similar experiments based on design and participants, described in Section 6.1. $\mathrm{2RDMP}_{loss}$ and $P_{loss}$ are defined similarly by replacing gain with loss. The objective of GL-RDM is to predict the distribution of choices, $P_{gain}$ and $P_{loss}$, for unseen human experiments within the same category.
IL-RDM: We are given a set of nRDMPs, $RDP = \{rdp_1, rdp_2, \ldots, rdp_n\}$, where gain/loss frames of the same problem can appear as separate RDPs $rdp_i$; a set of individuals $Ivd = \{ivd_1, ivd_2, \ldots, ivd_m\}$; and a function which maps individuals and RDPs to their preferred choice, $PC(ivd_i, rdp_j) = pc_{i,j}$, where $pc_{i,j}$ is individual $ivd_i$'s preferred choice for $rdp_j$. The objective of IL-RDM is to learn a model/mapping function for each individual which can predict that individual's preferred choice for unseen nRDMPs.
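To make the two task formulations concrete, the inputs could be held in plain data structures such as the ones below. This is only a sketch: the class and field names (`RDMP`, `GroupExperiment`, `PreferredChoice`) and the category label `"ADP"` are hypothetical and do not come from the paper. The proportions are the ADP figures quoted in the introduction, with the gain-frame risky proportion taken as 1 − 0.72.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RDMP:
    """One n-choice risky decision-making problem, as a tuple of choice texts."""
    choices: Tuple[str, ...]


@dataclass(frozen=True)
class GroupExperiment:
    """One GL-RDM observation: the 5-tuple described in the text."""
    gain: RDMP
    loss: RDMP
    p_gain: float  # proportion choosing the risky option in the gain frame
    p_loss: float  # proportion choosing the risky option in the loss frame
    category: str


# IL-RDM: PC(ivd_i, rdp_j) -> index of the preferred choice (hypothetical encoding).
PreferredChoice = Dict[Tuple[int, int], int]

adp = GroupExperiment(
    gain=RDMP((
        "200 people will be saved",
        "1/3 probability that all 600 lives will be saved; "
        "2/3 probability that no lives will be saved",
    )),
    loss=RDMP((
        "400 people will die",
        "1/3 probability that no one will die and "
        "a 2/3 probability that all 600 will die",
    )),
    p_gain=1 - 0.72,  # 72% chose the safe program A
    p_loss=0.78,      # 78% chose the risky program D
    category="ADP",   # hypothetical label
)

# Individual 0 prefers choice 0 (safe) in problem 0 and choice 1 (risky) in problem 1.
pc: PreferredChoice = {(0, 0): 0, (0, 1): 1}
```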

FTT-guided Risky Decision-making
The BR model. Broniatowski and Reyna laid out four main FTT principles in developing a cognitive model, i.e., the BR model, for the GL-RDM task (Broniatowski and Reyna, 2014, 2018). These principles are: (C1) Decision choices are encoded in different levels of gist representations, e.g., categorical and interval levels, based on the psychological notion of levels of measurement (Stevens et al., 1946). (C2) Categorical gist representations of choices are distinguished based on binary (positive/negative) sentiments, and decision-makers will prefer options with positive associations. In the BR model, sentiments of categories are drawn from social and moral principles which are stored in long-term memory, e.g., saving lives is fundamentally good. (C3) When comparisons of categorical gist representations do not arrive at a conclusive result, the decision-maker will revert to more precise gist representations. In the BR model, gist representations compete and combine such that the simplest gist representation is chosen. (C4) Categorical gist is encoded based on the decision-maker's prior experiences and individual differences, i.e., need for cognition (NFC), numeracy (NUM), and risk sensitivity (RS).
Human experiments have provided evidence that the BR model is capable of explaining GL-RDM. However, being a box-arrow model, the BR model comprises hypothesized concepts and processes that lack precise definitions. Hence, applying the model requires human interpretation and judgement on, e.g., the gist lattices of each RDMP, the acquisition of sentiments, and individual differences. This informal nature, along with the difficulty of adapting the model to IL-RDM, prevents it from being used as an automated predictive tool.
Our model. Towards a fully automated tool for the RDMP tasks, we propose a computational model of risky decision-making that takes as input the text descriptions of an RDMP and solves the GL-RDM and IL-RDM tasks automatically. The model is depicted in Figure 1. The key features of our model include: (1) All model components are automated, i.e., gist representations are extracted through NLP as categorical embeddings. (2) Categorical and interval representations are formally defined as hierarchical, with the interval level encapsulating the properties and information of the categorical level (Stevens et al., 1946). (3) Individual differences, NFC, NUM and RS (see below), directly affect decision-making at a representational level, and errors in judgement can propagate through the model, making it more expressive.

Computing Gist Representations
The first challenge we tackle is the computational encoding of gist representations and of the way individual differences encode error, a mechanism that models variations in human decision-making at a representation level, in order to perform GL-RDM and IL-RDM.

Categorical Representations
Categorisation is the act of grouping documents into categories based on semantic or sentiment similarity. For example, the entertainment category in the news dataset (see Section 6.1) comprises various articles spanning multiple topics such as games, movie reviews, and celebrity gossip. The ability to categorise and recall the underlying sentiments of categories is an important prerequisite for FTT decision-making, as asserted by principle C2, where humans prefer choices associated with categories with positive connotations over those with negative connotations. For example, the travel category in the news dataset has a strong negative sentiment because the news articles were collected during the outbreak of COVID-19. Given this negative sentiment, people would be dissuaded from travelling. Current sentiment analysis methods focus on the granular extraction of sentiments from text rather than from categories, yet words that are highly indicative of a category do not necessarily reveal any insight into its sentiment.
Vector representations of words (Mikolov et al., 2013), sentences (Devlin et al., 2018), and documents (Le and Mikolov, 2014) capture the semantic relationships between entities. At a higher level, a categorical embedding should capture semantic relations between categories of documents. To our knowledge, no such representation has been proposed. To fill this gap, we propose Category-2-Vector (Cat2Vec) and a sentiment-based extension, sentiment-Cat2Vec. Cat2Vec aims to find model-agnostic categorical representations that facilitate the prediction of categories from text and of sentiments from categories. More formally, given a set of $M$ categories $C = \{k_1, k_2, \ldots, k_M\}$, a training set contains $N$ document-category pairs $\{(d_1, c_1), \ldots, (d_N, c_N)\}$, where each $d_i$ is a document and $c_i \in C$ is the (ground truth) category of $d_i$. The objective is to maximise the average log probability $\frac{1}{N} \sum_{i=1}^{N} \log P(c_i \mid d_i)$, where $P(c_i \mid d_i)$ is the probability that document $d_i$ belongs to category $c_i$. In sentiment-Cat2Vec, $P(c_i \mid d_i)$ is replaced by $P(s_i \mid c_i)$, the probability that category $c_i$ belongs to a certain sentiment class. Here, we consider only binary positive and negative sentiments.
Cat2Vec extends contrastive learning via negative sampling by simultaneously maximising the similarity between the document encoding, $v_{d_i}$, and the true category embedding, $v_{c_i}$, and minimising the similarity between $v_{d_i}$ and $K$ negative category embeddings, as defined by the objective in equation 1. In this objective, $\odot$ represents element-wise multiplication; $P_{noise}(C)$ is a noise distribution that dictates how categories are sampled, for which we select a uniform distribution; $\sigma(x) = 1/(1 + \exp(-x))$; and $v_{i_i}$ is a category importance vector for category $i$, which is learned simultaneously with the category embedding and provides an attention-like effect over the category by accentuating or diminishing certain features in the category embedding when the two are multiplied together. Furthermore, $v_{d_i} = Enc(d_i)$, where Enc is a document vector encoding function. In this paper we adopt a bi-directional LSTM with self-attention, where $\overrightarrow{\alpha}$ and $\overleftarrow{\alpha}$ are the self-attention weights of the forward and backward LSTMs, respectively, used alongside the hidden state output vectors for document $d_i$ of the forward and backward LSTMs. However, the encoder is interchangeable in Cat2Vec, e.g., pretrained transformers such as BERT (Devlin et al., 2018) can be used.
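The negative-sampling idea described above can be sketched in a few lines of NumPy. The exact form of equation 1 is not reproduced in this text, so the loss below is an assumed, standard negative-sampling form in which each category embedding is first multiplied element-wise by its importance vector. The function names, and the choice to exclude the true category when drawing negatives, are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cat2vec_ns_loss(v_d, cat_emb, cat_imp, true_idx, neg_idx):
    """Assumed negative-sampling loss for a single document.

    v_d      : (D,) document encoding Enc(d_i)
    cat_emb  : (M, D) category embeddings
    cat_imp  : (M, D) category importance vectors
    true_idx : index of the ground-truth category
    neg_idx  : indices of the K sampled negative categories
    """
    weighted = cat_emb * cat_imp          # element-wise product per category
    pos_score = weighted[true_idx] @ v_d  # similarity to the true category
    neg_scores = weighted[neg_idx] @ v_d  # similarities to the negatives
    # Push the positive score up and the negative scores down.
    return -(np.log(sigmoid(pos_score)) + np.sum(np.log(sigmoid(-neg_scores))))


def sample_negatives(rng, num_categories, true_idx, k):
    """Uniform noise distribution; skipping the true category is an assumption."""
    candidates = np.array([c for c in range(num_categories) if c != true_idx])
    return rng.choice(candidates, size=k, replace=False)


rng = np.random.default_rng(0)
M, D, K = 46, 8, 5                        # 46 news categories; D and K are arbitrary
cat_emb = rng.normal(size=(M, D))
cat_imp = np.ones((M, D))                 # importance vectors would be learned
v_d = rng.normal(size=D)
negatives = sample_negatives(rng, M, true_idx=3, k=K)
loss = cat2vec_ns_loss(v_d, cat_emb, cat_imp, true_idx=3, neg_idx=negatives)
```

In a trainable version, `cat_emb`, `cat_imp` and the encoder producing `v_d` would all be parameters updated by gradient descent on this loss.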
The novelty of our model lies in two main aspects: the introduction of a category importance vector, which improves the ability of the model to learn relations between categories, and the ability of our model to estimate $P(s_i \mid c_i)$ given labelled text documents. To estimate $P(s_i \mid c_i)$ we introduce an extra dense output layer (D), shown in Figure 2, which predicts $P(s \mid d_i)$, the probability that document $i$ belongs to a certain sentiment class. In the case of binary sentiments, the joint loss becomes the binary cross-entropy loss of predicting the correct sentiment of a document plus the negative sampling loss in equation 1. After training the model, we can estimate $P(s_i \mid c_i)$ by feeding the learned category embeddings, $(v_{c_i} \odot v_{i_i})$, into the output layers (D). Although these output layers are trained to learn $P(s \mid d_i)$, the learned category embeddings are based on the document embeddings and are learned in the same semantic space, so this approach gives us good estimates of $P(s_i \mid c_i)$.
Figure 1: The main architecture of our computational model for decision making based on FTT.
Equation 2 shows how the error-encoded categorical utility (CU) is calculated from categorical representations, where Category(choice) is a function that takes RDM choices as inputs and outputs the underlying category related to the choice, and Sentiment(category) is a function which takes a category as input and outputs the underlying sentiment related to that category as $pos_{category} - neg_{category}$. Categorical error is encoded based on NFC, an individual's tendency to engage in and enjoy cognitive activities (Cacioppo et al., 1996), which introduces error at a categorical level to account for individualism. To calculate an error-encoded CU, we sample from a logistic distribution, which is consistent with the existing literature on qualitative discrete choice models (McFadden, 2001). Formally, NFC ∈ (0, 1) and CU ∼ Logistic(µ, s) where µ = E[X] is the expected or true utility value and

Interval Representations
Interval representations are more precise than categorical representations. They encode the calculation of the expected value (EV) and utility of choices. Numerical information can be extracted from text using simple text extraction or named entity recognition (NER) (Nadeau and Sekine, 2007), so that probabilities and their associated quantities are obtained as arrays, e.g., in program B of the RDMP in the introduction the probabilities would be [1/3, 2/3] and their corresponding quantities would be [600, 0]. Equation 3 outlines the process to generate error-encoded interval utilities, where CU is the categorical utility defined in equation 2 and EV is an expected value function which takes RDM choices as input and outputs the expected value associated with the probabilities and quantities extracted from the choices. Error is encoded based on NUM (Kahneman, 2003), which measures a person's ability to interpret and work with numbers, to account for individualism; in the calculation of this error, Q is the number of quantities in the choice, which accounts for error involving multiple calculations.
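As an illustration of the extraction step, the probabilities in program B can be pulled out with a simple regular expression, and the expected value computed from the resulting arrays. This is a minimal sketch rather than the authors' pipeline: the helper names are hypothetical, the regex only handles fractions written directly before the word "probability", and the quantities are supplied by hand because mapping "no lives" to 0 needs more than pattern matching.

```python
import re
from fractions import Fraction


def extract_probabilities(choice_text):
    """Find fractions such as '1/3' that directly precede 'probability'."""
    matches = re.findall(r"(\d+)/(\d+)\s+probability", choice_text)
    return [Fraction(int(num), int(den)) for num, den in matches]


def expected_value(probabilities, quantities):
    """Sum of probability * quantity over the outcomes of one choice."""
    if len(probabilities) != len(quantities):
        raise ValueError("each probability needs exactly one quantity")
    if sum(probabilities) != 1:
        raise ValueError("probabilities of a choice must sum to 1")
    return sum(p * q for p, q in zip(probabilities, quantities))


program_b = ("1/3 probability that all 600 lives will be saved; "
             "2/3 probability that no lives will be saved")

probs_b = extract_probabilities(program_b)  # [1/3, 2/3]
quantities_b = [600, 0]                     # supplied by hand, as in the text

ev_b = expected_value(probs_b, quantities_b)
ev_a = expected_value([Fraction(1)], [200])  # program A: a certain outcome
```

Using `Fraction` keeps the arithmetic exact, so both programs come out with the same expected value of 200 lives saved.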

Representations for Decision Making
Finally, combining these representations allows us to derive the most beneficial choice in an nRDMP. The preferred categorical choice, $Pref_{Cat}$, and the preferred interval choice, $Pref_{Int}$, are the choices which maximise the categorical and interval utilities, respectively. If $Pref_{Cat} = Pref_{Int}$, there is a consensus on the best choice. If $Pref_{Cat} \neq Pref_{Int}$, there is no clear best choice. In this case, RS, a person's preference towards pursuing riskier but more rewarding decisions (Kacelnik and Bateson, 1997), is adopted as in the BR model. Risk sensitivity influences the probability of choosing the safest or riskiest choice in an nRDMP as $P(risky) = 1/(1 + e^{-RS})$, where RS ∈ (−3, 3). The safest choice is the one that involves the fewest probabilistic outcomes, whereas the riskiest choice involves the most probabilistic outcomes, e.g., in the ADP in the introduction, program A is the safest as it involves one certain outcome, while program B is the riskiest with two probabilistic outcomes.
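The decision rule above can be sketched as follows. The function names are hypothetical. Treating a tie in utilities as "no clear preference" is an assumption, made so that the sketch behaves like the ADP worked example, in which equal utilities lead to the risk-sensitivity fallback.

```python
import math
import random


def p_risky(rs):
    """Probability of taking the riskiest choice, P(risky) = 1 / (1 + e^(-RS))."""
    return 1.0 / (1.0 + math.exp(-rs))


def unique_argmax(utilities):
    """Index of the single best utility, or None when the maximum is tied."""
    best = max(utilities)
    winners = [i for i, u in enumerate(utilities) if u == best]
    return winners[0] if len(winners) == 1 else None


def decide(cat_utils, int_utils, safe_idx, risky_idx, rs, rng):
    """Return the index of the chosen option.

    A consensus between the categorical and interval levels decides directly.
    Otherwise risk sensitivity picks between the safest and riskiest choice.
    """
    pref_cat = unique_argmax(cat_utils)
    pref_int = unique_argmax(int_utils)
    if pref_cat is not None and pref_cat == pref_int:
        return pref_cat
    return risky_idx if rng.random() < p_risky(rs) else safe_idx


rng = random.Random(0)
# Both levels agree on option 0, so risk sensitivity is never consulted.
consensus = decide([1.0, 0.2], [5.0, 1.0], safe_idx=0, risky_idx=1, rs=2.0, rng=rng)
# Tied utilities at both levels: the choice falls back to risk sensitivity.
fallback = decide([0.5, 0.5], [3.0, 3.0], safe_idx=0, risky_idx=1, rs=0.0, rng=rng)
```

With `rs=0.0` the fallback is an even coin flip between the safest and riskiest option, and larger positive values of RS shift the odds towards the risky choice.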

Decision Making: A Worked Example
To demonstrate the fluidity of our model, we apply it to the ADP from the introduction. In the gain frame, the pretrained Cat2Vec from the experiments predicts the life category for both programs A and B. The sentiment of the life category predicted by sentiment-Cat2Vec is 0.9999 positive and 0.0001 negative, giving a categorical utility, defined as $pos_{category} - neg_{category}$, of 0.9998 for both programs. Taking into account numerical information, the expected value of programs A and B is 200 people being saved; the interval utility is thus the expected value times the categorical utility, which is 199.96 for both programs. No consensus between categorical and interval choices can be reached, because neither level yields a clear preferred choice. Thus, the final choice is decided by risk sensitivity.
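The arithmetic of this worked example can be checked directly. The sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
# Sentiment of the "life" category, as reported for sentiment-Cat2Vec.
pos_life, neg_life = 0.9999, 0.0001

# Categorical utility: positive minus negative sentiment of the category.
categorical_utility = pos_life - neg_life

# Both gain-frame programs share the same category and expected value.
expected_values = {"A": 200.0, "B": 200.0}

# Interval utility: expected value times categorical utility.
interval_utilities = {
    program: ev * categorical_utility for program, ev in expected_values.items()
}

# Equal utilities at both levels mean that neither level prefers a program.
tied = abs(interval_utilities["A"] - interval_utilities["B"]) < 1e-12
```

Both interval utilities equal 200 × 0.9998 = 199.96, so the two programs are tied at both the categorical and the interval level.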
Individual differences encode error and preferences into choices allowing for consensus to arise, e.g., a person with low numeracy will sample interval utilities from a logistic distribution with a larger spread than someone with high numeracy who samples utility close to the true utility. Because the error encoded utilities are sampled, the preferred choice can change on different runs of the problem; however, individual differences influence the average choice. Figure 3 shows a snapshot when one parameter was altered while the others were fixed and how these parameters can alter utilities to prefer certain choices across frames in the ADP.
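The effect of numeracy on the spread of sampled utilities can be illustrated with NumPy's logistic sampler. The mapping from NUM to the scale of the distribution is not given in this text, so the sketch assumes NUM ∈ (0, 1) and uses the purely hypothetical scale `1 - numeracy`. Only the qualitative behaviour (lower numeracy, wider spread) is taken from the paper.

```python
import numpy as np


def sample_interval_utility(true_utility, numeracy, rng, size=1):
    """Draw error-encoded utilities around the true utility.

    ASSUMPTION: scale = 1 - numeracy is a stand-in for the paper's
    definition, chosen only so that low numeracy gives a larger spread.
    """
    scale = 1.0 - numeracy
    return rng.logistic(loc=true_utility, scale=scale, size=size)


rng = np.random.default_rng(0)
true_utility = 199.96  # interval utility from the ADP worked example

low_numeracy = sample_interval_utility(true_utility, 0.1, rng, size=10_000)
high_numeracy = sample_interval_utility(true_utility, 0.9, rng, size=10_000)

spread_low = low_numeracy.std()
spread_high = high_numeracy.std()
```

Because each utility is sampled, repeated runs can rank two tied programs differently, which is how individual differences allow a preference to emerge on average.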

Learning Individual Differences
The last challenge we explore is how optimal individual-level parameters, NFC, NUM and RS, can be inferred in GL-RDM and IL-RDM by optimising the corresponding objective functions. In these objectives, the parameters (NFC, NUM, RS) characterise individual $i$, RDP is a set of nRDMPs and PC is the mapping function of individuals to their preferred choices defined in the task formulation in Section 2.2. Thus, the goal is to learn optimal parameters for each individual which maximise the expectation that our computational model (CDM) chooses their true preferred choice over all RDPs.

Experiments

Datasets
Categorical News. A dataset used for training/fine-tuning Cat2Vec and the benchmark algorithms. The dataset contains 22601 news articles with binary sentiment labels, collected from various news outlets between February and April 2020 using the Google News API and spanning 46 news categories, e.g., travel, entertainment and death.
Group Risky Decision Making. A dataset of the results of 88 psychological human experiments, grouped into categories, which was used in the evaluation of the BR model. The categories represent differences in the experimental controls and in the participants that undertook each experiment, e.g., 'ADP; within-subjects, low PISA'. The category outlines the risky decision-making problem; the experimental design, which is either within-subject, where each participant is given both frames of a decision, or between-subject, where two independent groups each answer one frame; and the numeracy of participants, based on the performance of the country in which the experiment took place in the Program for International Student Assessment (PISA) (Stacey, 2015).
Individual 2-RDMP Prediction. A curated dataset of 38 unique 2-RDMPs selected from various psychological experiments regarding risky decision-making, answered by 121 university students using a within-subject experimental design. Of the 38 2-RDMPs, most problems have a corresponding gain and loss frame, e.g., the ADP in the introduction; each frame is considered a separate problem. Participants selected their preferred choice from the same pre-shuffled RDMP set (see footnote 1) and no pre/post-processing of data was performed.

Evaluation Metrics
We apply different evaluation metrics suitable for each RDM task. For GL-RDM, we compare the true log-odds ratio (LOR), given by equation 7, of the experimental results with the LORs predicted by our model and by the BR baseline model. Intuitively, the LOR measures the consistency of choices across the gain and loss frames.
To determine the goodness-of-fit of the predicted LOR, we apply a hypothesis test based on the Wald statistic ($\chi^2$) given by equation 9. The standard error (SE) is given by equation (8), where $n_{safe,gain}$ represents the number of individuals choosing the safe choice in the gain frame; $n_{safe,loss}$, $n_{risky,gain}$ and $n_{risky,loss}$ are defined similarly. The LOR is asymptotically normally distributed when n is sufficiently large; thus, the associated Wald statistic, equation (9), follows a chi-square distribution with one degree of freedom.
$$SE = \sqrt{\frac{1}{n_{safe,gain}} + \frac{1}{n_{safe,loss}} + \frac{1}{n_{risky,gain}} + \frac{1}{n_{risky,loss}}} \quad (8)$$
To compare the parsimony and, implicitly, the error of our model, the BR model and the null model (Busemeyer et al., 2015), we use the Akaike information criterion (AIC) and Bayesian information criterion (BIC). For IL-RDM, we evaluate the accuracy with which each model predicts the true choices of each individual.
Footnote 1: See appendix, section A.3 for the questionnaire.
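The GL-RDM metrics can be sketched as below. Only the standard error follows equation (8) directly. Equations 7 and 9 are not reproduced in this text, so the LOR (odds of the risky choice in the loss frame over those in the gain frame) and the Wald statistic (squared standardised difference) are assumed, conventional forms. The counts and the predicted LOR are hypothetical values chosen to resemble the ADP proportions.

```python
import math


def log_odds_ratio(n_safe_gain, n_risky_gain, n_safe_loss, n_risky_loss):
    """ASSUMED form: log of (risky odds in loss frame / risky odds in gain frame)."""
    return math.log((n_risky_loss / n_safe_loss) / (n_risky_gain / n_safe_gain))


def standard_error(n_safe_gain, n_risky_gain, n_safe_loss, n_risky_loss):
    """Equation (8): square root of the summed reciprocal cell counts."""
    return math.sqrt(
        1 / n_safe_gain + 1 / n_safe_loss + 1 / n_risky_gain + 1 / n_risky_loss
    )


def wald_statistic(predicted_lor, observed_lor, se):
    """ASSUMED form: squared standardised difference, chi-square with 1 d.o.f."""
    return ((predicted_lor - observed_lor) / se) ** 2


# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CHI2_CRITICAL = 3.841

# Hypothetical counts: 100 participants per frame.
counts = dict(n_safe_gain=72, n_risky_gain=28, n_safe_loss=22, n_risky_loss=78)

observed_lor = log_odds_ratio(**counts)
se = standard_error(**counts)
predicted_lor = 2.0  # hypothetical model output
wald = wald_statistic(predicted_lor, observed_lor, se)

# Under this reading, a prediction fits when the test fails to reject it.
prediction_fits = wald < CHI2_CRITICAL
```

Under this reading, an experiment counts as successfully predicted when the Wald test does not reject the predicted LOR at the chosen significance level.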

Benchmark Algorithms
In the paper, we use two different sets of baselines. For GL-RDM, we directly compare our model against the BR model. Due to the small number of experiments per grouping in the GL-RDM dataset, and to maintain parity with the BR baseline, we apply the same jackknife leave-one-out (JLOO) method used for parameter estimation in the BR baseline model to avoid post-hoc parameter estimation (Busemeyer and Wang, 2000). Formally, we are given $m$ observed human risky decision-making experiment results within a category of comparable RDPs, $ER = \{er_1, er_2, \ldots, er_m\}$, as described in the task formulation in Section 2.2. We wish to estimate $m$ values of $G_i = (NUM_i, NFC_i, RS_i)$, where $G_i$ denotes the group-level differences relating to observed experiment $i$, for $i = 1, \ldots, m$. Each $G_i$ is estimated by the arg min in equation (10), where $ER_{-i}$ is the set of experimental results excluding $er_i$, so that the result being predicted is not used in its own estimation.
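The leave-one-out structure of this estimation can be sketched generically. The objective minimised in equation (10) is not reproduced in this text, so the `fit` argument below is a placeholder for it, and the toy example simply averages the held-in values. The function name and the toy data are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")  # type of one experimental result
P = TypeVar("P")  # type of the estimated parameters


def jackknife_leave_one_out(
    results: Sequence[R], fit: Callable[[Sequence[R]], P]
) -> List[P]:
    """For each experiment i, estimate parameters from all results except i."""
    estimates = []
    for i in range(len(results)):
        held_in = list(results[:i]) + list(results[i + 1:])  # ER without er_i
        estimates.append(fit(held_in))
    return estimates


def toy_fit(held_in):
    """Stand-in for the arg min: the mean of the held-in values."""
    return sum(held_in) / len(held_in)


# Hypothetical LORs of four experiments within one category.
observed_lors = [1.0, 2.0, 3.0, 6.0]
estimates = jackknife_leave_one_out(observed_lors, toy_fit)
```

Because the estimate for experiment $i$ never sees $er_i$, an outlying experiment, such as the last value in the toy data, receives an estimate driven entirely by the rest of its category.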
For IL-RDM, due to a lack of existing benchmark algorithms, we compare our model against two sets of baselines: (1) a naive binary model using pretrained transformer language models, where all RDM choices are combined as a single input and the model outputs 0 or 1, corresponding to the safe or risky choice; and (2) sentiment analysis models, as principle C2 asserts that sentiments are highly influential in decision-making, where decisions are based on the choice with the highest positive sentiment.
Random: Uniformly samples one of the available choices.
Vader: A rule-based sentiment analysis model for social media text (Gilbert and Hutto, 2014).
XLNet: SOTA pretrained autoregressive language model fine-tuned on the news dataset sentiments (Yang et al., 2019).
ULMFiT: Pretrained language model fine-tuned on the news dataset sentiments using inductive transfer learning (Howard and Ruder, 2018).

Experiment Results
GL-RDM results. Table 1 (full table A.4 in Appendix) shows key discrepancies between our computational model, CDM, and both the actual LOR based on all 88 human experiments and the LOR predicted by the BR model. Within each category, we find optimal group-level parameters which minimise (4) to calculate the predicted LOR of our model for each experiment, using the jackknife leave-one-out (JLOO) method to maintain comparability with the BR model.
Critically, our results show that our computational model is capable of automating the prediction of human risky decision-making on a wide range of RDMPs, predicting 82 of 88 (93.2%) experiments based on the Wald statistic. These results hold even when RDMPs were manipulated to capture a wider gamut of decision-making through variations on framing and truncation of choices, where options were removed (Reyna et al., 2014). These results are comparable to carefully crafted, human-conducted analysis using the BR model, which also predicted 82 of 88 experiments.
To further demonstrate the parsimony of our model compared to the BR and null models, we apply the AIC and BIC metrics. Under a null CDM model where parameters are set to 0, we get AIC=14941 and BIC=14950, whereas under the null model of the BR model, AIC=14981 and BIC=14986. In the best cases, our model outperforms all variations of the BR baseline model, with our model attaining AIC=13374 and BIC=13383 compared to AIC=13409 and BIC=13510 for the BR model. Furthermore, the relative likelihood ratio (RLR) computed from the AIC scores, $\exp((13374 - 13409)/2) = 2.5 \times 10^{-8}$, yields a significant result: the BR baseline model is only $2.5 \times 10^{-8}$ times as probable as our model to minimise the information loss. Thus, our model attains better goodness-of-fit than the BR model while using significantly fewer parameters, 3 compared to up to 172 in the BR model.
IL-RDM results. Table 2 displays the average 5-fold cross-validation results for predicting all 121 individuals' decisions on all 38 questions. Our model, with a modest 63.19% accuracy, outperforms all benchmark algorithms, which hover around 50% for sentiment baselines and 60% for pretrained language model baselines. This reinforces that IL-RDM is a more challenging problem: although sentiment analysis is important for decision-making, current SOTA sentiment analysis is not suitable for IL-RDM and performs only comparably to random choice. It is worth noting that while pretrained language models can be naively applied to IL-RDM with competitive results, they cannot be naively applied to GL-RDM, which requires the simultaneous prediction of two distributions of choices across frames, where often the same RDM and choices are used across all experiments within a category.
Also displayed in the table are results when using transformers as encoders within Cat2Vec and results from a minor ablation study. For IL-RDM, transformers do not significantly improve accuracy, as the predicted categories and category sentiments of RDM choices are highly similar between encoders. The ablation study, in which choices are derived from the preferred choices at a single level of representation, i.e., $CDM_{Categorical}$ and $CDM_{Interval}$, reinforces that the full expressiveness of our model comes from the consensus between levels of representation and the influence of individual differences.
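The relative likelihood reported for the GL-RDM results can be reproduced from the two best-case AIC scores. The sketch uses the standard relative-likelihood formula for AIC, exp((AIC_best − AIC_other)/2); the function name is illustrative.

```python
import math


def relative_likelihood(aic_other, aic_best):
    """How probable the other model is, relative to the best (lowest-AIC) model,
    to minimise the information loss."""
    return math.exp((aic_best - aic_other) / 2)


aic_cdm = 13374  # best case for our model, from the text
aic_br = 13409   # best case for the BR model, from the text

rlr = relative_likelihood(aic_other=aic_br, aic_best=aic_cdm)
```

The exponent is (13374 − 13409)/2 = −17.5, which gives a value of roughly 2.5 × 10⁻⁸, matching the figure quoted in the GL-RDM results.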

Error Analysis and Discussion
To understand the shortfall of our model for both GL-RDM and IL-RDM, we analyse cases in which our model fails to predict human decision-making.
In GL-RDM, of the 6 experiments that our model did not successfully predict, 3 (experiments (4), (5) and (7) in table 1) were also not predicted by the BR baseline model. This indicates problems with parameter estimation using JLOO, as these experiments are outliers with relatively significant differences in LORs within their respective categories.
In IL-RDM, inconsistencies exist across 2-RDMPs due in part to the within-subject design, as participants may notice the underlying problem, causing them to compare between problems rather than answer each independently (Kahneman and Frederick, 2002). For example, figure 4 shows loss frames where individuals overwhelmingly preferred the safe choice, e.g., Q2 and Q4, and most gain-loss pairs do not show a clear distinction between safe and risky choices in opposing frames, e.g., (Q26, Q2). Both cases are inconsistent with psychological studies.
Quantity of data is also an issue. While the num-  (1). Also, since the size of each fold is relatively small, containing 7-8 test RDPs, any RDP not predicted correctly will cause a large decrease in accuracy. Figure 5(B) shows the percentage of correct choices our model predicts for all individuals from the combined 5-fold test questions. On average, our model predicts gain and loss RDMPs about equally well, with accuracies of 65.73% and 62.84% respectively. However, RDMPs in the "both" frame, which contain choices with combined gain and loss wording, are poorly predicted by our model due to this duality, with an average accuracy of 42.73%.

This paper provides the first steps towards a fully computational framework of risky decision-making, which adopts the cognitive and psychological basis of FTT, with our model outperforming baselines in individual and group RDP prediction. Potential applications of our model are wide-ranging for scenarios in which predicting and understanding the characteristics of human risky decision-making is pivotal, e.g., the design of safety mechanisms based on how people make decisions in risky scenarios, or the improvement of personalised recommendation systems based on understanding users' personal traits and how they make decisions. Future work therefore involves adapting our model towards real-world applications, exploring generalised decision-making, and designing and evaluating sophisticated end-to-end machine learning models for text-based decision-making.

A.1 Final Decision Algorithm
Algorithm 1 corresponds to the algorithm mentioned in section 6.1 of the main paper.

Algorithm 1 Computational Decision Making
 ...
 7:     return PrefCat
 8: else if Uniform(0, 1) ≤ RiskSensitivity(RS) then
 9:     return Riskiest Choice
10: else
11:     return Safest Choice
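The surviving steps 7-11 of Algorithm 1 can be sketched in Python as follows. This is only an illustrative sketch: the condition guarding step 7 is not recoverable from the text, so it is modelled here by the assumption that `pref_cat` is `None` when no categorical preference is available. The function name `final_decision` and all parameter names are hypothetical.

```python
import random


def final_decision(pref_cat, riskiest, safest, risk_sensitivity, rng=random):
    """Sketch of steps 7-11 of Algorithm 1 (names are hypothetical).

    pref_cat: the choice preferred at the categorical level, or None.
        ASSUMPTION: the real guarding condition of step 7 is not shown in
        the text; "pref_cat is not None" merely stands in for it.
    risk_sensitivity: the individual's RS value, compared to a uniform draw.
    rng: any object with a uniform(a, b) method (the random module by default).
    """
    if pref_cat is not None:
        # Step 7: return the preferred categorical choice.
        return pref_cat
    elif rng.uniform(0.0, 1.0) <= risk_sensitivity:
        # Steps 8-9: with probability RS, take the riskiest choice.
        return riskiest
    else:
        # Steps 10-11: otherwise fall back to the safest choice.
        return safest
```

Under this reading, a higher risk sensitivity makes the fallback draw more likely to select the risky option when no categorical preference is returned.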

A.2 Evaluation Metric Calculations
The second type of evaluation metric, which we use to compare the parsimony of our model against baseline algorithms, comprises the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). These are given by equations (13) and (14), using the log-likelihood calculated by equations (11) and (12). In these equations, $n_{1,1}$ is the number of people who chose the first choice (safe choice) in the first problem (gain frame), $p_{1,1}$ is the predicted proportion of subjects who chose the first choice (safe choice) in the first problem (gain frame), etc. For the AIC, $k$ is the total number of parameters of our model (3, one for each individual difference), and for the BIC, $n$ is the total number of data points (176, representing the gain and loss frames in the 88 human experiments).

$$\ln[L(y_i)] = n_{1,1}\ln p_{1,1} + n_{1,2}\ln p_{1,2} + n_{2,1}\ln p_{2,1} + n_{2,2}\ln p_{2,2} \tag{11}$$

To compare models using AIC, the relative likelihood ratio (RLR) given in equation (15) can be applied. It compares the probability that the BR baseline model minimises the estimated information loss with that of our CDM model, given that $\text{AIC}_{\text{CDM}} \le \text{AIC}_{\text{BR}}$, where $\text{AIC}_{\text{BR}}$ and $\text{AIC}_{\text{CDM}}$ are the corresponding AIC scores of each model.
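As a hedged illustration of these metrics, the sketch below computes the per-experiment log-likelihood of equation (11), the AIC and BIC, and the RLR. Equations (12)-(15) are not reproduced in this text, so the code assumes the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, and takes the RLR form from the expression exp((13374 − 13409)/2) used in the main text. All function names, and the counts and proportions in the usage lines, are hypothetical.

```python
import math


def log_likelihood(counts, probs):
    """Log-likelihood of one experiment, following equation (11).

    counts: observed counts ordered as n_{1,1}, n_{1,2}, n_{2,1}, n_{2,2}.
    probs:  predicted proportions in the same order (each must be > 0).
    """
    return sum(n * math.log(p) for n, p in zip(counts, probs))


def aic(k, log_lik):
    """Standard AIC (assumed to match equation (13)): 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(k, n, log_lik):
    """Standard BIC (assumed to match equation (14)): k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


def relative_likelihood(aic_best, aic_other):
    """RLR of the worse model relative to the better one (aic_best <= aic_other)."""
    return math.exp((aic_best - aic_other) / 2)


# Made-up counts and predictions for a single experiment (illustration only).
ll = log_likelihood((72, 28, 22, 78), (0.70, 0.30, 0.25, 0.75))
scores = aic(3, ll), bic(3, 176, ll)

# AIC scores reported in the main text for our model and the BR baseline.
rlr = relative_likelihood(13374, 13409)
print(f"{rlr:.1e}")  # 2.5e-08
```

With k = 3 and n = 176, the gap between BIC and AIC under these definitions is 3 ln 176 − 6 ≈ 9.5, which is in line with the reported difference between BIC=13383 and AIC=13374.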

A.3 Individual Level Questionnaire
Full inventory of all 36 questions used in the Individual 2-RDMP Prediction dataset:
Q1: Which of the following options do you prefer?
(a) A sure win of $30
(b) 80% chance to win $45
Q2: Imagine that 6000 pieces of precious paintings in a world-famous museum are accidentally exposed to a disastrous chemical pollution. Two alternative plans to rescue these art treasures have been proposed. Assume that the exact estimates of the consequences of the plans made by scientists are as follows:
(a) If plan A is adopted, 4000 pieces will be destroyed by the chemical pollution.
(b) If plan B is adopted, there is a one-third probability that none of these paintings will be destroyed, and two-thirds probability that all 6000 of these paintings will be destroyed.
Q3: A large car manufacturer has recently been hit with a number of economic difficulties and it appears as if three plants need to be closed and 6000 employees laid off. The vice-president of production has been exploring alternative ways to avoid this crisis and has developed two plans:
(a) Plan C: This plan will result in the loss of 2 plants and 4000 jobs.
(b) Plan D: This plan has a 2/3 probability of resulting in the loss of 3 plants and all 6000 jobs, but has a 1/3 probability of losing no plants and no jobs.
Q4: Imagine you receive a letter from the president of a subsidiary describing a dilemma concerning whether to fight an impending patent violation suit or settle out of court that reads: If we do not agree to this proposal, PMG will file their suit. Going to court would involve the possibility of losing $1,100,000 in damages and losing the Duraplast line. If we win in court, we will incur a small sum for legal expenses. Our corporate lawyer, Mr. Bell, and our outside law firm estimate that we have a 2 in 3 chance of losing the case.
(a) Agree to the proposal (no lawsuit)
(b) Disagree to the proposal: 2/3 chance of losing the lawsuit and incurring costs of $1,100,000
Q5: Imagine that you have lung cancer and you must choose between two therapies: surgery and radiation. Surgery for lung cancer involves an operation on the lungs. Most patients are in the hospital for two or three weeks and have some pain around their incisions; they spend a month or so recuperating at home. After that, they generally feel fine. Radiation therapy for lung cancer involves the use of radiation to kill the tumor and requires coming to the hospital about four times a week for six weeks. Each treatment takes a few minutes and during the treatment, patients lie on a table as if they were having an x-ray. During the course of the treatment, some patients develop nausea and vomiting, but by the end of the six weeks they also generally feel fine. Thus, after the initial six or so weeks, patients treated with either surgery or radiation therapy feel about the same.
Q6: Imagine that you bought $6000 worth of stock from a company that has just filed a claim for bankruptcy recently. The company now provides you with two alternatives to recover some of your money.
(a) You will save $2000 of your money
(b) You will take part in a random drawing procedure with exactly a one-third probability of saving all $6000 of your money, and two-thirds probability of saving none of your money.
Q7: Imagine that in one particular state it is projected that 1000 students will drop out of school during the year, two programs have been proposed to address this problem, but only one can be implemented. Based on other states' experiences with programs, estimates of the outcomes that can be expected for each program can be made.
(a) Program 1: 600 of the 1000 students will drop out of school
(b) Program 2: 2/5 chance that none of the 1000 students will drop out of school and 3/5 chance that all 1000 students will drop out of school
Q8: Assume that you have just been given a gift of $1000.
(a) Taking an additional $500 for sure.
(a) 25% chance to win $240, and 75% chance to lose $760
(b) 25% chance to win $250, and 75% chance to lose $750
Q10: You are staying in a hotel room on vacation. You paid $6.95 to see a movie on pay TV. After 5 minutes you are bored and the movie seems pretty bad. Would you continue to watch the movie or not?
(a) Continue to watch
(b) Turn it off and lose $6.95
Q11: Imagine that your country is preparing for the outbreak of an unusual disease, which is expected to kill 600 people. Two alternative programs to combat the disease have been proposed. Assume that the exact scientific estimate of the consequences of the programs are as follows:
(a) If Program A is adopted, 200 people will be saved
(b) If Program B is adopted, there is 1/3 probability that 600 people will be saved, and 2/3 probability that no people will be saved
Q12: Imagine that you have decided to see a play where admission is $10 per ticket. As you enter the theatre you discover that you have lost a $10 bill.
(a) Still pay $10 for a ticket for the play
(b) Don't pay $10 for a ticket for the play
Q13: Consider the following two stage game. In the first stage, there is a 75% chance to end the game without winning anything, and a 25% chance to move into the second stage. If you reach the second stage, you have a choice between: A sure win of $30 and 80% chance to win $45
(a) A sure win of $30
(b) 80% chance to win $45
Q14: Imagine that six people in your family, including both of your parents, your brothers and your sisters, are infected by a fatal disease. Two alternative medical plans to treat the disease have been proposed. Assume that the exact scientific estimates of the consequences of the plans are as follows:
(a) If plan A is adopted, two of them will be saved.
(b) If plan B is adopted, there is a one-third probability that all six of them will be saved, and two-thirds probability that none of them will be saved.
Q15: You are presented with the following report from the head of a special team assigned to investigate the prospects of a project in Arizona: Our new analysis indicates that, if we choose to compete with ATC, we would face the possibility of capturing only a small market share. This would give us an after-tax return on investment of as little as 10%, while capturing a large market share would give us a return of 22%. We estimate our chance of getting a small market share to be 2 in 3. If we were to team up with ATC on the terms proposed, our return would be 14% after tax, with the same total investment.
(a) Compete with ATC: 1/3 chance of gaining a large market share of 22% and 2/3 chance of gaining a small market share of 10%
(b) Don't compete with ATC: 100% chance of capturing 14% market share
Q16: A committee found a fish disease in a nearby lake. About 12 fish species (among them the most popular dining fish) have the Proliferative Kidney Disease (PKD). This is a chronically developing infectious disease which can have deadly consequences for the fish. Young fish are especially susceptible, while others seem to be immune against an infection. Experts suggest that PKD is one cause of declining fish catches. The researchers assume human activities and water pollution foster the spread of the disease. They are considering releasing more fish into the lake to control the epidemic. Imagine that you are a government official of the adjacent village.
(a) Option A: If the release of fish is implemented, 4 fish species will survive.
(b) Option B: If the release of fish is implemented, there is 1/3 probability that all of the 12 fish species will survive, and 2/3 probability that none of them will survive.
Q17: Imagine a refinery that processes petroleum products. An investigation found that due to tank leaks, both soil and drinking water became contaminated. Due to this contamination 720 children from the adjacent village have a fatal disease. There is agreement among experts that children will not suffer health problems, provided they have a strong immune system. Otherwise, it is likely that children will have serious health problems. A vaccine against this disease has been developed and tested. However, the vaccine sometimes can cause side effects that can be fatal too. You are an environmental activist with much influence on the local hospital and you have to decide if you want to lobby for the vaccination or not.
(a) Option C: If the vaccination is adopted, the health of 480 children will be damaged for sure.
(b) Option D: If the vaccination is adopted, there is a one-third probability that the health of none of the 720 children will be damaged, and a two-thirds probability that the health of all 720 of them will be damaged.
(a) Make the trip to the other store and save 5 dollars but lose 20 minutes
(b) Don't make the trip to the other store and save 20 minutes but lose 5 dollars
Q21: Imagine that six people in your family, including both of your parents, your brothers and your sisters, are infected by a fatal disease. Two alternative medical plans to treat the disease have been proposed. Assume that the exact scientific estimates of the consequences of the plans are as follows:
(a) If plan A is adopted, four of them will die.
(b) If plan B is adopted, there is a one-third probability that none of them will die, and two-thirds probability that all six of them will die.
Q22: Imagine you receive a letter from the president of a subsidiary describing a dilemma concerning whether to fight an impending patent violation suit or settle out of court that reads: If we do not agree to this proposal, PMG will file their suit. Going to court would involve the possibility of keeping the Duraplast line and incurring only a small sum for legal expenses. If we lose in court, we will incur $1,100,000 in damages. Our corporate lawyer, Mr. Bell, and our outside law firm agree that we have a 1 in 3 chance of winning the case.
(a) Agree to the proposal (no lawsuit)
(b) Disagree to the proposal: 1/3 chance of winning the case
Q23: Imagine that your country is preparing for the outbreak of an unusual disease, which is expected to kill 600 people. Two alternative programs to combat the disease have been proposed. Assume that the exact scientific estimate of the consequences of the programs are as follows:
(a) If Program C is adopted 400 people will die.
(b) If Program D is adopted there is 1/3 probability that no one will die, and 2/3 probability that 600 people will die.
Q24: You are staying in a hotel room on vacation. You turn on the TV and there is a movie on. After 5 minutes you are bored and the movie seems pretty bad. Would you continue to watch the movie or not?
(a) Continue to watch
(b) Turn it off
Q25: Imagine that six people are infected by a fatal disease. Two alternative medical plans to treat the disease have been proposed. Assume that the exact scientific estimates of the consequences of the plans are as follows:
(a) If plan A is adopted, four people will die.
(b) If plan B is adopted, there is a one-third probability that none of them will die, and two-thirds probability that all six people will die.
Q26: Imagine that 6000 pieces of precious paintings in a world-famous museum are accidentally exposed to a disastrous chemical pollution. Two alternative plans to rescue these art treasures have been proposed. Assume that the exact estimates of the consequences of the plans made by scientists are as follows:
(a) If plan A is adopted, 2000 pieces will be saved from the chemical pollution.
(b) If plan B is adopted, there is a one-third probability that all the 6000 paintings will be saved, and two-thirds probability that none of these paintings will be saved.
Q27: Imagine that in one particular state it is projected that 1000 students will drop out of school during the year, two programs have been proposed to address this problem, but only one can be implemented. Based on other states' experiences with programs, estimates of the outcomes that can be expected for each program can be made.
(a) Program 1: 400 of the 1000 students will stay in school
(b) Program 2: 2/5 chance that all 1000 students will stay in school and 3/5 chance that none of the 1000 will stay in school
Q28: Imagine that you have lung cancer and you must choose between two therapies: surgery and radiation. Surgery for lung cancer involves an operation on the lungs. Most patients are in the hospital for two or three weeks and have some pain around their incisions; they spend a month or so recuperating at home. After that, they generally feel fine. Radiation therapy for lung cancer involves the use of radiation to kill the tumor and requires coming to the hospital about four times a week for six weeks. Each treatment takes a few minutes and during the treatment, patients lie on a table as if they were having an x-ray. During the course of the treatment, some patients develop nausea and vomiting, but by the end of the six weeks they also generally feel fine. Thus, after the initial six or so weeks, patients treated with either surgery or radiation therapy feel about the same.
(a) Surgery: Of 100 people having surgery, 10 die during surgery or the postoperative period, 32 die by the end of one year and 66 die by the end of five years.
(b) Radiation Therapy: Of 100 people having radiation therapy, none die during treatment, 23 die by the end of one year and 78 die by the end of five years.
Q29: Imagine that you have decided to see a play and paid the admission price of $10 per ticket. As you enter the theatre you discover that you have lost the ticket. The seat was not marked and the ticket cannot be recovered.
(a) Pay $10 for another ticket
(b) Don't pay $10 for another ticket
Q30: You are presented with the following report from the head of a special team assigned to investigate the prospects of a project in Arizona: Our new analysis indicates that, if we choose to compete with ATC, we would have the possibility of capturing a large market share. This would give us an after-tax return on investment of as much as 22%, while capturing a small market share would give us a return of only 10%. We estimate a 1 in 3 chance of getting a large market share. If we were to team up with ATC on the terms proposed, our return would be 14% after tax, with the same total investment.
(a) Compete with ATC
(b) Don't compete with ATC
Q31: A large car manufacturer has recently been hit with a number of economic difficulties and it appears as if three plants need to be closed and 6000 employees laid off. The vice-president of production has been exploring alternative ways to avoid this crisis and has developed two plans:
(a) Plan A: This plan will save 1 plant and 2000 jobs
(b) Plan B: This plan has a 1/3 probability of saving all 3 plants and all 6000 jobs, but has a 2/3 probability of saving no plants and no jobs
Q32: Imagine that you are about to purchase a jacket for $15, and a calculator for $125. The calculator salesman informs you that the calculator you wish to buy is on sale for $120 at the other branch of the store, located 20 minutes drive away.
(a) Make the trip to the other store and save 5 dollars but lose 20 minutes
(b) Don't make the trip to the other store and save 20 minutes but lose 5 dollars
Q33: A committee found a fish disease in a nearby lake. About 12 fish species (among them the most popular dining fish) have the Proliferative Kidney Disease (PKD). This is a chronically developing infectious disease which can have deadly consequences for the fish. Young fish are especially susceptible, while others seem to be immune against an infection. Experts suggest that PKD is one cause of declining fish catches. The researchers assume human activities and water pollution foster the spread of the disease. They are considering releasing more fish into the lake to control the epidemic. Imagine that you are a government official of the adjacent village.
(a) Option C: If the release of fish is implemented, 8 fish species will die.
(b) Option D: If the release of fish is implemented, there is 2/3 probability that none of the 12 fish species will die, and 1/3 probability that all of the 12 fish species will die.
Q34: Imagine that you bought $6000 worth of stock from a company that has just filed a claim for bankruptcy recently. The company now provides you with two alternatives to recover some of your money.
(a) You will lose $4000 of your money
(b) You will take part in a random drawing procedure with exactly a two-thirds probability of losing all $6000 of your money, and one-third probability of not losing any of your money
Q35: Imagine a refinery that processes petroleum products. An investigation found that due to tank leaks, both soil and drinking water became contaminated. Due to this contamination 720 children from the adjacent village have a fatal disease. There is agreement among experts that children will not suffer health problems, provided they have a strong immune system. Otherwise, it is likely that children will have serious health problems. A vaccine against this disease has been developed and tested. However, the vaccine sometimes can cause side effects that can be fatal too. You are an environmental activist with much influence on the local hospital and you have to decide if you want to lobby for the vaccination or not.
(a) Option A: If the vaccination is adopted, the health of 240 children will be saved for sure.
(b) Option B: If the vaccination is adopted, there is a one-third probability that the health of all of the 720 children will be saved, and a two-thirds probability that the health of none of them will be saved.
Q36: Imagine that six people are infected by a fatal disease. Two alternative medical plans to treat the disease have been proposed. Assume that the exact scientific estimates of the consequences of the plans are as follows:
(a) If plan A is adopted, two people will be saved.
(b) If plan B is adopted, there is a one-third probability that all six people will be saved, and two-thirds probability that none of them will be saved.
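As a small worked check of the framing equivalence discussed in the introduction, the sketch below computes expected outcomes for the gain-frame item Q11 and its loss-frame counterpart Q23. It is illustrative only: the helper `expected_value` and the encoding of each option as (probability, amount) pairs are assumptions made for this example and are not part of the model.

```python
from fractions import Fraction


def expected_value(outcomes):
    """Expected value of a list of (probability, amount) pairs."""
    if sum(p for p, _ in outcomes) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for p, x in outcomes)


third = Fraction(1, 3)

# Q11 (gain frame): amounts are lives saved out of 600.
q11_a = expected_value([(Fraction(1), 200)])
q11_b = expected_value([(third, 600), (2 * third, 0)])

# Q23 (loss frame): amounts are deaths out of 600.
q23_c = expected_value([(Fraction(1), 400)])
q23_d = expected_value([(third, 0), (2 * third, 600)])

print(q11_a, q11_b)              # 200 200
print(600 - q23_c, 600 - q23_d)  # 200 200
```

All four options correspond to 200 expected survivors, so any systematic difference in choices between the two items reflects the wording of the frame rather than the expected outcome.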