Language Models with Rationality

While large language models (LLMs) are proficient at question-answering (QA), it is not always clear how (or even if) an answer follows from their latent “beliefs”. This lack of interpretability is a growing impediment to widespread use of LLMs. To address this, our goals are to make model beliefs and their inferential relationships explicit, and to resolve inconsistencies that may exist, so that answers are supported by interpretable chains of reasoning drawn from a consistent network of beliefs. Our approach, which we call REFLEX, is to add a rational, self-reflecting layer on top of the LLM. First, given a question, we construct a belief graph using a backward-chaining process to materialize relevant model beliefs (including beliefs about answer candidates) and their inferential relationships. Second, we identify and minimize contradictions in that graph using a formal constraint reasoner. We find that REFLEX significantly improves consistency (by 8%-11% absolute) without harming overall answer accuracy, resulting in answers supported by faithful chains of reasoning drawn from a more consistent belief system. This suggests a new style of system architecture in which an LLM extended with a rational layer can provide an interpretable window into system beliefs, add a systematic reasoning capability, and repair latent inconsistencies present in the LLM.


Introduction
While large language models (LLMs) are impressive at question-answering (QA), it is not always clear how (or even if) an answer follows from their latent "beliefs"¹ about the world, or whether the LLM even has a coherent internal belief system. This general opacity is a growing impediment to widespread use of LLMs, e.g., in critical applications such as medicine, law, and hiring decisions, where properties of explainability, interpretability, and trust are paramount. Our goal is to help alleviate such opacity by constructing an explicit representation of system beliefs and their inferential relationships (including to answer candidates), so that answers are supported by interpretable chains of reasoning. These constructed belief graphs, e.g., Figures 1 and 2, form a rational layer above the LLM explaining how answers follow from beliefs, and provide a window into some of the latent contents of the model, potentially helping users understand and trust model answers.
In addition, when we do this, we find such graphs expose latent inconsistencies in the model's beliefs. We show how such inconsistencies can be resolved using constraint satisfaction techniques. When we do this, the rational layer becomes not just a window onto the model, but an active reasoning component in its own right in a larger, overall system, comprising the (frozen) LLM plus rational layer (blue box, Figure 1). We show this results in a more consistent set of beliefs in the overall system, without harming overall answer accuracy (although some individual answers may change). The result is answers supported by faithful, system-believed chains of reasoning drawn from a consistent belief system.
Our approach, called REFLEX, introduces a rational layer consisting of two parts. First, to produce a belief graph, we recursively ask the LLM to explain why each candidate answer might be true, expressed as a set of sentences that entail the answer. This builds on earlier work on generating entailment-based and chain-of-thought explanations (Tafjord et al., 2022; Weir and Durme, 2022; Wei et al., 2022). We then add a self-verification step to check that the model itself believes those generations (i.e., that the model believes what it says), allowing us to identify sentences reflecting the model's own internal knowledge. For example, in Figure 1, when asked to explain S1 ("giraffes give live birth"), the model generates S7 ([because] "mammals give live birth") and S4 ([and] "a giraffe is a mammal"). Self-querying then checks if the model actually believes its generations ("Do mammals give live birth?"). The answer ("yes"/"no") assigns a true/false (T/F) value to each generation, indicated in Figure 1 by white/grey nodes. This procedure is then applied recursively to the generated, supporting sentences. The resulting network of model beliefs and their dependencies provides a window into the model.
Second, we apply a formal constraint reasoner to this graph to resolve inconsistencies, by finding the optimal (minimal cost, Section 3.3) way of flipping T/F values. For example, on the left in Figure 1, S2 and S3 ("spiders do/don't give live birth") are in an XOR relationship (i.e., exactly one must be false), but both are believed as true (white) by the LLM, a latent contradiction within the LLM. Constraint reasoning then seeks to remove such inconsistencies, here flipping the belief value on S2 from T to F (Figure 1, right), repairing the contradiction. This builds on earlier techniques (Kassner et al., 2021; Mitchell et al., 2022; Jung et al., 2022), though in a notably richer setting with over 350 nodes and 80 constraints per question, joint inference across answer candidates, and a variety of constraint types. The overall result is a fully autonomous, self-reflective system that is able to deliberate (and if necessary change) its answers, thereby resolving latent inconsistencies that would otherwise go unnoticed, and provide faithful explanations drawn from a consistent belief system.
We evaluate our implementation of REFLEX on three datasets: EntailmentBank (Dalvi et al., 2021), OBQA (Mihaylov et al., 2018), and QuaRTz (Tafjord et al., 2019). We find that REFLEX is able to construct belief graphs with significantly improved consistency (by 8%-11% absolute) without harming overall answer accuracy. In addition, answers are now supported by a more consistent, system-believed chain of reasoning, providing a window into the previously latent beliefs of the model. Our contributions are thus:
1. A new style of system architecture in which an LLM is extended with a rational layer in which an explicit representation of system beliefs and relationships is constructed and which can be reasoned over. This layer provides an interpretable window into system beliefs, adds a systematic reasoning capability, and allows latent inconsistencies present in the LLM to be repaired.
2. An implementation of this architecture demonstrating that the consistency of the overall system's network of beliefs can be significantly improved without harming answer accuracy. Answers are now supported by explicit, interpretable chains of reasoning drawn from a more consistent network of beliefs.

Related Work
Materializing a Model's Internal Knowledge: It is now well recognized that LLMs contain extensive world knowledge (Petroni et al., 2019, 2020; Davison et al., 2019; Peters et al., 2019; Jiang et al., 2020; Roberts et al., 2020)
Figure 2: Given a question, each answer choice is first converted to a hypothesis statement (A). The belief graph is then constructed in stages, first generating rules that conclude the hypotheses (B), then backward-chaining to generate rules concluding the premises of those first rules, etc., and adding in negated versions of graph statements connected with the originals via XOR links (e.g., nodes 11 and 12), until the stopping criterion is met (C). Statements are then labeled with the model's belief in them (true/false), found via self-querying (white = believed true, grey = believed false). Finally, logical conflicts are identified (colored red), and constraint satisfaction techniques are used to resolve them. In this case, as there is strong evidence that node 2 is actually true (7 & 6 → 2, not(19) → 2), the solver finds that the minimum cost repair is to flip node 2's label from FALSE to TRUE. Here, node 2 ends up being selected as the final answer, thus correctly answering the original question.
with no guarantee that the generated sequence of tokens expresses the model's internal knowledge, nor entails the actual answer. Similarly, chain-of-thought (CoT) (Wei et al., 2022) and Least-to-Most (Zhou et al., 2023) prompting generate (in different ways) a step-by-step reasoning chain along with an answer, but again with no claim that the chain reflects the model's internal knowledge nor is valid reasoning (Subramanian et al., 2020).
To add semantics to generations, several systems have used self-querying to verify that generations reflect model-believed facts (by self-querying "Is p true?") (e.g., Kassner et al., 2021; Jung et al., 2022), or model-believed rules (by self-querying "Does p imply q?") (e.g., Tafjord et al., 2022). We build on these to construct a belief graph, namely a network of model-believed facts and their inferential relationships, which can then be reflected on.

Beliefs: We refer to the model's factual opinions as "beliefs" rather than "knowledge" because those opinions may be wrong. In general, an agent can be said to believe p if it acts as if p were true (Schwitzgebel, 2019). Following Kassner et al. (2021) and Richardson et al. (2022), we take a simple, syntactic operationalization of this, namely the agent answers "yes" to the question "p?", but also note that more semantic versions could be used, e.g., the agent also answers "yes" to paraphrases and implications of p.
Reducing Inconsistency: LLMs are known to be inconsistent in their answers (Ettinger, 2020; Kassner and Schütze, 2020; Davison et al., 2019; Ravichander et al., 2020; Elazar et al., 2021; Subramanian et al., 2020; Gu et al., 2023), and several recent works have used constraint reasoners to identify and reduce inconsistency. BeliefBank used a MaxSAT solver to resolve inconsistencies between model beliefs, but required a hand-provided set of constraint rules (Kassner et al., 2021). ConCoRD (Mitchell et al., 2022) similarly used MaxSAT to ensure model answers were consistent with NLI-derived entailment constraints between them, but did not introduce additional model-believed facts and rules. Maieutic Prompting (Jung et al., 2022) also used MaxSAT to resolve inconsistencies between facts in prompt-induced explanation chains. However, those chains were not validated as reflecting model-believed constraint rules², and did not support conjunction. REFLEX extends these reasoning chains to provide a full semantic account of how answers are supported by the model's internal knowledge. Additionally, it performs joint reasoning across answer candidates and operates at a much larger scale (e.g., over 350 nodes on average for each question) and with a variety of constraint types.

Belief Graphs
Our belief graphs are defined over a set of natural language true/false statements and represent a set of rules that constrain the truth values of these statements. We refer to statements that are factually true in the world as facts. The truth value assigned by a model $M$ to a statement is referred to as $M$'s belief in that statement (cf. Footnote 1). A model's internal beliefs may not always align with facts. Our goal is to extract a model's initial beliefs about statements inferentially related to all top-level hypotheses of interest, and perform reasoning to update these beliefs so as to make them more consistent with respect to the rules, and ideally also factually more accurate.
Footnote 2: REFLEX checks whether both the statements $s_i$ and the rules $(s_i \to h)$ are believed by the model via self-querying, e.g., by asking "Does $s_i \to h$?", and also scores the strength of those beliefs. In maieutic prompting, the generated rules are not checked against the model, resulting in rules that the model itself may not believe, if queried about them.
A belief graph is a type of factor graph commonly used in the probabilistic inference literature (Loeliger, 2004). Formally, it is defined as an undirected graph $G = (N, E)$ with nodes $N$ and edges $E$. Nodes are of two types: A statement node (referred to as a "variable node" in a factor graph) is a triple $(s, l, c_s)$ containing a natural language statement $s$, an associated value $l \in \{T, F\}$ initially denoting $M$'s belief that $s$ is true or false, and a confidence $c_s \in [0, 1]$ denoting a confidence in that label. A rule node (referred to as a "factor node" in a factor graph) is a pair $(r, c_r)$ denoting a disjunctive rule or constraint over statements, with confidence $c_r$. It takes the form $r = (\neg s_1 \lor \ldots \lor \neg s_\ell \lor s_{\ell+1} \lor \ldots \lor s_k)$. For ease of interpretation, we view this constraint as $r = p \to h$ where $p = s_1 \land \ldots \land s_\ell$ is a conjunctive premise and $h = s_{\ell+1} \lor \ldots \lor s_k$ is a disjunctive hypothesis. The rule says that if $p$ is true, so must be $h$; and the contrapositive of this.
Edges $E$ connect rule nodes to the statements they constrain, denoting their dependence. For legibility, we draw edges directionally to depict the way the rule reads: the statements in $p$ point to $r$, which in turn points to $h$. Mathematically, the influence is bidirectional and the depicted directionality is irrelevant during reasoning (Section 3.3), just as in a standard factor graph.
We adopt the standard probabilistic semantics of factor graphs, thereby associating a belief graph with a well-defined probability distribution over any set of statement beliefs. For a statement node $(s, l, c_s)$, the cost $\mathrm{cost}_s$ for setting it to $l$ is 0, and that for setting it against $l$ is $c_s$; the corresponding weight of this node is $w_s = \exp(-\mathrm{cost}_s)$. Costs and weights for a rule node $(r, c_r)$ are defined similarly, based on whether the beliefs satisfy $r$ or not. Finally, the overall weight of a T/F assignment to all statements is $\prod_s w_s \cdot \prod_r w_r$, which, when normalized by the total weight across all possible assignments, yields a probability distribution over such assignments. We will be interested in finding the most consistent set of beliefs, i.e., a T/F assignment to statements with the maximum overall weight, which is equivalent to minimizing $\sum_s \mathrm{cost}_s + \sum_r \mathrm{cost}_r$. This is referred to as the MPE (most probable explanation) problem in the graphical models literature, which we later solve exactly using a MaxSAT constraint solver based on a standard translation of MPE into weighted MaxSAT (Park, 2002; Sang et al., 2007).
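To make the cost and weight semantics concrete, the following is a minimal sketch that scores T/F assignments and finds the most probable one by exhaustive enumeration. The data layout, the function names (`total_cost`, `mpe`), and the brute-force search are illustrative assumptions, not the paper's implementation, which uses a MaxSAT solver; the confidences in the usage note are made up.

```python
import math
from itertools import product


def total_cost(assign, statements, rules):
    """Sum of statement and rule costs for one T/F assignment.

    statements: {name: (label, c_s)}          (hypothetical layout)
    rules:      [(premises, hypotheses, c_r)] read as p -> h
    """
    cost = 0.0
    for name, (label, c_s) in statements.items():
        if assign[name] != label:  # set against the initial belief
            cost += c_s
    for premises, hypotheses, c_r in rules:
        premise_true = all(assign[p] for p in premises)
        hypothesis_true = any(assign[h] for h in hypotheses)
        if premise_true and not hypothesis_true:  # rule is violated
            cost += c_r
    return cost


def mpe(statements, rules):
    """Return (assignment, cost, probability) of the minimum-cost assignment."""
    names = sorted(statements)
    best, best_cost, z = None, math.inf, 0.0
    for values in product([True, False], repeat=len(names)):
        assign = dict(zip(names, values))
        cost = total_cost(assign, statements, rules)
        z += math.exp(-cost)  # weight of this assignment
        if cost < best_cost:
            best, best_cost = assign, cost
    return best, best_cost, math.exp(-best_cost) / z
```

For instance, with `statements = {"a": (True, 0.9), "b": (True, 0.8), "h": (False, 0.3)}` and the single rule `(["a", "b"], ["h"], 0.95)`, the cheapest repair is to flip `h` to true, since its confidence is lower than that of the rule or either premise. Enumeration is exponential in the number of statements, so it is only suitable for toy graphs.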

Constructing Belief Graphs
Given an initial node (statement) s, a belief graph G is produced by a backward-chaining process described below, in which G is recursively expanded to add statements that together may entail s.

Basic Operations
Let $h$ denote a hypothesis (language statement $s$) of interest and $p$ a premise, i.e., a set of statements $\{s_1, \ldots, s_n\}$ that together may entail $h$. Given these, there are three basic operations required to generate belief graphs:
1. $h \Rightarrow p$: Given $h$, generate a $p$ that may entail $h$.
2. $s \Rightarrow (l, c_s)$: Given a statement $s$, output a true/false value $l$ and a confidence in the belief that $s$ has truth value $l$ (as assessed via yes/no question-answering).
3. $(p, h) \Rightarrow c_r$: Given $p$ and $h$, output a confidence that the candidate rule $r = p \to h$ holds.
The most important of these is the first operation, in which the model self-generates conjunctive rules concluding $h$ (i.e., reason $p$ for believing $h$), thus adding new nodes to the graph.
There are several ways of implementing these basic functions, and our algorithm is agnostic to the method used. In our work here, we use Entailer, an off-the-shelf T5-11B trained model with these functionalities (Tafjord et al., 2022). Further, since the raw score produced by the model tends to be skewed towards 0 or 1, when computing $c_s$ and $c_r$ in practice, we re-scale the raw model score using a set of hyperparameters (cf. Appendix B).
Algorithm 1: The recursive algorithm for constructing a belief graph of max depth $d_{max}$ for a hypothesis set $H$. The subroutine EXTEND-GRAPH takes a partial graph $G$ as an input and extends it in place with one statement and its subgraph.
One may use alternative ways to implement these operators, such as chain-of-thought prompting a model like GPT3 (Wei et al., 2022) or ChatGPT (OpenAI, 2022). For example, to generate a rule concluding a hypothesis $h$ such as "Plants require CO2 to make their food.", the model could be prompted with $h$ followed by "Explain the last statement with a 2-step reasoning chain.", the numbered generations forming the premise $p$. Similarly, generated statements and rules can be validated as reflecting the model's beliefs by self-querying ("Is $s$ true?", "Does $p$ imply $h$?"), and then using the generated yes/no answer token probabilities as the model's confidence (Kadavath et al., 2022).
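As a rough illustration of the self-querying idea, the sketch below maps the probabilities of the "yes" and "no" answer tokens to a (label, confidence) pair. The function name and the choice to renormalize over just these two tokens are assumptions made for the example; the text does not specify how the probabilities are combined, and no model API is called here.

```python
def belief_from_answer_probs(p_yes, p_no):
    """Map yes/no answer-token probabilities to (label, confidence).

    Assumption: confidence is the probability of the chosen answer after
    renormalizing over the two answer tokens only.
    """
    total = p_yes + p_no
    if total <= 0:
        raise ValueError("at least one probability must be positive")
    yes_share = p_yes / total          # renormalize over {yes, no}
    label = yes_share >= 0.5           # True means the statement is believed
    confidence = yes_share if label else 1.0 - yes_share
    return label, confidence
```

The same mapping could serve both for statements ("Is s true?") and for rules ("Does p imply h?"), with any re-scaling of skewed scores applied afterwards.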

Initial Hypothesis Generation
Given a question, we first generate a set $H$ of hypothesis sentences (e.g., "Is the sky (A) blue (B) yellow" → {$h_1$ = "The sky is blue.", $h_2$ = "The sky is yellow."}). An $N$-way multiple choice question yields $N$ hypotheses in $H$. A true/false question yields 2 hypotheses. To handle open-ended questions, candidate answers can be generated, e.g., using nucleus sampling (Holtzman et al., 2019).

Belief Graph Generation
The belief graph generation process is shown in Algorithm 1. An example of (part of) a generated belief graph is shown in Figure 2. Given a set $H$ of hypotheses, we generate a single belief graph $G$ by using our basic operations (Section 3.2.1) to recursively generate rules that conclude each $h_i \in H$ up to a fixed maximum depth $d_{max}$. (Each original $h_i$ is at depth $d = 0$.) For each statement $s$, we also generate nodes $\mathit{neg}_s$ (and their recursive subgraphs) expressing its negation, e.g., "The sky is not blue." from "The sky is blue." Each pair $s$ and $\mathit{neg}_s$ is connected with an XOR rule, indicating a (soft) preference for setting exactly one of them to true; this is represented as two disjunctive constraints $(s \lor \mathit{neg}_s)$ and $(\neg s \lor \neg \mathit{neg}_s)$ whose weight $c_{xor}$ is a fixed hyperparameter. Lastly, we add a multiple-choice (MC) constraint which has two parts: a hard constraint (with infinite cost) that at least one hypothesis must be chosen, and a soft constraint that no more than one should be chosen. The soft constraint is associated with a fixed hyperparameter weight $c_{mc}$.
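The construction described above can be sketched as a short recursion. This is only an illustration under stated assumptions: `BeliefGraph`, `ToyOps`, and all method names are hypothetical, the three model-backed operations are replaced by a lookup-table stub, negation subgraphs are simplified to a single node, and the soft "no more than one" constraint is encoded pairwise, which the text does not specify.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class BeliefGraph:
    statements: dict = field(default_factory=dict)  # text -> (label, c_s)
    rules: list = field(default_factory=list)       # (premises, hypotheses, weight)


class ToyOps:
    """Hypothetical stand-in for the three model-backed operations."""
    premises = {"A giraffe gives live birth.":
                ["Mammals give live birth.", "A giraffe is a mammal."]}

    def believe(self, s):                 # s => (l, c_s)
        return (not s.startswith("It is not true that"), 0.9)

    def negate(self, s):
        return "It is not true that " + s[0].lower() + s[1:]

    def generate_premise(self, h):        # h => p
        return self.premises.get(h, [])

    def rule_confidence(self, premise, h):  # (p, h) => c_r
        return 0.8


def extend_graph(graph, s, depth, d_max, ops, c_xor, add_negation=True):
    """Add statement s, its negation, and rules concluding s (in place)."""
    if s in graph.statements:
        return
    graph.statements[s] = ops.believe(s)
    if add_negation:
        neg = ops.negate(s)
        extend_graph(graph, neg, depth, d_max, ops, c_xor, add_negation=False)
        graph.rules.append(((), (s, neg), c_xor))   # (s or neg_s)
        graph.rules.append(((s, neg), (), c_xor))   # (not s or not neg_s)
    if depth >= d_max:
        return
    premise = ops.generate_premise(s)
    if premise:
        c_r = ops.rule_confidence(premise, s)
        graph.rules.append((tuple(premise), (s,), c_r))
        for p in premise:
            extend_graph(graph, p, depth + 1, d_max, ops, c_xor)


def build_belief_graph(hypotheses, ops, d_max, c_xor, c_mc):
    graph = BeliefGraph()
    for h in hypotheses:
        extend_graph(graph, h, 0, d_max, ops, c_xor)
    # MC constraint: hard "at least one", soft pairwise "at most one".
    graph.rules.append(((), tuple(hypotheses), float("inf")))
    for h1, h2 in combinations(hypotheses, 2):
        graph.rules.append(((h1, h2), (), c_mc))
    return graph
```

Each rule is stored as a (premises, hypotheses, weight) triple read as "premises imply the disjunction of hypotheses", so an empty premise list gives a plain disjunction and an empty hypothesis list forbids the premises from all being true together.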

Reasoning Over Belief Graphs
Belief graphs provide a window into the model's beliefs about some of the relevant statements and their (believed) inferential relationships to candidate answers to a question. As others have shown (Kassner et al., 2021; Mitchell et al., 2022), such beliefs can be inconsistent, and materializing those inconsistencies provides one the opportunity to remove or reduce them.
In a similar vein, and as discussed in Section 3.1, REFLEX performs inference over belief graphs in order to compute an updated set of beliefs that is as consistent as possible with the rules. To this end, it converts belief graphs into an equivalent weighted MaxSAT problem and uses an off-the-shelf MaxSAT solver (RC2; Ignatiev, 2019) to compute the optimal flips of initial true/false beliefs that minimize global inconsistency. It then discards all rules that are in conflict with the updated statement beliefs, obtaining a smaller, updated belief graph. This smaller belief graph produced by REFLEX is self-consistent and provides inferential support for the top-level hypotheses.
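The translation into weighted clauses and the subsequent pruning step can be sketched as follows. To keep the example dependency-free, a small exhaustive search stands in for the RC2 solver; the function names, data layout, and confidences in the usage note are illustrative assumptions rather than the system's actual encoding.

```python
from itertools import product


def to_weighted_clauses(statements, rules):
    """Belief graph -> weighted soft clauses; a literal is (name, polarity)."""
    clauses = []
    for name, (label, c_s) in statements.items():
        clauses.append(([(name, label)], c_s))  # prefer the initial belief
    for premises, hypotheses, c_r in rules:
        lits = [(p, False) for p in premises] + [(h, True) for h in hypotheses]
        clauses.append((lits, c_r))             # (not p1 or ... or h1 or ...)
    return clauses


def solve_maxsat(names, clauses):
    """Exhaustive weighted MaxSAT: minimize total weight of falsified clauses."""
    best, best_cost = None, float("inf")
    for values in product([True, False], repeat=len(names)):
        assign = dict(zip(names, values))
        cost = sum(w for lits, w in clauses
                   if not any(assign[n] == pol for n, pol in lits))
        if cost < best_cost:
            best, best_cost = assign, cost
    return best, best_cost


def reflect(statements, rules):
    """Return updated beliefs, the flipped statements, and the surviving rules."""
    names = sorted(statements)
    assign, _ = solve_maxsat(names, to_weighted_clauses(statements, rules))
    flipped = [n for n in names if assign[n] != statements[n][0]]
    kept = [r for r in rules
            if not (all(assign[p] for p in r[0])
                    and not any(assign[h] for h in r[1]))]
    return assign, flipped, kept
```

Applied to the spider example with made-up confidences, `statements = {"S2": (True, 0.6), "S3": (True, 0.9)}` and an XOR pair `[((), ("S2", "S3"), 1.0), (("S2", "S3"), (), 1.0)]`, the cheapest repair flips the weaker belief S2 to false and keeps both XOR clauses, since neither is violated afterwards.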

Generating Faithful Explanations
Notably, the smaller updated belief graph produced by REFLEX provides a faithful explanation of the answer it predicts, in the sense that it accurately represents the reasoning process behind the overall system's prediction (Lyu et al., 2022). This is true as the MaxSAT reasoning process results precisely in a self-consistent set of beliefs from which REFLEX determines whether to believe a candidate answer or not, and produces its final prediction based on this (rather than on the raw LLM output alone; note that we do not make any claims about how the internal reasoning of the LLM component operates). Thus, REFLEX provides the user with an interpretable reasoning trace, allowing the user to understand how it derived the answer from more rudimentary facts (Subramanian et al., 2020).
We note that the original belief graph (before reasoning) may reveal that the model's original explanation is, in fact, not faithful to its own beliefs. For example, in Figure 2, the model believes statements 6, 7, and that 6 & 7 entail 2, but does not believe 2 (colored grey). Thus, the global reasoning layer of REFLEX plays a critical role in arriving at faithful explanations.

Experiments and Results
The goal of our experiments is to evaluate the extent to which our overall system, namely an LLM plus a self-reflecting, rational layer, helps expose and resolve inconsistencies in the LLM's beliefs without harming accuracy. Importantly, REFLEX is evaluated in a zero-shot setting, without relying on training instances of the target datasets.
Models. The baseline LLM we use is an LLM that has been trained to perform QA and also supports the basic operations discussed in Section 3.2.1, enabling us to assess how much it can be improved by adding a REFLEX layer. To this end, we use a publicly available, frozen, off-the-shelf T5-11B LLM called Entailer (Tafjord et al., 2022). To answer an MC question with this LLM, we score each answer hypothesis ($c_s$, Section 3.2.1) and select the one with the highest truth confidence. If Entailer assigns false values to all answer choices, we select the hypothesis with the lowest false confidence.
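The baseline selection rule just described can be written down directly. The sketch below assumes beliefs are given as a hypothetical mapping from each hypothesis to a (label, confidence) pair; the function name is illustrative.

```python
def select_answer(beliefs):
    """Pick an answer from {hypothesis: (label, confidence)}.

    Prefer the hypothesis believed true with the highest confidence; if all
    are believed false, pick the one with the lowest false confidence.
    """
    believed_true = {h: c for h, (label, c) in beliefs.items() if label}
    if believed_true:
        return max(believed_true, key=believed_true.get)
    # All choices believed false: choose the least confidently false one.
    return min(beliefs, key=lambda h: beliefs[h][1])
```

Ties are broken by dictionary order here, which is an arbitrary choice made for the example.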
REFLEX then adds a rational layer to this LLM, creating a new system that is also able to self-reflect and modify its beliefs.To ensure the different belief graph scores in REFLEX are appropriately calibrated, we use nine hyperparameters, tuned once on the dev partition of EntailmentBank (Dalvi et al., 2021) and then kept fixed for all experiments.Details are in Appendix B. Note the LLM itself remains frozen, with belief revision occurring in the rational (belief graph) layer above it.
Metrics. For measuring self-consistency, we follow Li et al. (2019) and report the conditional constraint violation ($\tau$) metric, defined as follows: the fraction of rules whose premises $p$ are believed true, but whose hypothesis $h$ is not. In other words, over all rules of the form $p \to h$, $\tau$ is:
$$\tau = \frac{|\{p \to h \mid p = T,\ h = F\}|}{|\{p \to h \mid p = T\}|}$$
where $s = T$ denotes the system believes statement $s$ to be true (similarly for $s = F$). The numerator of $\tau$ thus captures the number of constraints the system violates. The denominator captures the number of applicable constraints. We then report the following metric: consistency = $1 - \tau$.
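A direct reading of this metric in code might look as follows. The rule representation (pairs of premise and hypothesis name lists) and the convention of returning 1.0 when no rule is applicable are assumptions made for the sketch.

```python
def consistency(assign, rules):
    """Compute 1 - tau over rules given as (premises, hypotheses) pairs.

    assign maps statement names to True/False (the system's beliefs).
    """
    # Applicable rules: every premise statement is believed true.
    applicable = [(p, h) for p, h in rules if all(assign[s] for s in p)]
    if not applicable:
        return 1.0  # assumption: no applicable rules means nothing is violated
    # Violated rules: applicable, but no hypothesis statement is believed true.
    violated = sum(1 for p, h in applicable if not any(assign[s] for s in h))
    return 1.0 - violated / len(applicable)
```

For example, with beliefs a = T, b = T, c = F, d = T, e = F and rules a & b → c, a → d, and e → c, two rules are applicable and one of them is violated, giving a consistency of 0.5.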
For QA performance, we report standard multiple-choice accuracy: 1 point for predicting the correct answer, 1/N points for predicting N answers including the correct one, 1/k points for no prediction (k = # answer options), 0 otherwise.
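The scoring rule above is small enough to state as a function. This is a sketch with hypothetical names; `predicted` is the list of answer options the system selects, which may be empty or contain several options.

```python
def mc_score(predicted, gold, num_options):
    """Multiple-choice score for one question, following the rule above."""
    if not predicted:                # no prediction: 1/k points
        return 1.0 / num_options
    if gold in predicted:            # correct answer among N predictions
        return 1.0 / len(predicted)
    return 0.0                       # only wrong answers predicted
```

Overall accuracy would then be the mean of this score over all questions in a dataset.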

Results
Consistency. Table 1 shows consistency results on the test partitions of our datasets. We observe significant consistency gains (by 8%-11% absolute), showing REFLEX's effectiveness at creating a consistent belief network within the overall system.
Accuracy. Table 2 shows overall performance on our three datasets (test partitions). As can be seen, we observe stable accuracy, as well as the answers now being faithful to the reasoning chains in the belief graph. This is significant, as it allows users to understand how answers follow from system beliefs (and in cases where an LLM belief was flipped, why that belief is untenable in the broader system).

Table 2 (test partitions):
System                           EntailmentBank   OBQA   QuaRTz
LLM                              79.4             74.0   80.2
LLM + rational layer (REFLEX)    79.9             75.0   80.0

Ablations. To study the impact of the three different types of rules on consistency improvement, we ran ablations using the EntailmentBank dataset (dev partition). To do this, given the belief graph for a question, we mask out (separately, rather than cumulatively) each type of rule in turn when providing the graph to the MaxSAT solver. We then run the constraint solver and measure the resulting self-consistency of beliefs on the original graph.
The results are shown in Table 3 (the MC rule is the constraint that exactly one multiple-choice option should be chosen, Section 3.2.3). The results indicate that all three types of rules contribute to the system's consistency improvements.

System                     EntailmentBank
REFLEX (our system)        96.1
- without p → h rules      93.8
- without XOR rules        90.4
- without MC rule          95.8
Table 3: Consistency: Ablations on EntailmentBank (Dev) suggest that all three types of rules contribute to improving self-consistency.

Success Analysis
We identify three classes of successful reasoning by the constraint reasoner: (a) latent model beliefs correct an initially wrong answer (Figure 3); (b) the system corrects an initially erroneous, latent model belief (Figure 4); and (c) strong model beliefs identify and reject a bad rule (Figure 5). These types of system corrections help to improve accuracy and produce answers supported by valid chains of reasoning, allowing users insight into why an answer follows from the model's knowledge.

Failure Analysis
Reasoning can also make mistakes. From a manual analysis of 50 random questions from EntailmentBank that REFLEX answered incorrectly, we identified five main causes of failure and their approximate frequency (note that multiple categories can apply, hence total is > 100%):
1. Missing Rules (≈30%): In some cases, the system generates irrelevant rules but misses an important one needed to support the correct answer, resulting in incorrect conclusions. While somewhat subjective, this is a notable error category that we observe. For example, for the question "A human cannot survive the loss of (A) The liver [correct] (B) A lung (C) A kidney", the system incorrectly concludes (B) is true, ignoring the commonsense rule that with two lungs, a person can survive without one of them.
2. Incorrect Beliefs (≈30%): Sometimes the reasoner fails to correct incorrect model beliefs, either because the model's confidence is high or evidence against them is weak or missing. In the example shown in Figure 7, the model's strong, incorrect beliefs that "river deltas are reservoirs" and "reservoirs always provide freshwater" (untrue of oceans, say) cause it to incorrectly conclude that "deltas are freshwater reservoirs".
3. Incorrect Rules (≈10%): Rule generation can produce bad rules (e.g., in Figure 5), and in some cases the constraint reasoner fails to reject them if they are strongly believed. In particular, confusion or ambiguity over quantifiers can result in bad rules, e.g., (emphasis added) "Some animals catch their prey with trickery." & "A spider is a kind of animal." → "Spiders catch their prey with trickery." Similarly the model generates the fallacy: "Some people don't mind not moving for an hour" & "breathing is a kind of movement" → "Some ..."
4. Ambiguous Statements, Unexpected Reasoning (≈10%): A common cause of error is the surprising ambiguity of belief statements, which can often be read in multiple ways. In several cases, the model adopts a valid but unexpected interpretation, resulting in "errors" compared to the gold answer label. For example, in Figure 6, the model takes the word "always" in a literal sense ("glaciers will not always be there"), resulting in an answer that differs from the gold label. Developing ways to attach context to these statements to help disambiguate them would help alleviate such errors.

5. Multiple Valid Answers (≈10%): A final cause of "error", at least with respect to the gold label, is that multiple answers may be valid, and the question is asking for the best answer; e.g., for "What could fill a beach ball? (A) Oxygen (B) Water ...", A is labeled correct, while B is also a valid answer. REFLEX (desirably) finds valid reasoning chains for both, but the notion of highest-scoring proof does not fully correlate with the notion of "best answer" intended by the question author.

Future Work
There are several impactful ways this work could be further extended.First, incorporating the question's context in the belief statements in our rational layer could make the semantics of the beliefs more precise, thus avoiding potential ambiguity in their truth value.Second, one could use the belief graph itself to identify the key reasoning pieces that the LLM is most uncertain about.This could then guide a human-in-the-loop mechanism to correct or validate uncertain pieces via user interaction.Third, maintaining a persistent belief graph over multiple questions could help make the system more consistent across questions.This, in turn, would make a user's conversational experience with the system more coherent in a longer dialog setting.Lastly, after resolving inconsistencies in the rational layer, we could consider propagating information back to the LLM layer in order to update it (via fine-tuning, model editing, memory-based architectures, etc.), helping avoid similar inconsistencies in the future.

Conclusion
While LLMs perform well, the interdependencies between their answers and their other beliefs are opaque, and the two may even be in conflict. This lack of interpretability is a significant impediment to widespread use of LLMs. To reduce this opacity, and reduce these conflicts, we have proposed REFLEX, a new system architecture in which an explicit, interpretable representation of beliefs (the belief graph) is added as a rational layer above the LLM. This layer provides a window into system beliefs, and allows latent inconsistencies in the LLM alone to be reasoned about and repaired. Our implementation shows that belief consistency of the overall system is significantly improved, without harming answer accuracy, resulting in answers supported by interpretable chains of reasoning drawn from a more consistent belief system. This new architecture is an important step towards improving confidence in system behavior, and towards trustable deployment of LLMs in practical applications.

Limitations
We have shown how an LLM can be extended with a self-reflective component, allowing latent model knowledge to be made explicit in the form of a belief graph, providing a window into the model's system of beliefs.While exciting, there are several limitations with the current work and opportunities for the future.
First, the reasoning component in the rational layer can make mistakes, resulting in the overall system rejecting true statements or accepting false ones. A detailed analysis and classification of these failure modes was presented in Section 4.3. Second, for our experiments, we used the T5-11B based Entailer system as the baseline LLM.
While there is every reason to expect our proposed architecture to be effective in reducing inconsistency with newer and larger LLMs such as ChatGPT and LLaMA, this is still to be evaluated. Doing so would require implementing the basic operations needed to construct belief graphs (Section 3.2.1) using instruction prompting and in-context learning. Other work has demonstrated such implementations (e.g., Wei et al., 2022; Jiang et al., 2020), making the outlook promising, but indeed their combination still needs to be demonstrated at scale in an architecture like REFLEX.
Lastly, we found consistency-minimized belief graphs to be highly valuable in understanding the system's successes and failures.We expect these graphs to be a valuable starting point for providing explanations and gaining a user's trust in the system.However, we have not conducted a formal user study to measure this.

Ethics Statement
Like any other project using LLMs, despite the best intentions there is a risk of the model producing biased or offensive statements as part of its explanations, and thus the system must be used with care and appropriate guards and warnings.

Figure 1: (Top) When queried about each answer option independently, the model incorrectly believes both are true, and is more confident in the wrong answer (S2). (Bottom) REFLEX adds a "rational" layer above the LLM layer, in which a belief graph is constructed (by iteratively querying the LLM, up/down arrows), containing relevant model-believed facts (white/grey = believed T/F) and their inferential relationships. Inconsistencies are then identified (red) and minimized by a constraint reasoner that flips T/F labels on beliefs (green ✓/X), here resulting in the correct answer (S1, green box) + explanation (graph) by the overall system (blue).

Figure 3: Example of good reasoning: The model's beliefs in 1 and 2, and the rule 1 & 2 → 3, as well as the xor constraint, cause it to (desirably) flip its belief in 3 from false (grey, before) to true (white, after).

Figure 4: Example of good reasoning: Although the model correctly believes option (A) is false (grey, node 3), this answer conflicts with other beliefs (red). Reasoning leads the system to realize that its weakest belief (2) is actually false, correctly flipping its label from true (white) to false (grey, right side), restoring consistency.

Figure 5: Example of good reasoning: Here the reasoner (desirably) chooses to reject the violated (bad) rule rather than flip a belief, as the minimum cost way to restore consistency.

Figure 6: Unexpected reasoning: Here the model unexpectedly pays particular attention to the word "always". Because it strongly believes that glaciers will not always be there (1, white), the system prefers to flip its beliefs in 3 and 4, rather than flipping 1, thus rejecting answer option B (arguably correctly).

Figure 7: Failure due to bad beliefs: The model strongly believes both 1 and 2 (although both are factually incorrect), here causing 3's label to undesirably flip from false (correct) to true (incorrect).

Table 1: Consistency: By adding a rational layer to the baseline LLM, REFLEX significantly improves consistency among beliefs by resolving uncovered conflicts.

Table 2: QA accuracy: REFLEX's belief revision in the rational layer preserves overall QA accuracy.