Implicit Premise Generation with Discourse-aware Commonsense Knowledge Models

Enthymemes are defined as arguments where a premise or conclusion is left implicit. We tackle the task of generating the implicit premise in an enthymeme, which requires not only an understanding of the stated conclusion and premise but also additional inferences that could depend on commonsense knowledge. The largest available dataset for enthymemes (Habernal et al., 2018) consists of 1.7k samples, which is not large enough to train a neural text generation model. To address this issue, we take advantage of a similar task and dataset: Abductive reasoning in narrative text (Bhagavatula et al., 2020). However, we show that simply using a state-of-the-art seq2seq model fine-tuned on this data might not generate meaningful implicit premises associated with the given enthymemes. We demonstrate that encoding discourse-aware commonsense during fine-tuning improves the quality of the generated implicit premises and outperforms all other baselines both in automatic and human evaluations on three different datasets.


Introduction
In argumentation theory, an enthymeme is defined as an incomplete argument found in discourse, where some components are explicit, but other propositions are left implicit and need to be filled in as premises or conclusions to fully understand what the argument is (Walton and Reed, 2005). In many instances the missing proposition is a premise. The well-cited example of the Silver Blaze case from one of Sherlock Holmes' stories (Walton and Reed, 2005) presents such an incomplete argument: "A dog was kept in the stable, and yet, though someone had been in and fetched out a horse, he had not barked enough to rouse the two lads in the loft. Obviously, the midnight visitor was someone whom the dog knew well."

The missing premise in this case is the generalization "Dogs generally bark when a person enters an area unless the dog knows the person well." While there has been work on identification (i.e., classification) and reconstruction of implicit premises in enthymemes (Rajendran et al., 2016; Habernal et al., 2018; Reisert et al., 2015; Boltužić and Šnajder, 2016; Razuvayevskaya and Teufel, 2017), to our knowledge, automatically generating an implicit premise from a given enthymeme is a new task. There are two main challenges that need to be addressed: 1) the lack of large-scale data of incomplete arguments together with annotated missing premises needed to train a sequence-to-sequence model (the largest such set contains 1.7K instances (Habernal et al., 2018)); and 2) the inherent need to model commonsense or world knowledge.

Reason: Vaccinations save lives
Claim: Vaccination should be mandatory for all children
Zero-shot: Vaccines save lives, they save money
Fine-tuned on ART: Vaccinations are the best way to protect children.
Fine-tuned on ART + PARA-C: Vaccinations are the best way to prevent childhood diseases.
Table 1: Implicit premises generated by BART (Lewis et al., 2020) in three different settings for an input enthymeme from the dataset by Habernal et al. (2018).

We propose an approach for generating an implicit premise given an incomplete argument that aims to address these two challenges. Our contributions are threefold.
A new task of generating an implicit premise given an incomplete argument (enthymeme). Given an enthymeme consisting of a stated conclusion and a stated premise, generate the implicit/missing premise. As the backbone sequence-to-sequence architecture we use BART (Lewis et al., 2020).
Leverage abductive reasoning as an auxiliary task. To address the first challenge, we rely on an observation from argumentation theory that incomplete arguments in naturally occurring discourse, more often than not, require abductive reasoning (plausible explanations) rather than the stricter form of reasoning based on deductive logic (Walton and Reed, 2005; Sabre, 1990). The Silver Blaze case is such an example. We leverage the Abductive Reasoning in Narrative Text (ART) dataset introduced by Bhagavatula et al. (2020) to fine-tune a BART model. ART consists of pairs of observations together with the plausible explanation to be generated (Section 3).
Encoding discourse-aware commonsense knowledge. To address the second challenge, we rely on PARA-COMET (Gabriel et al., 2021), a discourse-aware knowledge model that incorporates paragraph-level information to generate coherent commonsense inferences from narratives. We encode the outputs of PARA-COMET while fine-tuning BART on our auxiliary dataset (ART) (Section 4). We show on three different datasets (Section 3) that this knowledge-enhanced model performs best in both automatic and human-based evaluations (Section 5). Table 1 shows an example of an enthymeme consisting of a stated premise and conclusion, together with the implicit premise generated by a BART model (zero-shot), by a BART model fine-tuned on the ART dataset, and by a BART model fine-tuned on ART augmented with discourse-aware commonsense knowledge derived from PARA-COMET. We make the code available at https://github.com/tuhinjubcse/EnthymemesEMNLP2021.

Related Work
Prior work on enthymeme reconstruction has focused primarily on the identification (i.e., classification) of implicit premises in enthymemes (Rajendran et al., 2016; Habernal et al., 2018; Reisert et al., 2015; Boltužić and Šnajder, 2016; Razuvayevskaya and Teufel, 2017). Boltužić and Šnajder (2016) study how to identify enthymemes in online discussions, while Habernal et al. (2018) present the task of identifying the correct warrant given two candidate warrants in order to reconstruct an enthymeme. Rajendran et al. (2016) introduce an approach to classify the stance of a statement as implicit or explicit, as a first step towards the long-term goal of enthymeme reconstruction. Unlike these works, which propose discriminative approaches to identify an enthymeme or the (correct) implicit premises, we focus on generative models that aim to generate an implicit premise given an enthymeme, using abductive reasoning and discourse-aware commonsense knowledge. Alshomary et al. (2020) introduce a closely related task of generating an argument's conclusion from its premises. Specifically, they focus on the subtask of inferring the conclusion's target from the premises. They develop two complementary target inference approaches: one ranks premise targets and selects the top-ranked target as the conclusion target; the other finds a new conclusion target in a learned embedding space using a triplet neural network. Unlike their work, ours focuses on the new task of generating an implicit premise given an enthymeme that consists of a stated conclusion and a stated premise.

O1: Alex had his heart set on an ivy league college
O2: Alex ended up achieving his dream of getting into the school.
H: Alex applied to Harvard
Table 2: An example from the ART dataset, with observations O1 and O2 and hypothesis H.

Datasets
Training dataset. Based on the theoretical connection between enthymemes and abductive reasoning, we use the Abductive Reasoning in narrative Text (ART) data developed for the abductive NLG task (Bhagavatula et al., 2020) to train our models. The task is framed as: given two observations (O1 and O2) from a narrative, generate the most plausible explanation (hypothesis) (Table 2). The observations O1, O2 in ART are drawn from the ROCStories (Mostafazadeh et al., 2016) dataset, a large collection of short, manually curated five-sentence stories. The beginning and ending of each story map to the first (O1) and second (O2) observations in ART, respectively. Bhagavatula et al. (2020) presented O1 and O2 as narrative context to crowdworkers and prompted them to generate plausible and implausible hypotheses (H) to explain the observations. To avoid annotation artifacts, Bhagavatula et al. (2020) applied an adversarial filtering step to retain one challenging pair of plausible and implausible hypotheses that are hard to distinguish. The ART training set consists of 50481 instances, while the validation and test sets consist of 7252 and 14313 instances, respectively. As can be seen in Table 2, the observations O1 and O2 can be "mapped" to the stated Premise and the stated Claim in an enthymeme, while the hypothesis H is mapped to the implicit premise we try to generate.

Encoder Input: Amy was looking through her mother's old scrapbooks. [SEP] Amy realized her mother had dated her history professor.
Encoder Input + PARA-COMET: Amy was looking through her mother's old scrapbooks. [SEP] to find something [SEP] Amy realized her mother had dated her history professor.
Decoder Output: Amy was looking through her mother's old scrapbooks. And since Amy found pictures of her history professor and mother together. Amy realized her mother had dated her history professor.
Table 3: Encoder inputs (Rows 1 and 2) and decoder output (Row 3) used for fine-tuning.

[...] (2016), which contains 494 enthymemes from an online debate forum with human-annotated implicit premises (D2). Third, we use the dataset introduced by Becker et al. (2020) (D3), which contains implicit premises annotated for each argument from the MicroText Corpus (Peldszus and Stede). For D3, we focus only on arguments that are in a support relation, since this corresponds to our task. Moreover, we choose the cases where there is only one implicit premise, rather than a chain of linked premises. This results in a total of 112 enthymemes for D3. For all datasets, we apply automatic filtering to keep only fully formed sentences as claims and premises (e.g., we remove cases where the stated premise/claim consists of a noun phrase, a partial clause, or many sentences).

Method
For our generation model, we use BART (Lewis et al., 2020), a pre-trained conditional language model that combines bidirectional and autoregressive transformers. It is implemented as a sequence-to-sequence model with a bidirectional encoder over corrupted text and a left-to-right autoregressive decoder.
Fine-tuning BART on ART. To fine-tune BART on the ART dataset (Section 3), we concatenate O1 and O2 with a special delimiter [SEP] as input to the BART encoder, as shown in Table 3 Row 1. For decoding, we focus on reconstructing the entire argument given an enthymeme. To encourage fluency and coherence in the generated argument, we prepend the discourse marker "And since" to the plausible hypothesis (implicit premise) during fine-tuning (Table 3 Row 3).
Fine-tuning BART on PARA-COMET enhanced ART. Adapted knowledge models such as COMET (Bosselut et al., 2019) have been shown to generate implicit commonsense inferences along several dimensions (depending on what knowledge graphs they were pre-trained on). PARA-COMET (Gabriel et al., 2021) is an extension of COMET pre-trained on ATOMIC that is able to generate discourse-aware commonsense knowledge. ATOMIC is a knowledge graph that contains 9 relations related to social commonsense knowledge, including dynamic aspects of events such as causes and effects, if-then conditional statements, and mental states. Given a text with T sentences S_1, S_2, ..., S_T, PARA-COMET generates, for each sentence S_i, a set of commonsense inferences for the 9 inferential relations from ATOMIC that are consistent with the entire narrative. Following PARA-COMET's input format, we create a discourse of two sentences containing [O1, O2] from ART. We then feed this as input to the trained PARA-COMET model and obtain 9 commonsense relations for both O1 and O2. Given the causal nature of the implicit premises in this work, we use only the relation xIntent. Given an event (e.g., "X compliments Y"), xIntent states the likely intents of person X (e.g., "X wants to be nice"). We only consider the xIntent returned for O1 (the Premise in our task). We experimented with other relations, as well as with xIntent for both O1 and O2, but the results were not better. After obtaining the discourse-aware commonsense, we concatenate {O1, commonsense, O2} in sequential order as shown in Table 3 Row 2 and pass it to BART's encoder for fine-tuning. For decoding, we use the same process as before (Table 3 Row 3).
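The two encoder formats and the decoder target described above can be sketched as plain string construction. This is a minimal illustration, not the released implementation: the function names `build_encoder_input` and `build_decoder_target` are hypothetical, and the exact whitespace around [SEP] is an assumption based on the example in Table 3.

```python
from typing import Optional

SEP = "[SEP]"         # delimiter named in the paper
MARKER = "And since"  # discourse marker named in the paper


def build_encoder_input(o1: str, o2: str, commonsense: Optional[str] = None) -> str:
    """Join O1 and O2 with [SEP]; if given, place the commonsense inference between them."""
    parts = [o1.strip()]
    if commonsense:
        parts.append(commonsense.strip())
    parts.append(o2.strip())
    return f" {SEP} ".join(parts)


def build_decoder_target(o1: str, hypothesis: str, o2: str) -> str:
    """Full argument: O1, the marker-prefixed hypothesis, then O2."""
    return f"{o1.strip()} {MARKER} {hypothesis.strip()} {o2.strip()}"


if __name__ == "__main__":
    o1 = "Amy was looking through her mother's old scrapbooks."
    o2 = "Amy realized her mother had dated her history professor."
    print(build_encoder_input(o1, o2))
    print(build_encoder_input(o1, o2, commonsense="to find something"))
    print(build_decoder_target(
        o1, "Amy found pictures of her history professor and mother together.", o2))
```

Keeping the commonsense inference as an optional argument lets the same helper serve both the ART-only and the PARA-COMET enhanced settings.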
Inference-time decoding. For generation on our task and test sets, we concatenate the {Premise, Claim} or {Premise, commonsense, Claim} of a given enthymeme in the same way as shown in Table 3 and pass it as input to the encoder of the fine-tuned BART. The fine-tuned BART model then generates the entire argument, including the implicit premise, auto-regressively. We use beam search with a beam width of 5 for generation. After decoding, we split the argument into 3 individual sentences and treat the middle sentence, which starts with "And since", as the implicit premise after removing the artificially added discourse marker.
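The post-decoding step can be sketched as follows. This is a rough illustration under stated assumptions: the function name `extract_implicit_premise` is hypothetical, the sentence splitter is a naive regular expression rather than whatever splitter the authors used, and the sketch picks the first sentence that starts with the marker instead of strictly requiring three sentences.

```python
import re
from typing import Optional

MARKER = "And since"  # discourse marker added during fine-tuning


def extract_implicit_premise(generated: str) -> Optional[str]:
    """Return the marker-prefixed sentence with the marker removed, or None if absent."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", generated.strip())
                 if s.strip()]
    for sentence in sentences:
        if sentence.startswith(MARKER):
            premise = sentence[len(MARKER):].strip()
            # Restore sentence-initial capitalization after dropping the marker.
            return premise[:1].upper() + premise[1:] if premise else None
    return None


if __name__ == "__main__":
    argument = ("Amy was looking through her mother's old scrapbooks. "
                "And since Amy found pictures of her history professor and mother together. "
                "Amy realized her mother had dated her history professor.")
    print(extract_implicit_premise(argument))
```

Returning None when no marker-prefixed sentence is found makes malformed generations easy to count separately instead of silently treating them as premises.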
For the zero-shot setting, we use the pre-trained BART (bart-large) model. We use the format {Premise. And since [MASK]. Claim} and let the language model generate an implicit premise.
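The zero-shot format can be illustrated with a small prompt builder. The function name `build_zero_shot_prompt` is hypothetical, and the handling of the premise's final punctuation is an assumption. The mask token is kept as a parameter because the paper writes [MASK], whereas BART tokenizers typically use a different literal mask string.

```python
def build_zero_shot_prompt(premise: str, claim: str, mask_token: str = "[MASK]") -> str:
    """Build '{Premise}. And since {mask}. {Claim}' for mask infilling."""
    premise = premise.strip()
    # Make sure the premise ends a sentence before the marker is appended.
    if not premise.endswith((".", "!", "?")):
        premise += "."
    return f"{premise} And since {mask_token}. {claim.strip()}"


if __name__ == "__main__":
    print(build_zero_shot_prompt(
        "Vaccinations save lives",
        "Vaccination should be mandatory for all children"))
```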

Evaluation
Automatic evaluation setup. We use BLEU (Papineni et al., 2002), one of the most widely used automatic metrics for generation tasks, to compute BLEU-1 and BLEU-2 scores between the system output and the human-written gold implicit premise. We also report the F1 score of BERTScore, a metric for evaluating text generation using contextualized embeddings.
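To make the BLEU-1 and BLEU-2 scores concrete, the following is a minimal sentence-level sketch with a single reference, clipped n-gram precision, and the standard brevity penalty. It is not the evaluation script used for the reported numbers: whitespace tokenization, lowercasing, the absence of smoothing, and sentence-level (rather than corpus-level) scoring are all assumptions, and the helper names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int) -> float:
    """Unsmoothed BLEU with uniform weights over n-gram orders 1..max_n."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates that are not longer than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    gold = "Physicians and pharmacists inform about side effects."
    system = "The side effects can be very serious."
    print("BLEU-1:", bleu(system, gold, 1))
    print("BLEU-2:", bleu(system, gold, 2))
```

Without smoothing, a single missing n-gram order drives the score to zero, which is common for short premises; a smoothed variant may be preferable in practice.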
Human evaluation setup. We select 50 enthymemes from each test set (150 enthymemes in total) together with the outputs of our fine-tuned BART models (with and without PARA-COMET). We hired crowdworkers on the Amazon Mechanical Turk platform. Given an enthymeme, they were asked whether the generated implicit premises were plausible or not (agreement: 0.56 based on Krippendorff's α). Each enthymeme was judged for plausibility by 3 distinct Turkers (50 crowdworkers overall). As it was a binary judgement, we took a majority vote: if at least 2 of the 3 annotators judged a premise plausible, we marked it as plausible. The plausibility judgement considers whether the generated premise is grammatical, relevant to the argument, coherent with commonsense, and completes the argument.
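The majority-vote aggregation can be sketched in a few lines. The function names `majority_plausible` and `plausibility_rate` are hypothetical, and the judgements in the usage example are made up for illustration; they are not data from the study.

```python
from typing import Sequence


def majority_plausible(judgements: Sequence[bool]) -> bool:
    """True when more than half of the annotators marked the premise plausible."""
    return sum(judgements) * 2 > len(judgements)


def plausibility_rate(all_judgements: Sequence[Sequence[bool]]) -> float:
    """Fraction of items whose majority label is 'plausible'."""
    if not all_judgements:
        return 0.0
    return sum(majority_plausible(j) for j in all_judgements) / len(all_judgements)


if __name__ == "__main__":
    # Illustrative judgements only: three annotators per generated premise.
    votes = [[True, True, False], [False, False, True], [True, True, True]]
    print(plausibility_rate(votes))
```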
Results. While pre-trained language models often contain structured commonsense (Davison et al., 2019; Zhou et al., 2020) [...] that encodes discourse-aware commonsense outperform all baselines on all test datasets (D1, D2 and D3). Human evaluation further demonstrates that encoding commonsense knowledge leads to better implicit premise generation (Table 5).
Analysis. We notice that adding commonsense beams from PARA-COMET makes the generated implicit premise more plausible. For instance, for the stated claim and premise from D3 in Table 6, we see that PARA-COMET adds the beam "to feel better". Similarly, it adds the beam "to learn more" for the stated claim and premise from D1 for both examples shown in Table 6. We posit that adding these in combination with the stated claim and premise leads our model to infer more plausible implicit premises than the ones generated by BART fine-tuned on ART. Finally, given that D3 has been annotated with argument schemes (Musi et al., 2018), we can explore their role in enthymeme reconstruction. We notice that most of the generated plausible implicit premises belong to enthymemes annotated with the Practical Evaluation argument scheme, where "the premise is an evaluation about something being 'good' or 'bad', while the claim expresses a recommendation/advice about stopping/continuing an action".

[...] Obama spends less money than Bush.
Zero-shot: We are talking about the economy
ART: The Obama administration has spent $1 trillion.
+PARA-COMET: The Obama's spending is much less than Bush's.

St Premise: The morning-after pill has a number of side effects.
St Claim: The morning-after pill should only be prescribed after counselling by a physician or pharmacist.
Gold: Physicians and pharmacists inform about side effects.
Zero-shot: Morning-after pills are not FDA approved, they should be avoided.
ART: The morning-after pill can cause depression.
+PARA-COMET: The side effects can be very serious.

Table 6: Enthymeme generation for a given stated Premise and Claim by BART in 3 settings: zero-shot; fine-tuned on ART; and fine-tuned on ART + PARA-COMET. Text bolded in green displays how generations are more plausible due to incorporation of discourse-aware commonsense.

Conclusions
We propose an end-to-end approach for a new task of automatically generating an implicit premise given an enthymeme. We show how leveraging abductive reasoning as an auxiliary task improves over the zero-shot performance of a state-of-the-art generative language model. Finally, we build a knowledge-enhanced model by encoding discourse-aware commonsense that outperforms all existing baselines in terms of automatic metrics as well as plausibility judgements from crowdworkers. Future work includes exploring other sources of commonsense knowledge, experimenting with improved decoding techniques, and studying the role of argument schemes in enthymeme reconstruction.

Ethical Considerations
Although we use language models trained on data collected from the Web, which have been shown to have issues with bias and abusive language (Sheng et al., 2019; Wallace et al., 2019), the inductive bias of our models should limit inadvertent negative impacts. Unlike model variants such as GPT, BART is a conditional language model, which provides more control over the generated output. Finally, we fine-tune our model on the ART dataset, which is built on five-sentence short stories that are devoid of harmful and toxic text, especially text targeted at marginalized communities.
While dual-use concerns are certainly possible here, we think that open-sourcing this technology will help to facilitate understanding of arguments with more balanced and better reasoning. The technology should be used responsibly, particularly by making sure the generation is controllable by providing the stated premise, claim and any commonsense knowledge pertaining to the enthymeme in textual form. Finally, we pay the Turkers $15/hour, complying with minimum wage standards in the US.