On the Risk of Misinformation Pollution with Large Language Models

In this paper, we comprehensively investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation and its subsequent impact on information-intensive applications, particularly Open-Domain Question Answering (ODQA) systems. We establish a threat model and simulate potential misuse scenarios, both unintentional and intentional, to assess the extent to which LLMs can be utilized to produce misinformation. Our study reveals that LLMs can act as effective misinformation generators, leading to a significant degradation in the performance of ODQA systems. To mitigate the harm caused by LLM-generated misinformation, we explore three defense strategies: prompting, misinformation detection, and majority voting. While initial results show promising trends for these defensive strategies, much more work needs to be done to address the challenge of misinformation pollution. Our work highlights the need for further research and interdisciplinary collaboration to address LLM-generated misinformation and to promote responsible use of LLMs.


Introduction
Recently, large language models (LLMs) (Brown et al., 2020; Ouyang et al., 2022; OpenAI, 2023) have demonstrated exceptional language generation capabilities across various domains. On one hand, these advancements offer significant benefits to everyday life and unlock vast potential in diverse fields such as healthcare, law, education, and science. On the other hand, however, the growing accessibility of LLMs and their enhanced capacity to produce credible-sounding text also raise concerns regarding their potential misuse for generating misinformation. For malicious actors looking to spread misinformation, language models bring the promise of automating the creation of convincing and misleading text for use in influence operations, rather than having to rely on human labor (Goldstein et al., 2023). The deliberate distribution of misinformation can lead to significant societal harm, including the manipulation of public opinion, the creation of confusion, and the promotion of detrimental ideologies.

[Figure 1: Illustration of the threat model. A common user asks "When did mask-wearing cease to be mandatory on public transport in Singapore?" (answer: Feb. 2023); misinformation such as "Masks are still mandatory for public transport ..." or a hallucinated "From Aug. 2022, mask-wearing is no longer required (indoors) ..." can mislead the answer.]
Although concerns regarding misinformation have been discussed in numerous reports related to AI safety (Zellers et al., 2019; Buchanan et al., 2021; Kreps et al., 2022; Goldstein et al., 2023), there remains a gap in the comprehensive study of the following research questions: (1) To what extent can modern LLMs be utilized for generating credible-sounding misinformation? (2) What potential harms can arise from the dissemination of neural-generated misinformation in information-intensive applications, such as information retrieval and question answering? (3) What mitigation strategies can be used to address the intentional misinformation pollution enabled by LLMs?
In this paper, we aim to answer the above questions by establishing a threat model, as depicted in Figure 1. We first simulate different potential misuses of LLMs for misinformation generation: (1) the unintentional scenario where misinformation arises from LLM hallucinations, and (2) the intentional scenario where a malicious actor seeks to generate deceptive information targeting specific events; for example, malicious actors during the COVID-19 pandemic attempted to stir public panic with fake news for their own profit (Papadogiannakis et al., 2023). We then assume the generated misinformation is disseminated into part of the web corpus utilized by downstream NLP applications (e.g., QA systems that rely on retrieving information from the web) and examine its impact on these systems. For instance, we investigate whether intentional misinformation pollution can mislead QA systems into producing the false answers desired by the malicious actor. Lastly, we explore three distinct defense strategies to mitigate the harm caused by LLM-generated misinformation: prompting, misinformation detection, and majority voting.
Our results show that (1) LLMs are excellent controllable misinformation generators, making them prone to potential misuse (§3), (2) deliberate synthetic misinformation significantly degrades the performance of open-domain QA (ODQA) systems, showcasing the threat of misinformation to downstream applications (§4), and (3) although we observe promising trends in our initial attempts to defend against these attacks, misinformation pollution remains a challenging issue that demands further investigation (§5).
In summary, we investigate a neglected potential misuse of modern LLMs for misinformation generation and present a comprehensive analysis of the consequences of misinformation pollution for ODQA. We also study different ways to mitigate this threat, which sets up a starting point for researchers to further develop misinformation-robust NLP applications. Our work highlights the need for continued research and collaboration across disciplines to address the challenges posed by LLM-generated misinformation and to promote the responsible use of these powerful tools.

Related Work
Combating Model-Generated Misinformation The proliferation of LLMs has brought about an influx of non-factual data, including both intentional disinformation (Goldstein et al., 2023) and unintentional inaccuracies, known as "hallucinations" (Ji et al., 2022). The realistic quality of such synthetically-generated misinformation presents a significant challenge for humans attempting to discern fact from fiction (Clark et al., 2021a). In response to this issue, a growing body of research has begun to focus on the detection of machine-generated text (Stiff and Johansson, 2022; Mitchell et al., 2023; Sadasivan et al., 2023; Chakraborty et al., 2023). However, these methods remain limited in their precision and scope. Concurrently, there are efforts to curtail the production of harmful, biased, or baseless information by LLMs. These attempts, though well-intentioned, have shown vulnerabilities, with individuals finding methods to bypass them using specially designed "jail-breaking" prompts (Li et al., 2023a). Our research diverges from prior studies that concentrate on either generation or detection, as we strive to create a comprehensive threat model that encompasses misinformation generation, its influence on downstream tasks, and potential countermeasures.

Data Pollution Retrieval-augmented systems have demonstrated strong performance in knowledge-intensive tasks, including ODQA (Lewis et al., 2021; Guu et al., 2020). However, these systems are intrinsically vulnerable to "data pollution": the training data or the corpus they extract information from could be a mixture of both factual and fabricated content. This risk remains underexplored, as current models mostly adopt a reliable external knowledge source (such as Wikipedia) (Karpukhin et al., 2020; Hofstätter et al., 2022) for both training and evaluation. However, this ideal scenario may not always hold in the real world, considering the rapid surge of machine-generated misinformation. Our work takes a pioneering step in exploring the potential threat posed by misinformation to QA systems. Unlike prior work on QA system robustness under synthetic perturbations such as entity replacements (Pan et al., 2021; Longpre et al., 2022; Chen et al., 2022; Weller et al., 2022; Hong et al., 2023), we focus on the threat of realistic misinformation produced by modern LLMs.

Generating Misinformation with LLMs
In this section, we delve into the potential misuse of modern LLMs for creating seemingly credible misinformation. Misinformation generation is a vast and varied topic; in this paper, we therefore concentrate on the following scenario: a malicious actor, using a misinformation generator denoted by G, seeks to fabricate a false article P′ in response to a specific target question Q (for instance, "Who won the 2020 US Presidential Election?"). With the help of LLMs, the fabricated article P′ could be a counterfeit news piece that incorrectly reports Trump as the winner.
In the following, we will first introduce the misinformation generator and then delineate four distinct strategies that a malicious actor might employ to misuse LLMs for generating misinformation.

GPT-3.5 as Misinformation Generator
Prior works have attempted to generate fake articles using large pre-trained sequence-to-sequence (Seq2Seq) models (Zellers et al., 2019; Fung et al., 2021; Huang et al., 2022). However, the articles generated by these approaches occasionally contain grammatical and commonsense errors, making them insufficiently deceptive to humans. To simulate a realistic threat, we use GPT-3.5 (text-davinci-003) as the misinformation generator due to its exceptional ability to generate coherent and contextually appropriate text in response to given prompts. Detailed configurations of the generator are in Appendix A.

Settings for Misinformation Generation
The inputs chosen by users for LLMs can vary greatly, resulting in differences in the quality, style, and content of the generated text. When creating misinformation, propagandists may employ manipulative instructions to fabricate audacious falsehoods, whereas ordinary users might unintentionally receive non-factual information from harmless queries. To simulate various demographics of misinformation producers, we devise four distinct settings for prompting LLMs to produce misinformation. Figure 2 showcases examples of misinformation generated in each scenario.
To be specific, we prompt the misinformation generator G (GPT-3.5) in a zero-shot fashion. The prompt p is composed of two parts: the instruction text p_instr and the target text p_tgt. The former controls the overall properties of the generated text (e.g., length, style, and format), while the latter specifies the topic. In the following, we introduce the four different settings under this scheme and illustrate the detailed prompt design.
GENREAD This setting directly prompts GPT-3.5 to generate a document that is ideally suited to answer a given question. In this context, p_instr is framed as "Generate a background document to answer the following question:", while p_tgt includes only the question. LLMs are expected to generate factual content to address the question; in practice, however, they can be susceptible to hallucinations, resulting in content that strays from reality. This setting mirrors scenarios where an LLM's hallucinations inadvertently introduce misinformation.
CTRLGEN In this setting, we also prompt LLMs to produce a context passage for answering the given question, but we additionally input a predetermined non-factual opinion. Here, p_instr is "Generate a background document in support of the given opinion to the question.", while p_tgt contains the target question and the non-factual fact or opinion. In this way, we intend to simulate the real-world disinformation and propaganda creation process, where malicious actors have some predetermined fabricated fact in mind (e.g., Trump won the 2020 presidential election) and attempt to generate an article that reflects it (e.g., fake news that reports Trump's victory).
REVISE In this setting, we provide a human-written factual article for the LLM to use as a reference. Then, we prompt the LLM to revise the article to inject the predetermined non-factual fact or opinion. We set p_instr as "Given the following passage, modify as few details as possible to make it support the given opinion to the question." p_tgt is then a real-world passage pertinent to the target question, together with the question and the predetermined opinion.
REIT The previous settings all aim at generating articles that appear authentic to humans. However, there are cases where malicious actors aim to generate misinformation to compromise downstream models, such as QA systems. In these situations, the generated article does not necessarily have to appear realistic, as long as it can effectively manipulate the model (e.g., alter the QA system's output). We simulate this type of misuse by setting p_instr to "Given the question and a predefined response, rephrase the response in ten different ways." In this case, p_tgt comprises the target question and the predetermined misinformation.

[Figure 2: An example to illustrate the four different misinformation generation settings: GENREAD, CTRLGEN, REVISE, and REIT, using the question "At June 2020, what was the minimum effectiveness of a COVID-19 vaccine that would satisfy Dr Anthony Fauci's hopes?" (acceptable answers: 70 percent, 70 to 75, 70 to 75 percent; generated false answer: 50 percent). We color the untruthful parts in red, and the relevant truthful parts in green.]
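The prompt scheme above (p = p_instr + p_tgt) can be sketched in a few lines. The instruction strings are quoted from this section; the field labels ("Question:", "Opinion:", etc.) and the function itself are illustrative assumptions, not the exact templates used in our experiments.

```python
# Instruction texts (p_instr) for the four settings, as described above.
INSTRUCTIONS = {
    "GENREAD": "Generate a background document to answer the following question:",
    "CTRLGEN": "Generate a background document in support of the given opinion to the question.",
    "REVISE": ("Given the following passage, modify as few details as possible "
               "to make it support the given opinion to the question."),
    "REIT": "Given the question and a predefined response, rephrase the response in ten different ways.",
}

def build_prompt(setting, question, opinion=None, passage=None):
    """Compose p = p_instr + p_tgt for one of the four settings.

    Field labels below are hypothetical formatting choices for illustration.
    """
    p_instr = INSTRUCTIONS[setting]
    if setting == "GENREAD":
        p_tgt = f"Question: {question}"
    elif setting == "CTRLGEN":
        p_tgt = f"Question: {question}\nOpinion: {opinion}"
    elif setting == "REVISE":
        p_tgt = f"Passage: {passage}\nQuestion: {question}\nOpinion: {opinion}"
    else:  # REIT
        p_tgt = f"Question: {question}\nResponse: {opinion}"
    return f"{p_instr}\n{p_tgt}"

prompt = build_prompt("CTRLGEN",
                      "Who won the 2020 US Presidential Election?",
                      opinion="Donald Trump won the 2020 US Presidential Election.")
```

The returned string would then be sent to the generator (e.g., text-davinci-003) as a zero-shot completion request.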

Polluting ODQA with Misinformation
We then explore the potential damage that can result from the spread of LLM-generated misinformation, with a particular emphasis on Open-Domain Question Answering (ODQA) applications. ODQA systems follow a retriever-reader design: they first identify relevant documents from a large evidence corpus and then predict an answer based on these documents. We introduce the concept of misinformation pollution, wherein LLM-generated misinformation is deliberately infused into the corpus used by the ODQA model. This mirrors the growing trend of LLM-generated content populating the web data used by downstream applications. Our goal is to evaluate the effects of misinformation pollution on various ODQA models, with a particular interest in whether such pollution can influence these QA systems to produce the incorrect answers intended by a malicious actor.
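To make the pollution mechanism concrete, here is a toy retrieve-and-read pipeline. The overlap-based retriever, candidate-matching reader, and all data are illustrative stand-ins, not the BM25/DPR retrievers or FiD/GPT-3.5 readers evaluated in this paper.

```python
import re

def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, corpus, k=1):
    """Rank passages by word overlap with the question (a crude retriever)."""
    q = tokens(question)
    return sorted(corpus, key=lambda d: len(q & tokens(d)), reverse=True)[:k]

def read(passages, candidates):
    """Return the first candidate answer that appears in a retrieved passage."""
    for cand in candidates:
        if any(cand.lower() in p.lower() for p in passages):
            return cand
    return None

corpus = ["Joe Biden won the 2020 US presidential election with 306 electoral votes."]
question = "Who won the 2020 US presidential election?"
candidates = ["Donald Trump", "Joe Biden"]
clean_answer = read(retrieve(question, corpus), candidates)

# Misinformation pollution: inject one fake passage that echoes the question
# (REIT-style repetition makes it score highly for retrieval).
fake = ("Who won the 2020 US presidential election? "
        "Donald Trump won the 2020 US presidential election.")
polluted_answer = read(retrieve(question, corpus + [fake]), candidates)
```

On the clean corpus the pipeline returns the correct answer; after injecting a single question-echoing fake passage, the retriever surfaces the fake document and the reader's answer flips to the attacker's target.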

Datasets
We construct two ODQA datasets for our exploration by adapting existing QA datasets.

NQ-1500
We first use the Natural Questions (Kwiatkowski et al., 2019) dataset, a widely-used ODQA benchmark derived from Wikipedia. To minimize experimental costs, we select a random sample of 1,500 questions from the original test set. In line with prior settings, we employ the Wikipedia dump from December 30, 2018 (Karpukhin et al., 2020) as the corpus for evidence retrieval.
CovidNews We also conduct our study on a news-centric QA dataset covering real-world topics that are more vulnerable to misinformation pollution, where malicious actors might fabricate counterfeit news in an attempt to manipulate news-oriented QA systems. We base our study on the StreamingQA (Liška et al., 2022) dataset, a large-scale QA dataset for news articles. We filter the dataset using specific keywords adapted from Gruppi et al. (2022) and a timestamp filter of Jan. 2020 to Dec. 2020, allowing us to isolate 1,534 questions related to COVID-19 news. For the evidence corpus, we utilize the original news corpus associated with StreamingQA, along with the WMT English News Corpus from 2020.

ODQA Systems
We conduct experiments on four distinctive types of retrieve-and-read ODQA systems, classified based on the choice of the retrievers and the readers.
Retrievers For retrievers, we use BM25 (Robertson and Zaragoza, 2009) and Dense Passage Retriever (DPR) (Karpukhin et al., 2020), representing sparse and dense retrieval mechanisms respectively, which constitute the mainstream of current ODQA models. BM25 is a traditional probabilistic model for information retrieval that remains a robust baseline in retrieval tasks. Although sparse retrievers may fall short in capturing complex semantics, they excel at handling simple queries, forming the backbone of several contemporary retrieval systems (Formal et al., 2021). Conversely, DPR leverages learned embeddings to discern implicit semantics within sentences, outpacing sparse retrievers in most retrieval tasks.
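For reference, BM25 scores a document against a query as a sum of idf-weighted, length-normalized term frequencies. A compact pure-Python sketch with the common defaults k1 = 1.5 and b = 0.75 (production systems use optimized implementations such as Lucene's, not this loop):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Minimal Okapi BM25: score each document against the query."""
    toks = [d.lower().split() for d in docs]
    N = len(docs)
    avgdl = sum(len(t) for t in toks) / N
    df = Counter()                      # document frequency per term
    for t in toks:
        df.update(set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for w in query.lower().split():
            if tf[w] == 0:
                continue
            idf = math.log(1 + (N - df[w] + 0.5) / (df[w] + 0.5))
            s += idf * tf[w] * (k1 + 1) / (tf[w] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores

docs = [
    "masks are mandatory on public transport",
    "the weather in singapore is humid",
    "mask wearing on public transport ended in february 2023",
]
scores = bm25_scores("when did mask wearing end on public transport", docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Note that BM25 only rewards lexical overlap: a fake passage that repeats the question's terms scores just as well as a truthful one, which is exactly the weakness that misinformation pollution exploits.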
Readers For readers, we use Fusion-in-Decoder (FiD) (Izacard and Grave, 2021) and GPT-3.5 (text-davinci-003). FiD is a T5-based (Raffel et al., 2020) reader that, unlike concurrent models, fuses multiple passages at once to predict answers, yielding outstanding performance. Considering that answering questions with conflicting information might diverge from the training objectives of current MRC models, we also experiment with GPT-3.5 as a reader to leverage its extensive training data and flexibility. Additional model configurations are in Appendix A.

Misinformation Pollution
We then conduct misinformation pollution on the corpora for both NQ-1500 and CovidNews. For each question, we generate one fake document to be injected into the corresponding natural corpus, separately under each setting introduced in Section 3.2. We then evaluate ODQA under both the clean and polluted corpora, using the standard Exact Match (EM) metric to measure QA performance.
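The injection-and-evaluation loop can be sketched as follows; the answer normalization assumed here mirrors the common SQuAD-style EM convention (lowercasing, stripping punctuation and articles), which we use for illustration:

```python
import re
import string

def normalize(s):
    """SQuAD-style answer normalization: lowercase, drop punctuation/articles."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, gold_answers):
    """EM = 1.0 iff the normalized prediction equals any normalized gold answer."""
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))

def pollute(corpus, fake_passages):
    """Inject the generated fake documents into the natural corpus."""
    return corpus + fake_passages

em = exact_match("The 50 percent", ["50 percent", "fifty percent"])
```

Averaging `exact_match` over all questions under the clean corpus and under `pollute(corpus, fakes)` gives the EM figures compared in Table 2.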
The statistics of the clean corpus and the polluted corpora for each setting are presented in Table 1. The volumes of injected fake passages, as indicated in the percentage column, are small in scale compared to the size of the original corpora.

Main Results
We evaluate the performance of different ODQA systems under two settings: one using an unpolluted corpus (CLEAN) and the other using a misinformation-polluted corpus, which is manipulated using different misinformation generation methods (CTRLGEN, REVISE, REIT, GENREAD).
We present the performance of QA models in Table 2, in which we configured a fixed number of retrieved context passages for each reader. We identify four major findings.

1. Misinformation poses a significant threat to ODQA systems. When subjected to the three types of deliberate misinformation pollution (CTRLGEN, REVISE, and REIT), all ODQA systems demonstrate a substantial decline in performance as fake passages infiltrate the corpus. The performance drop ranges from 14% to 54% for DPR-based models and from 20% to 87% for BM25-based models. Even under the GENREAD scenario, where misinformation is inadvertently introduced through hallucinations, we note a 5% and 15% decrease in ODQA performance for the best-performing model (DPR+FiD) on NQ-1500 and CovidNews, respectively. These reductions align with our expectations: ODQA systems, trained on pristine data, are predisposed towards retrieving seemingly relevant information, without the capacity to discern its veracity. This reveals the vulnerability of current ODQA systems to misinformation pollution, a risk that emanates both from intentional attacks by malicious entities and from unintentional hallucinations introduced by LLMs.
2. The strategy of reiterating misinformation (REIT) influences machine perception more effectively, even though such misinformation tends to be more easily discernible to human observers. We observe that the REIT pollution setting outstrips all others by significant margins. This striking impact corroborates our expectations: we essentially flood machine readers with copious amounts of seemingly vital evidence, thereby distracting them from authentic information. Considering that machine readers, including generative readers, are primarily conditioned to extract answer segments from plausible sources, it is logical for such an attack mechanism to attain superior performance. The simplicity and easy implementation of this attack method underline the security vulnerabilities inherent in contemporary ODQA systems.
3. To further understand how misinformation pollution affects ODQA systems, we present in Table 7 the proportions of questions where at least one fabricated passage was among the top-K retrieved documents. We find that LLM-generated misinformation is quite likely to be retrieved by both the BM25 and the DPR retriever. This is primarily because the retrievers prioritize passages that are lexically or semantically aligned with the question, but lack the capability to discern the authenticity of the information. We further reveal that REVISE is superior to GENREAD in producing fake passages that are likely to be retrieved, and that sparse retrievers are particularly brittle to deliberate misinformation pollution, e.g., REIT. The detailed configurations and findings are in Appendix B.
4. Questions without dependable supporting evidence are more prone to manipulation. Comparing the performance differentials across the two test sets, we notice a more pronounced decline in system performance on the CovidNews test set. Our hypothesis for this phenomenon lies in the relative lack of informational depth within the news domain as opposed to encyclopedias. Subsequent experiments corroborate that the WMT News Corpus indeed provides fewer and less pertinent resources for answering queries. We delve into this aspect in greater detail in Appendix C.
Moreover, we discover that the generated texts in the GENREAD setting have a significantly more detrimental effect on the CovidNews benchmark compared to the NQ-1500 benchmark. This highlights the uneven capabilities of GPT-3.5 in retaining information across diverse topics. We postulate that this may be partially due to the training procedure being heavily reliant on Wikipedia data, which could potentially induce a bias towards Wikipedia-centric knowledge in the model's output.

Defense Strategies
A fundamental approach to mitigating the negative impacts of misinformation pollution involves the development of a resilient, misinformation-aware QA system. Such a system would mirror human behavior in its dependence on trustworthy external sources to provide accurate responses. In our pursuit of this, we have explored three potential strategies. In the following sections, we succinctly outline the reasoning behind each strategy, present our preliminary experimental results, and discuss their respective merits and drawbacks. Details on experimental configurations are in Appendix A.

Detection Approach
The initial strategy entails incorporating a misinformation detector within the QA system, equipped to discern model-generated content from human-authored content. To test this approach, we employ a RoBERTa-based classifier (Liu et al., 2019), fine-tuned specifically for this binary classification task. To acquire the training and testing data, we leverage the NQ-1500 DPR retrieval results, partitioning the first 80% for training and reserving the remaining 20% for testing. For each query, we use the top-10 context passages, amounting to 12,000 training instances and 3,000 testing instances. Training the above detector assumes access to in-domain NQ-1500 data. Acknowledging the practical limitations of in-domain training data, we also incorporate an existing dataset of GPT-3 completions based on Wikipedia topics to train an out-of-domain misinformation detector.
Vigilant Prompting LLMs have recently exhibited a remarkable ability to follow human instructions when provided with suitable prompting (Ouyang et al., 2022). We aim to investigate whether this capability extends to directives aimed at evading misinformation. Our experimental design utilizes GPT-3.5 as the reader, employing QA prompts that include an additional caution regarding misinformation. For example, the directive given to the reader might read: "Draw upon the passages below to answer the subsequent question concisely. Be aware that a minor portion of the passages may be designed to mislead you."

Reader Ensemble Traditionally in ODQA, all retrieved context passages are concatenated before being passed to the reader. This approach may cause the model to become distracted by the presence of misinformation. In response, we propose a "divide-and-vote" technique. First, we segregate the context passages into k groups based on their relevance to the question. Each group of passages is then used by a reader to generate an answer. Subsequently, we apply majority voting over the resulting k candidate responses a_1, a_2, ..., a_k to obtain the voted answer(s) a_v = argmax_a Σ_{i=1}^{k} 1[a_i = a]. Through this voting strategy, we aim to minimize the impact of misinformation by limiting the influence of individual information sources on answer prediction.
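A minimal sketch of the divide-and-vote procedure, assuming a round-robin split of relevance-ordered passages and a toy reader (the actual experiments use FiD or GPT-3.5 readers):

```python
from collections import Counter

def divide_and_vote(passages, reader, k=5):
    """Split relevance-ordered passages into k groups, run the reader on each
    group, and majority-vote: a_v = argmax_a sum_i 1[a_i = a]."""
    groups = [passages[i::k] for i in range(k)]   # round-robin split (an assumption)
    answers = [reader(g) for g in groups if g]
    counts = Counter(answers)
    top = max(counts.values())
    return [a for a, c in counts.items() if c == top]   # voted answer(s)

# Illustrative reader: answers based on the group's first passage only.
def toy_reader(group):
    return "Feb. 2023" if "2023" in group[0] else "Aug. 2022"

passages = ["ended 2023"] * 4 + ["fake: 2022"]    # one polluted passage in five
voted = divide_and_vote(passages, toy_reader, k=5)
```

Because the single polluted passage can sway at most one of the k group-level answers, the vote recovers the majority answer.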

Main Results
The performance of detectors trained on both in-domain and out-of-domain data is presented in Table 3, revealing significant variance. In-domain trained detectors consistently deliver high AUROC scores (91.4%-99.7%), whereas out-of-domain trained classifiers show only slight improvements over random guessing (50.7%-64.8%). Despite the impressive results obtained with in-domain detectors, expecting a sufficient quantity of in-domain training data to always be available is impractical in real-world scenarios, as we lack knowledge of the specific model malicious actors may use to generate misinformation. Additionally, our out-of-domain training data, despite being deliberately selected to match the genre, topic, and length of the detection task's targets, yielded disappointing results. This underscores the challenge of training a versatile, effective misinformation detector.
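For reference, the AUROC reported here equals the probability that a randomly chosen machine-generated passage receives a higher detector score than a randomly chosen human-written one, with ties counted as one half. A quadratic pairwise sketch (adequate for small evaluation sets; large-scale evaluation would use an optimized library routine):

```python
def auroc(scores_pos, scores_neg):
    """AUROC via the pairwise (Mann-Whitney) formulation.

    scores_pos: detector scores for machine-generated (positive) passages.
    scores_neg: detector scores for human-written (negative) passages.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))
```

A perfect detector (all positives above all negatives) scores 1.0; a random one hovers around 0.5, which is roughly where the out-of-domain classifiers land.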
Incorporating additional information through prompting GPT readers yields inconsistent outcomes, as indicated in Table 4. This variation may be attributable to the dearth of data and the absence of similar tasks during the GPT-3.5 training phase. The voting strategy yields benefits, albeit with attached costs: voting consistently achieves better effectiveness than the prompting strategy, as demonstrated in Table 4. It is essential to note, however, that the deployment of multiple readers in the voting approach necessitates additional resources. Despite the potential for concurrent processing of multiple API calls, the cost per question escalates linearly with the number of context passages used, rendering the method financially challenging at a large scale.
Does Reading More Contexts Help? Intuitively, a straightforward way to counteract misinformation pollution in ODQA is to diminish its prevalence, i.e., to decrease the ratio of misinformation that QA systems are exposed to. A viable method of achieving this is to retrieve a larger number of passages to serve as contexts for the reader. This approach has demonstrated benefits in several ODQA systems operating on a clean corpus (Izacard and Grave, 2021; Lewis et al., 2021). To explore its effectiveness against misinformation, we evaluate QA performance using different amounts of context passages given to the readers.
Figure 3 shows the relation between the context size used by readers and QA performance. Instead of reporting the absolute EM score, we report the relative EM drop compared with the EM score under the clean-corpus setting to measure the impact of misinformation pollution. Interestingly, our results show that increasing the context size has minimal, if not counterproductive, effects in mitigating the performance decline caused by misinformation. This aligns with the previous observation that ODQA readers rely on a few highly relevant contexts to make predictions, regardless of the total volume of contexts (Chen et al., 2022). A straightforward strategy of "diluting" the misinformation by increasing the context size is thus not an effective defense against misinformation pollution.

Summary In our exploration of three strategies to safeguard ODQA systems against misinformation pollution, we uncover promising effects through the allocation of additional resources, including in-domain detection training and engaging multiple readers to predict answers via a voting mechanism. Nonetheless, the development of a cost-effective and resilient QA system capable of resisting misinformation still demands further research.

Discussion
In previous sections, we established a comprehensive threat model that encompasses misinformation generation, its resultant pollution, and potential defense strategies. While our research primarily resides in simulation-based scenarios, it has shed light on numerous potential risks posed by misinformation created by large language models. If left unaddressed, these risks could substantially undermine the current information ecosystem and have detrimental impacts on downstream applications. In this section, we offer a discussion of the practical implications of misinformation pollution in a real-world web environment. Our focus is on three crucial factors: information availability, the associated costs, and the integrity of web-scale corpora.
Information Availability Our misinformation generation methods require only minimal additional information, such as a relevant real passage (REVISE) or a modest amount of knowledge about the targeted system (REIT). Given the relative ease of producing misinformation with LLMs, we forecast that misinformation pollution is likely to become an imminent threat to the integrity of the web environment in the near future. It is, therefore, critical to pursue both technological countermeasures and regulatory frameworks to mitigate this threat.

Associated Costs
The cost of the misinformation attack depends on factors such as language model training and maintenance, data storage, and computing resources. We focus on the dominant cost in our experiments: OpenAI's language model API fees. We estimate that producing one fake document (200 words) costs $0.01 to $0.04 using text-davinci-003, significantly lower than hiring human writers. This cost-efficient misinformation production scheme likely represents the disinformation industry's future direction.

Integrity of Web-scale Corpora
The quality of web-scale corpora is important for downstream applications. However, web-scale corpora are known to contain inaccuracies, inconsistencies, and biases (Kumar et al., 2016; Greenstein and Zhu, 2012). Large-scale corpora are especially vulnerable to misinformation attacks. Decentralized corpora with data in URLs risk attackers hijacking expired domains and tampering with contents (Carlini et al., 2023). Centralized corpora, such as Common Crawl, suffer from unwanted data as well (Luccioni and Viviano, 2021); even for manually maintained ones like Wikipedia, it is still possible for misinformation to slip in (Carlini et al., 2023).

[Figure 3: The relative EM change (percentage) under different misinformation poisoning settings with respect to the number of context passages. The result is averaged over four configurations: two retrievers mixed-and-matched with two test datasets. We limit the context size to 10 when using GPT due to its limit on input length.]

Conclusion and Future Work
We present an evaluation of the practicality of utilizing Large Language Models (LLMs) for the automated production of misinformation, and we examine their potential impact on knowledge-intensive applications. By simulating scenarios where actors deliberately introduce false information into the knowledge sources of question-answering systems, we discover that machines are highly susceptible to synthetic misinformation, leading to a significant decline in their performance. We further observe that machine performance deteriorates even further when exposed to intricately crafted falsehoods. In response to these risks, we propose three partial solutions as an initial step toward mitigating the impact of LLM misuse, and we encourage further research into this problem.
Our future research could extend this work along three paths. First, while we have thus far illustrated the potential dangers of LLM-generated misinformation only in ODQA systems, our threat model could be employed to assess risk across a broader spectrum of applications. Second, the potential of LLMs to create more calculated forms of misinformation, such as hoaxes, rumors, or propagandistic falsehoods, warrants a separate line of inquiry. Lastly, there is an ongoing need for research into cost-effective and robust QA systems that can effectively resist misinformation.

Limitations
Despite the remarkable capabilities of GPT-3.5 (text-davinci-003) in generating high-quality textual content, one must not disregard its inherent limitations. First, the reproducibility of its outputs presents a significant challenge. To mitigate this issue, we will make available all prompts and generated documents, thereby facilitating the replication of our experiments. Second, the cost of GPT-3.5 is an order of magnitude greater than that of some of its contemporaries, such as ChatGPT (gpt-3.5-turbo), which inevitably constrained the scope of our investigation. This research focuses predominantly on a selection of the most emblematic and pervasive QA systems and LLMs. Nonetheless, our findings may not necessarily be applicable to other systems or text generators. For instance, QA systems employing alternative architectures, as demonstrated by recent works (Shao and Huang, 2022; Su et al., 2022), may exhibit increased robustness against the proliferation of misinformation.

Ethics Statement
We have decided to publicly release our model-generated documents and the prompts used to create them, despite the potential for misuse in generating harmful disinformation. We believe open-sourcing is important, and we justify our decision as follows.
First, since our production process relies on the readily available OpenAI API, replicating it is feasible even without access to our code. Our objective is to raise awareness and encourage action by investigating the consequences of misusing large language models. We aim to inform the public, policymakers, and developers about the need for responsible and ethical deployment.
Second, our choice to release follows a similar approach taken with Grover (Zellers et al., 2019)7, a powerful detector and advanced generator of AI-generated fake news. The authors of Grover argue that threat modeling, including a robust generator or simulation of the threat, is crucial for protecting against potential dangers. In our research, we establish an effective threat model for ODQA in the context of misinformation. Future studies can build upon the transparency of our model, further enhancing our proposed defense techniques for AI safety.

A Configuration Details
Here we elaborate on the specific configurations used in our experiments for clarity.
Rationale behind the prompt template design. Table 5 summarizes the aforementioned four misinformation generation settings. We categorize their key features along three dimensions: 1) maliciousness: whether the setting intends to deliberately elicit misinformation; 2) resourcefulness: whether the setting demands resources unavailable to regular users, such as a real factual article; and 3) customization: whether the generated documents are designed to influence the target audience (in our case, machines). We believe these settings cover the common potential scenarios of misusing LLMs for misinformation generation.
Generator. We employed the default hyperparameters for text-davinci-003 in all text-generation experiments, as specified in the OpenAI API documentation8.
Prompt template for plausible answer production. Since correct answers were not needed, we relied solely on the parametric memory of language models. The prompt we used is: "Generate a false answer to the given question. It should be short (less than five words in general) and look plausible, compared to the reference answer. Question: [Question] Reference Answers: [Answer]". To ensure the plausible answers produced were indeed incorrect, we repeated the answer generation process five times and randomly sampled one answer that does not match any reference answer.
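The rejection-sampling step described above can be sketched as follows. The `generate` callable is a hypothetical stand-in for the LLM call; the case-insensitive matching is an illustrative assumption, not the paper's exact filtering rule:

```python
import random

def sample_plausible_answer(question, reference_answers, generate, n_tries=5):
    """Sample candidate false answers, discard any that match a
    reference answer, and return one surviving candidate at random.

    `generate(question)` is a hypothetical stand-in for the LLM call,
    returning one candidate answer string per invocation.
    """
    refs = {a.strip().lower() for a in reference_answers}
    candidates = [generate(question) for _ in range(n_tries)]
    wrong = [c for c in candidates if c.strip().lower() not in refs]
    return random.choice(wrong) if wrong else None
```

If all five candidates happen to match a reference answer, the sketch returns `None`, signaling that the generation step should be retried.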
• DPR: We used the checkpoint provided by the original DPR repository9, trained on five QA datasets including NQ.
• FiD: We used the FiD-large checkpoint provided by the original FiD repository10, trained on NQ.
Details of the defense strategies. For the detection-based method, we employed the RobertaForSequenceClassification checkpoint provided by Hugging Face (Wolf et al., 2020).

For both in-domain and out-of-domain classifiers, we used 12,000 context passages for training. We sampled data from a dataset containing both actual Wikipedia snippets and Wikipedia completions generated by GPT-3 (Bhat, 2023), which share many properties with the text generated in our experiments. The model was trained for 3 epochs with a learning rate of 0.001. For the prompting-based method, we designed five different misinformation-aware prompts and report the average EM score across them. We drew inspiration from concurrent works (Hong et al., 2023; Li et al., 2023b) and engineering experience11, then utilized ChatGPT to produce five prompts in accordance with one human-written example. We report each prompt and its respective performance in Table 6. For the voting method, we explored various configurations of the number of readers k and the number of context passages used by each reader n. We report the best-performing configuration of k = 5 readers and n = 10 passages per reader, based on preliminary experiments.
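The voting defense can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `reader` interface is hypothetical, and giving each reader a disjoint random slice of the passage pool is an assumption about how the k readers are fed:

```python
from collections import Counter
import random

def vote_answer(question, passages, reader, k=5, n=10, seed=0):
    """Majority voting over k readers, each given n context passages.

    `reader(question, context_passages)` is a hypothetical stand-in
    returning an answer string. With k = 5 and n = 10 (the reported
    best configuration), a handful of poisoned passages can sway at
    most a minority of the readers' votes.
    """
    rng = random.Random(seed)
    shuffled = passages[:]
    rng.shuffle(shuffled)
    answers = []
    for i in range(k):
        ctx = shuffled[i * n:(i + 1) * n]  # disjoint slice per reader
        if ctx:
            answers.append(reader(question, ctx))
    return Counter(answers).most_common(1)[0][0]
```

The intuition is that misinformation must poison a majority of the disjoint contexts to flip the final answer, which is harder than poisoning a single reader's context.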

B Retrieval Performance
Retrieval pollution. Table 7 presents the percentage of questions in our two benchmarks whose retrieved contexts contain synthetic disinformation, which we term "poisoned" questions. A question is considered poisoned if at least one synthetic passage is retrieved among its top-K passages.
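The poisoned-question rate defined above can be computed as follows. This is a minimal sketch; representing retrieval results as ranked lists of passage IDs and marking injected passages by ID are illustrative assumptions:

```python
def poisoned_rate(retrieved, synthetic_ids, top_k=10):
    """Fraction of questions with at least one synthetic passage
    among the top-K retrieved passages.

    `retrieved` maps each question to its ranked list of passage IDs;
    `synthetic_ids` is the set of IDs of the injected fake passages.
    """
    poisoned = sum(
        any(pid in synthetic_ids for pid in ranks[:top_k])
        for ranks in retrieved.values()
    )
    return poisoned / len(retrieved)
```

Sweeping `top_k` over the retrieval depths used in the experiments yields the per-K pollution percentages reported in Table 7.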
Machine-generated misinformation can easily infiltrate information retrieval systems. Comparing generation settings, REVISE outperforms GENREAD in producing information that is more likely to be retrieved, thanks to the 'gold template' on which its misinformation is based. However, we observed that REIT degraded QA performance the most, which highlights the security risks of a deliberate attack on automated systems. Sparse retrievers are particularly brittle to targeted misinformation: when targeting the sparse retriever BM25, REIT can poison more than 90% of questions using only 10 context passages, and over 95% of questions using 100 context passages, indicating that near-total pollution of the retrieved context is achievable.

Figure 1: An overview of the proposed threat model, which illustrates the potential risks of corpus pollution from model-generated misinformation, including intended disinformation pollution from malicious actors with the assistance of LLMs and unintended hallucination pollution introduced by LLMs.
Pharmaceutical manufacturer Johnson & Johnson announced Friday that early trials of a COVID-19 vaccine showed a [98 →] 50 percent success rate in showing a boost in the immune system. ... Dr. Anthony Fauci expressed his hopes that the vaccine would reach a minimum effectiveness of 50 percent, which the Johnson & Johnson vaccine has achieved.

Table 1: The size of the clean corpus and the number/percentage of fake passages injected into the clean corpus for each setting. We use the 100-word split of a document as the unit for measuring corpus size.

Table 2: Open-domain question answering performance under misinformation pollution on NQ-1500 and CovidNews. The texts in blue are model configurations, i.e., retriever, reader, and the number of context passages used (ctxs). Rel. is the relative change in EM score (percentage) compared to the CLEAN setting.