Cross-lingual Evidence Improves Monolingual Fake News Detection

Misleading information spreads on the Internet at incredible speed and can, in some cases, lead to irreparable consequences. Developing fake news detection technologies is therefore becoming essential. While substantial work has been done in this direction, one limitation of current approaches is that they focus on a single language and do not use multilingual information. In this work, we propose a new technique based on cross-lingual evidence (CE) that can be used for fake news detection and can improve existing approaches. The hypothesis that cross-lingual evidence is a useful feature for fake news detection is confirmed, first, by a manual experiment on a set of known true and fake news. In addition, we compared our fake news classification system based on the proposed feature with several strong baselines on two multi-domain datasets of general-topic news and one dataset of fake COVID-19 news, showing that combining cross-lingual evidence with strong baselines such as RoBERTa yields significant improvements in fake news detection.


Introduction
After the manipulation of opinions on Facebook during the 2016 U.S. election (Allcott and Gentzkow, 2017), interest in the topic of fake news has increased substantially. Unfortunately, the spread of fakes leads not only to the misinformation of readers but also to more severe consequences, such as the shooting at a Washington pizzeria (Kang and Goldman, 2016) caused by the spreading of fake news claiming that Hillary Clinton was leading a child sex trafficking ring. Moreover, the global pandemic in 2020 was accompanied by an infodemic (Alam et al., 2020) that could worsen the epidemiological situation and dramatically harm people's health.
As a result, fake news has received tremendous public attention and drawn increasing interest from the academic community. Multiple supervised fake news detection models have been proposed based on linguistic features (Pérez-Rosas et al., 2018; Patwa et al., 2020); deep learning models (Barrón-Cedeño et al., 2019; Glazkova et al., 2020; Kaliyar et al., 2021); or signals from social networks (Nguyen et al., 2020; Cui et al., 2019). One direction within supervised approaches is to use additional information from the Web (Popat et al., 2017; Karadzhov et al., 2017; Ghanem et al., 2018). However, these works took only monolingual signals into account.
In our work, we assume that the viral spread of (fake) information may naturally hit the "language barrier", and that cross-checking facts across media in various languages (presumed to be strongly independent) could yield an additional signal. We aim to close this gap and explore cross-lingual Web features for fake news detection.
The contribution of our work is a new cross-lingual evidence feature for fake news detection based on multilingual news verification. 1 We conduct a manual experiment based on cross-lingual dataset markup to evaluate whether a user can rely on such a feature for misinformation identification. We then implement the proposed feature, showing that adding cross-lingual evidence consistently improves the results of strong baselines, including large pre-trained transformers. We publicly release all code and data. 2

Related Work
Several datasets have been collected for different sub-tasks of the fake news detection pipeline, and several supervised models have been explored on them. Some works focus on internal features of the news itself: Pérez-Rosas et al. (2018) and Patwa et al. (2020) used various linguistic features extracted from news texts, and Ghanem et al. (2020) showed the promise of emotional signals extracted from news text for detecting fakes. In addition to internal features, external features can make the reasoning behind a detection model's decisions more reliable. For instance, user interaction signals were explored by Nguyen et al. (2020) and Cui et al. (2019). Another strong signal is additional information extracted from the Web: in (Popat et al., 2017; Karadzhov et al., 2017; Ghanem et al., 2018; Li and Zhou, 2020) the authors queried a Web search engine (Google or Bing) to collect relevant articles and used the scraped information as an external feature of a fake news classifier.
Seeking evidence via a search engine is a natural behaviour of real users. Several studies have tried to figure out how users authenticate information from the Web. Jr. et al. (2018) showed that individuals rely on their judgment of both the source and the message and, when this does not provide a definitive answer, turn to external resources to authenticate news. The intentional and institutional reaction was to seek confirmation from institutional sources (some respondents answered simply "Google"). Moreover, participants who received messages across different media platforms (Zhao, 2019) and different perspectives on the information (Geeng et al., 2020) showed greater awareness of news evidence. Consequently, information from external search is an important feature for evaluating news authenticity and seeking evidence. While the idea of multilingualism has already been explored for hate speech (Aluru et al., 2020) and rumor (Wen et al., 2018) detection, previous works did not fully use multilingual information for fake news detection. In our study, we explore how fake news spreads on the Web in different languages and extend evidence retrieval to cross-lingual news verification.

Detection of Fake News using Cross-lingual Evidence (CE)
Our approach is based on the following hypothesis: if a news item is true, it will be widespread in different languages and across media with different biases, and the facts mentioned will be identical; if it is fake, it will receive a weaker response in the foreign press than true news. The step-by-step process, schematically represented in Figure 1, is as follows:
Step 1. Text extraction: As a new article arrives, its title and content are extracted.
Step 2. Text translation: The title is translated into target languages and new search requests are generated.
Step 3. Cross-lingual news retrieval: Search is executed based on the translated titles in multiple languages.
Step 4. Cross-lingual evidence impact computation: The top-N articles from the search results are used to evaluate the authenticity of the initial news. The information described in the news is compared with the information in the retrieved articles, and the number of articles that confirm or disprove the original news is estimated.
Step 5. News classification: Based on the information from the previous step, a decision is made about the authenticity of the news. If the majority of results support the original news, it is more likely to be true; if there are contradictions, this is a signal to consider the news fake.
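The five steps above can be sketched as follows. This is a minimal illustration, not our actual implementation: `translate`, `search`, and `compare` are hypothetical stand-ins for the translation service, the search engine, and the evidence comparison used in our experiments.

```python
# Sketch of the five-step pipeline. The callables `translate`, `search`,
# and `compare` are hypothetical placeholders for the real services.
LANGUAGES = ["en", "fr", "de", "es", "ru"]
TOP_N = 10

def classify_news(title, content, translate, search, compare):
    support, refute = 0, 0
    for lang in LANGUAGES:
        query = translate(title, target_lang=lang)        # Step 2
        for article in search(query, lang=lang)[:TOP_N]:  # Step 3
            verdict = compare(content, article)           # Step 4
            if verdict == "support":
                support += 1
            elif verdict == "refute":
                refute += 1
    # Step 5: supporting evidence must outweigh refuting evidence;
    # little or contradictory evidence is treated as a fake signal.
    return "true" if support > refute else "fake"
```

Note that absence of evidence (neither support nor refutation) also pushes the decision towards "fake", which matches the hypothesis that fakes receive a weaker response in the foreign press.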
To confirm the hypothesis that cross-lingual evidence can be used for fake news detection, we conducted two experiments. The first (Section 4) is a small-scale manual study confirming that a person can distinguish fake news based on such cross-lingual evidence. The second (Section 5) is an automated fake news detection system tested on several fake news datasets: we implemented our cross-lingual evidence feature and compared it with several baselines, achieving SOTA results on all datasets.

Experiment 1: Manual Verification
First, we conducted a manual experiment on a small dataset to test the hypothesis under "ideal conditions".

Dataset
For fake news examples, we used the list of the top 50 fake news stories of 2018 according to BuzzFeed. 4 For true news, we used the NELA-GT-2018 dataset (Norregaard et al., 2019). We manually selected 10 fake and 10 true news items and manually executed all steps of our approach (Section 3) on this dataset. The resulting dataset of 20 news items is provided in Table 2 in Appendix A; it combines news from several fields: celebrities, science, politics, culture, and world.

Experimental Setup
We precalculated Step 2 and Step 3 for the annotators' convenience and for reproducibility. We generated cross-lingual requests in five languages: English, French, German, Spanish, and Russian. Translation from English was done with the Google Translation service. As all news items are from 2018, the time range of every search was limited to that year. From the search results, we used the first page, which consisted of 10 news items. As a result, for 20 news items in each of the 5 languages, we obtained 1000 pairs of "original news ↔ scraped news" to mark up (20 × 5 × 10).
We asked 6 annotators to take part in the experiment and manually conduct Step 4, cross-lingual evidence impact computation. For each news item, we provided its title, content, and source link. Every annotator received 10 randomly selected news items, so that each item was cross-checked by 3 annotators. All non-English news was translated into English. For each pair "original news ↔ scraped news", the annotator gave one of three answers: 1) Support: the information in the scraped news supports the original news; 2) Refute: the information contradicts or differs from the original news, or there is an explicit refutation; 3) Not enough info: the information is not relevant or not sufficient to support or refute the original news. Finally, at the end of the annotation of a sample, the annotator was asked to conduct Step 5 of the pipeline and classify the news as fake or true.
The interface used for the manual markup is presented in Figure 3 in Appendix A.

Discussion of Results
Based on the collected annotations, we chose the final label for each news item by majority vote. We estimated inter-annotator agreement with Krippendorff's alpha (α = 0.83). We then calculated the distribution of annotators' answer types over the top 10 search results by language, separately for fake and true news. The results are shown in Figure 2.
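The per-item label aggregation is a plain majority vote; a minimal sketch:

```python
from collections import Counter

def majority_label(annotations):
    """Return the label chosen by most annotators.

    With three annotators and two labels ("fake"/"true") a strict
    majority always exists, so no tie-breaking is needed.
    """
    return Counter(annotations).most_common(1)[0][0]
```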
As we can see, the distribution of labels for true news differs significantly from that for fake news: the number of supporting articles is sufficient for almost every language. For fake news, in contrast, we obtained more refuting than supporting signals in English and little or no evidence or relevant information in the other languages. The average accuracy of the annotators' classification is 0.95. Thus, a person can distinguish fake news based on cross-lingual evidence.

Experiment 2: Automatic Verification
We implemented the cross-lingual evidence (CE) feature as described below and tested its fake news detection performance on three multi-domain datasets, comparing it with strong baselines.

Cross-lingual Evidence (CE) Feature
Cross-lingual evidence retrieval As in the manual setup, we used Google services via their Python APIs for translation and search. For the automated feature we again focused on five languages: English, French, German, Spanish, and Russian. We extracted only the first page of the search results, which gave us 10 articles per language.
Cross-lingual text similarity For unsupervised cross-lingual relevance computation between the original news and a scraped one, we chose the cosine similarity between sentence embeddings. To obtain a sentence vector representation, we averaged the token embeddings of both the title and the content sentences, extracted from M-BERT (Devlin et al., 2019). For each news item, the similarity score is computed for all 10 "original news ↔ scraped news" pairs in each of the 5 languages.
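A minimal sketch of this similarity computation, assuming token embeddings have already been extracted from M-BERT as a (tokens × dimensions) matrix:

```python
import numpy as np

def sentence_vector(token_embeddings):
    """Mean-pool token embeddings (tokens x dims) into one sentence vector."""
    return np.asarray(token_embeddings).mean(axis=0)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def news_similarity(original_tokens, scraped_tokens):
    """Cosine similarity between mean-pooled representations of two texts."""
    return cosine_similarity(sentence_vector(original_tokens),
                             sentence_vector(scraped_tokens))
```

Because M-BERT embeds all five languages into a shared space, the same cosine computation serves for both monolingual and cross-lingual pairs.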
Source credibility We also took the credibility of the source into account. Following Popat et al. (2017), we used the AlexaRank of the scraped news source as a credibility score.
The cross-lingual evidence (CE) feature is thus composed of two parts: the content similarity score based on embedding distance (Sim) and the AlexaRank score of the scraped news source (AlexaRank).
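Concretely, for one news item the CE feature can be assembled into a fixed-length vector: 10 similarity scores and 10 credibility scores per language. This is an illustrative sketch, assuming that missing search results are zero-padded:

```python
import numpy as np

LANGUAGES = ["en", "fr", "de", "es", "ru"]
TOP_N = 10

def ce_feature(similarities, ranks):
    """Build the CE feature vector for one news item.

    similarities: {lang: [cosine scores of up to TOP_N results]}
    ranks: {lang: [AlexaRank-based scores of the same results]}
    Missing results are zero-padded so the vector length is fixed.
    """
    def pad(values):
        values = list(values)[:TOP_N]
        return values + [0.0] * (TOP_N - len(values))

    parts = []
    for lang in LANGUAGES:
        parts.extend(pad(similarities.get(lang, [])))
        parts.extend(pad(ranks.get(lang, [])))
    return np.array(parts)  # length = 2 * TOP_N * len(LANGUAGES) = 100
```

Zero-padding lets the downstream classifier treat "no results in this language" itself as a signal, which is exactly the pattern observed for fake news in Experiment 1.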

Datasets
We evaluate the systems on the multi-domain FakeNewsAMT and Celebrity datasets by Pérez-Rosas et al. (2018) and on the ReCOVery dataset of COVID-19 news (Zhou et al., 2020).

Baselines
We compare against both linguistic-feature-based fake news detection models and SOTA deep neural networks. Linguistic Features: Pérez-Rosas et al. (2018) trained a baseline fake news classification model on Ngrams, punctuation, psycholinguistic features extracted with LIWC, readability, syntax, and the concatenation of all these feature sets. Zhou et al. (2020) also used LIWC features as one of their baselines. We tested these feature sets separately, all together, and in combination with our proposed feature. We experimented with SVM, RandomForest, LogRegression, and LightGBM; the best models, based on LightGBM, are presented.
Text-CNN, LSTM: Following Zhou et al. (2020), we tested TextCNN and LSTM models on all datasets. We fine-tuned the models' hyperparameters and report the best configurations in the results.
BERT, RoBERTa: BERT-based (Devlin et al., 2019) models were used for fake news detection by Kaliyar et al. (2021) and specifically for COVID-19 fake news classification (Gundapu and Mamidi, 2021; Glazkova et al., 2020). We used pretrained models and fine-tuned them. The combination with the CE feature was done by concatenating it with the [CLS] token embedding before the linear layer.
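This fusion is a plain concatenation followed by the classification layer. A minimal numpy sketch (the dimensions and parameter names are illustrative, not those of our trained models):

```python
import numpy as np

def fuse_and_classify(cls_embedding, ce_feature, weight, bias):
    """Concatenate the [CLS] embedding with the CE feature vector and
    apply a linear classification layer, returning fake/true logits."""
    fused = np.concatenate([cls_embedding, ce_feature])
    return weight @ fused + bias
```

In the fine-tuned models, `weight` and `bias` are learned jointly with the transformer, so the classifier can weigh textual and cross-lingual evidence against each other.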
Monolingual Evidence (ME): In addition, we compared our feature with the case where only monolingual English evidence is used, again with a LightGBM classification model.

Table 1 compares the results of our model based on cross-lingual evidence (CE) with the baselines on the three datasets. The statistical significance of the improvements over the baselines was tested with a paired t-test over 5-fold cross-validation. The CE feature by itself outperforms all baselines on FakeNewsAMT and is better than some linguistic features on Celebrity and ReCOVery. Monolingual English evidence (ME) works worse than cross-lingual evidence. Using only the rank feature improves the baselines, but the best scores are achieved by adding the full CE feature set. Combining the CE feature with BERT and RoBERTa achieves SOTA results on all datasets. At the same time, although the linguistic features did not outperform the Transformer-based baselines, the combination of our CE feature with different linguistic features showed competitive results that are more explainable than a transformer model. Examples of how retrieved cross-lingual results can be used to explain classification decisions are given in Appendix B.
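The significance test pairs the per-fold scores of a baseline with those of the baseline plus CE. A minimal pure-Python sketch of the paired t-statistic (the p-value lookup against a t-distribution, done with a statistics package in practice, is omitted):

```python
import math

def paired_t_statistic(scores_a, scores_b):
    """Paired t-statistic over matched per-fold scores (e.g. 5-fold CV)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

Pairing by fold removes fold-difficulty variance from the comparison, which is why the paired test is preferred over an unpaired one for cross-validated scores.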

Conclusion
We presented an approach to fake news detection based on cross-lingual evidence (CE), which provides different perspectives on an event across languages, verified in two experiments. A fake news classification model with CE significantly improves performance over various baselines and compares favorably to SOTA. Moreover, CE is interpretable: a user can check in which, and in how many, languages a given piece of news was found.
A promising direction to explore is increasing the number of languages used for cross-lingual information retrieval. In addition, the general distribution of news around the world should be taken into account: for instance, US news tends to be covered in the European press more than European news is covered in the US press. Also, in our work the original language of the news was English; analogous experiments should be conducted for other original languages.

Figure 3 - User interface used to collect annotators' answers for the manual verification. An annotator conducts Step 4 and Step 5 of the pipeline: (i) identify whether a cross-lingual scraped news item supports, refutes, or has not enough info with respect to the original one; (ii) classify the original news as fake or true based on the provided cross-lingual evidence.

Original news: Amazon makes its first delivery by drone in the United States

Spanish search results:
- Amazon recibe autorización para operar entregas con drones (Amazon receives authorization to operate drone deliveries)
- Amazon recibe aprobación federal para arrancar Prime Air, su propuesta de entrega con drones (Amazon receives federal approval to launch Prime Air, its drone delivery proposal)

Russian search results:
- Amazon запускает дроны Prime Air для быстрой доставки (Amazon launches Prime Air drones for fast delivery)
- В США прошла первая публичная демонстрация доставки товара с помощью дронов Amazon Prime Air (First public demonstration of Amazon Prime Air product delivery held in the USA)
- Amazon показала новые гибридные дроны для доставки заказов сервиса Prime Air (Amazon shows new hybrid drones to deliver Prime Air orders)

Table 3 - Example of cross-lingual evidence extraction for fake and legitimate news from FakeNewsAMT. For each target language (English, French, German, Spanish, Russian), the titles of the top 3 search results are presented; for every non-English title, an English translation is provided. For fake news, the search results in other languages are only mildly topically related to the original news, while for legitimate news they are strongly related to it.