Graph-based Fake News Detection using a Summarization Technique

Nowadays, fake news spreads in various ways, and this fake information causes a great deal of social damage. Thus, the need to detect fake information is increasing. In this paper, we propose a novel graph-based fake news detection method using a summarization technique that relies only on information internal to a document. Our method represents the relationships among all sentences as a graph, and the degree to which contextual information is shared among sentences is computed with an attention mechanism. In addition, we improve fake news detection performance by utilizing summary information as a signal for the main subject of the document. The experimental results demonstrate that our method achieves a high accuracy of 91.04%, which is 8.85%p better than the previous method.


Introduction
Recently, people have become easily exposed to large amounts of information through various channels, owing to the development of information propagation methods. However, some of this information is fake, generated for malicious purposes. Such fake information confuses people and causes a great deal of social and economic damage. Therefore, the need to detect fake information is increasing to prevent the damage caused by fake news, and it is being researched both industrially and academically (Yang et al., 2012; Castillo et al., 2011; Yan et al., 2015).
Prior fake news detection studies include methods using external information as well as internal information. Methods using internal information detect fake news by analyzing linguistic features of the news, such as its content (Levi et al., 2019), writing style and consistency (Potthast et al., 2018), and the relational structure between sentences (Karimi and Tang, 2019). Methods using external information, on the other hand, analyze metadata such as how the news spreads (Monti et al., 2019) and the user profiles of the people spreading it (Lu and Li, 2020). However, collecting external information requires a lot of time and cost, and it is very difficult to obtain external information for all documents. Moreover, understanding a document from its internal information is more fundamental, and constructing structural relationships among the sentences within a document is an effective approach to detecting fake news. Therefore, we propose an effective fake news detection method that builds a context graph representing the relationships among sentences based on summarization information.
Since all sentences in a document are strongly related to each other, the contextual information from the other sentences should be reflected when generating sentence embeddings. In addition, an attention mechanism is exploited in constructing the context graph so that the different relation strengths between sentences influence the contextual information of each sentence in the graph. In the context graph, the nodes hold the initial sentence embeddings of all sentences in a document, and the weight of an edge is estimated by the attention score between the edge's two endpoint nodes. We assume that all nodes are connected because all sentences in a document are strongly related to each other. A contextualized sentence embedding on each node is then generated by reflecting the neighbors' contextual information: it is computed as the sum, over the node's neighbors, of the attention score between the node and the neighbor multiplied by the neighbor's initial sentence embedding.
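This weighted-sum contextualization can be sketched in a few lines of numpy; the embeddings H and the attention scores A below are invented for illustration (the paper computes A with its attention mechanism):

```python
import numpy as np

# Toy setup: 3 sentences with 4-dimensional initial embeddings.
H = np.array([[0.2, 0.1, 0.0, 0.5],
              [0.4, 0.3, 0.1, 0.0],
              [0.1, 0.6, 0.2, 0.3]])

# Hypothetical attention scores; row i holds the scores of sentence i
# toward every sentence and sums to 1.
A = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Contextualized embedding of sentence i: sum_j A[i, j] * H[j].
H_ctx = A @ H
```

Written as a matrix product, the per-sentence weighted sum over all neighbors is a single matrix multiplication.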
Because the subject of a document is very important information for identifying its content, and fake information is commonly related to the subject, we apply a summarization technique (Jeong et al., 2016) that can effectively capture the subject information of a document to fake news detection. Using this summarization technique, sentences containing a lot of subject information are ranked highly, and the ranking scores influence the attention scores used to construct the context graph. For performance comparison, we implemented a baseline model that detects fake news with the sum of all sentence embeddings in a document. Our proposed model achieves 91.04% accuracy, which is 11.19%p better than the baseline model. It also performs 8.85%p better than another model (Karimi and Tang, 2019) that uses the same dataset and a dependency tree structure among the sentences in a document.

Notations
We have a corpus D of fake and real news documents. Let a document d ∈ D contain N sentences s_1, s_2, ..., s_N, and let each sentence s_i ∈ d consist of the words word_1, word_2, ..., word_l, where l denotes the number of words in sentence s_i. We apply a Bi-LSTM network to every sentence of a document and obtain the initial sentence embedding h_i of each sentence s_i.

Proposed Method
We propose a novel fake news detection method based on a context graph with a summarization technique (see Figure 1). Our method consists of three components. The first is graph construction using an attention mechanism, which represents the relationships between all sentences as a graph. The second is core sentence extraction, which ranks sentences by subject information using a summarization technique. The third is fake news detection, which discriminates fake documents from real ones.

Graph Construction using Attention Mechanism
To model the relationships between sentences and their contextual information, we construct a graph G = (F, E). The graph G is composed of nodes (i.e., F = {f_1, f_2, ..., f_N}) and edges (i.e., E = {e_{1,2}, ..., e_{i,j}, ..., e_{N-1,N}}). Each node f_i is represented by its sentence embedding h_i, the edge between the i-th node and the j-th node is denoted e_{i,j}, and its weight w_{i,j} expresses the relation strength of the i-th sentence with respect to the j-th sentence. Since all sentences in a document are strongly related to each other, we treat G as a fully connected graph. To reflect the different relation strengths between sentences, each edge e_{i,j} is associated with a weight w_{i,j}, which is derived by the attention mechanism between the sentence embeddings h_i and h_j of the two nodes f_i and f_j (Eq. 1-3), where W ∈ R^{m×dim} and x ∈ R^m. Equations (1) and (2) reduce the dimension of the sentence embedding to avoid overfitting without weakening the LSTM's capacity (Dozat and Manning, 2018). U is a weight matrix used to compute the relation strength between x_i and x_j, and w_{i,j} represents the edge weight between the pair of nodes f_i and f_j in the graph G.
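Since Equations (1)-(3) are not reproduced here, the numpy sketch below assumes one common form consistent with the description, in the spirit of Dozat and Manning's bilinear attention: a dimension-reducing projection x_i = W h_i, a bilinear score x_i^T U x_j, and a row-wise softmax. All matrices are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
N, dim, m = 4, 8, 3                # sentences, embedding size, reduced size

H = rng.normal(size=(N, dim))      # initial sentence embeddings h_i
W = rng.normal(size=(m, dim))      # dimension-reduction matrix (assumed Eq. 1-2)
U = rng.normal(size=(m, m))        # bilinear weight for relation strength

X = H @ W.T                        # x_i = W h_i, reduced to m dimensions
scores = X @ U @ X.T               # s_ij = x_i^T U x_j (pre-softmax)

# Row-wise softmax so the weights w_ij over j sum to 1 for each sentence i.
exp = np.exp(scores - scores.max(axis=1, keepdims=True))
w = exp / exp.sum(axis=1, keepdims=True)
```

The softmax subtracts each row's maximum before exponentiation, a standard trick for numerical stability that leaves the resulting weights unchanged.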

Core Sentence Extraction
Based on the constructed graph, the subject information of a sentence is estimated by summing the attention scores (edge scores) between the node of the sentence and its adjacent nodes, and the sentence with the most subject information is extracted as the core sentence, denoted Core_sent.
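A small sketch of this selection step, assuming a hypothetical attention-score matrix w; whether scores are aggregated over incoming or outgoing edges is not specified above, so the column sum used here is an assumption:

```python
import numpy as np

# Hypothetical attention-score matrix for 4 sentences (each row sums to 1).
w = np.array([[0.40, 0.20, 0.20, 0.20],
              [0.30, 0.30, 0.25, 0.15],
              [0.50, 0.10, 0.30, 0.10],
              [0.45, 0.25, 0.15, 0.15]])

# Subject score of sentence j: total attention paid to it by all nodes.
subject_scores = w.sum(axis=0)
core_idx = int(subject_scores.argmax())   # index of the core sentence
```

In this toy matrix most attention flows toward the first sentence, so it is selected as the core sentence.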
Based on the cosine similarity values between the core sentence and all other sentences, the sentences in the document are divided into a subject relevant sentence set and an irrelevant sentence set. The subject relevance score of each word is based on its frequency of appearance in the two sets; this word relevance score RS_d is calculated by Equation (5) (Jeong et al., 2016).
Here, p_d and q_d are the probabilities that a word appears in the subject relevant and irrelevant sentence sets of a document d, respectively. R_d and S_d are the numbers of subject relevant and irrelevant sentences in document d, r_d and s_d are the numbers of subject relevant and irrelevant sentences in d that include the word, and 0.5 is a naïve smoothing factor that avoids a zero denominator or a log of zero. In this paper, the top 30% of sentences by subject relevance score were selected as the subject relevant sentence set, and the remaining sentences formed the irrelevant sentence set. Afterward, a sentence score is calculated as the sum of the relevance scores of the words included in the sentence (Jeong et al., 2016).
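Since Equations (5) and (6) are not reproduced here, the sketch below assumes one common smoothed log-odds form consistent with the description above (the exact formula in Jeong et al. (2016) may differ); word_relevance and sentence_score are hypothetical helper names:

```python
from math import log

def word_relevance(r_d, R_d, s_d, S_d, k=0.5):
    """Assumed form of Eq. (5): smoothed log-odds of a word appearing in
    subject relevant vs. irrelevant sentences of a document."""
    p = (r_d + k) / (R_d + 2 * k)   # P(word | subject relevant sentences)
    q = (s_d + k) / (S_d + 2 * k)   # P(word | irrelevant sentences)
    return log(p * (1 - q) / (q * (1 - p)))

def sentence_score(words, rs):
    """Assumed form of Eq. (6): sentence score as the sum of the
    relevance scores of its words (unseen words contribute 0)."""
    return sum(rs.get(word, 0.0) for word in words)
```

A word appearing equally often in both sets scores zero, while a word concentrated in the subject relevant set scores positively, which is the behavior the description above requires.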

Fake News Detection
After calculating the sentence ranking from the sentence scores by Equation (6), we construct an updated graph by reflecting the sentence ranking in the edge weights. The weight of an edge then represents the reflection rate of the subject information as well as the contextual information.
(for i, j = 1, 2, ..., N and i ≠ j) The sentence ranking score is calculated by Equation (7), and the weight of every edge is updated by Equation (8). Subsequently, we construct subject contextualized sentence embeddings using the updated graph. The subject contextualized sentence embedding of a sentence is created by the weighted sum of its initial sentence embedding and the initial sentence embeddings of the other adjacent sentences. Finally, we create a document embedding by averaging the embedding vectors of all sentences in a document; the document embedding is then fed into a multi-layer feedforward neural network that predicts the label, fake or real, by binary classification as a vector ŷ.
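The exact forms of Equations (7) and (8) are not shown above, so the following sketch assumes one plausible instantiation: each edge weight is scaled by the target sentence's ranking score and renormalized, after which the weighted sum, the averaged document embedding, and a small feedforward head follow the description. All sizes and weights are illustrative placeholders:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
N, dim = 4, 8
H = rng.normal(size=(N, dim))         # initial sentence embeddings
w = softmax(rng.normal(size=(N, N)))  # context-graph attention weights
rank = softmax(rng.normal(size=N))    # sentence ranking scores (Eq. 7, assumed)

# Assumed form of Eq. (8): scale each edge by the target sentence's
# ranking score, then renormalize each row.
w_new = w * rank[None, :]
w_new /= w_new.sum(axis=1, keepdims=True)

H_ctx = w_new @ H                     # subject contextualized embeddings
doc = H_ctx.mean(axis=0)              # document embedding (sentence average)

# Feedforward classifier head with hypothetical layer sizes.
W1, b1 = rng.normal(size=(dim, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 2)), np.zeros(2)
y_hat = softmax(np.maximum(doc @ W1 + b1, 0.0) @ W2 + b2)
```

Renormalizing after the rank scaling keeps each node's outgoing weights a proper distribution, so the later weighted sum remains a convex combination of sentence embeddings.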
The cross-entropy loss function is used to optimize our neural network.
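A minimal sketch of this loss, assuming a one-hot label vector and the predicted class distribution ŷ (the small epsilon guards against log of zero):

```python
import numpy as np

def cross_entropy(y_hat, y):
    """Cross-entropy between the predicted distribution y_hat over the
    two classes (fake/real) and a one-hot label vector y."""
    return float(-np.sum(y * np.log(y_hat + 1e-12)))
```

A perfect prediction yields (near) zero loss, and the loss grows as probability mass moves away from the true class.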

Datasets
We used the HDSF dataset (Karimi and Tang, 2019) for our fake news detection experiments. The dataset consists of 3,360 real documents and 3,360 fake documents. We follow Karimi's data split: 6,452 documents for training, 134 for validation, and 134 for testing, with each split containing an equal number of fake and real documents. In addition, we performed 5-fold cross-validation experiments because the original HDSF test set is very small.

Experimental Settings
We used the word2vec embeddings (Mikolov et al., 2013) pre-trained by Google as initial word embeddings and set the Bi-LSTM hidden unit size to 200. We set the number of epochs to 200, the mini-batch size to 40, and the dropout rate to 30% in every experiment. We used the Adam optimizer (Kingma and Ba, 2014) with an initial learning rate of 0.001, divided by 10 every 50 epochs. We used accuracy as the performance metric.
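The learning-rate schedule described above can be sketched as a simple step decay in plain Python, matching the stated settings (0.001, divided by 10 every 50 epochs):

```python
def learning_rate(epoch, base_lr=1e-3, decay=0.1, step=50):
    """Step decay: the learning rate is divided by 10 every 50 epochs,
    starting from the base rate of 0.001."""
    return base_lr * decay ** (epoch // step)
```

Over the paper's 200 epochs this yields four plateaus: 1e-3, 1e-4, 1e-5, and 1e-6.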

Models
In this subsection, the proposed model is compared with a baseline model and Karimi's model (Karimi and Tang, 2019) to demonstrate its superiority.
Karimi's Model (Karimi and Tang, 2019) A fake news detection method that predicts whether a document is fake or real by constructing the relationships among the sentences in a document using a hierarchical discourse-level dependency tree. Baseline Model The baseline model builds a document embedding using only a Bi-LSTM and predicts whether the document is fake or real through a feedforward neural network.
Graph Model This model connects all sentences in a document by constructing a graph structure. Fake news is detected by creating sentence embeddings that reflect only the contextual information on the graph.

Graph + Summarization Model (Proposed)
The graph + summarization model is our final proposed model. It classifies documents as fake or real after producing a document embedding that reflects both the contextual and the subject information on a graph structure through the summarization technique. Table 1 shows a comparison of the experimental results. The RST, LIWC, N-grams, and BiGRNN-CNN models were implemented by Karimi and Tang (2019). Our proposed model showed the best performance among the compared models: it outperformed the baseline model by 11.19%p and Karimi's model by 8.85%p. In addition, we obtained a 3.73%p improvement when the summarization technique was added to the graph model. Moreover, the final proposed model also showed the best performance in the 5-fold cross-validation experiments, with a 9.06%p improvement.

Analysis
Herein, we present two analyses based on additional experiments concerning subject consistency in our model and dataset. The proposed method attempts to detect subject information in fake news effectively, and the consistency of subject information is also important for detecting fake news. In the first analysis, we verify the graph-updating process of the proposed model, which uses a fully connected graph, by comparing it with a model using a 50% connected graph, which keeps only the edges with the top 50% of attention scores. As a result, the proposed model with a fully connected graph showed 1.29%p higher performance than the model with a 50% connected graph, which suggests that a fully connected graph is more useful for detecting fake news. Secondly, we observed the variances of the attention scores in the fake and real documents (see Table 2). The variance in the fake documents is higher than that in the real documents, which indicates that fake documents have an inconsistent subject distribution.
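The variance comparison can be sketched as follows; the two attention matrices are invented for illustration, one with a peaked score distribution and one near uniform:

```python
import numpy as np

def attention_variance(w):
    """Variance of all attention scores in a document's graph, the
    statistic compared between fake and real documents in Table 2."""
    return float(np.var(w))

# Toy illustration: a peaked (uneven) score distribution has higher
# variance than a near-uniform (consistent) one.
peaked = np.array([[0.90, 0.05, 0.05],
                   [0.05, 0.90, 0.05],
                   [0.05, 0.05, 0.90]])
uniform = np.full((3, 3), 1.0 / 3.0)
```

A perfectly uniform attention matrix has zero variance, so a higher variance signals that attention, and hence subject information, is distributed unevenly across the document.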

Conclusions
We have proposed a novel graph-based fake news detection method using a summarization technique. Our model shows that the use of contextual and subject information is helpful in detecting fake news. Our final proposed model achieved 11.19%p better performance than the baseline model.