Detecting Spoilers in Movie Reviews with External Movie Knowledge and User Networks

Online movie review platforms provide crowdsourced feedback for the film industry and the general public, while spoiler reviews greatly compromise user experience. Although preliminary research efforts were made to automatically identify spoilers, they merely focus on the review content itself, while robust spoiler detection requires putting the review into the context of facts and knowledge regarding movies, user behavior on film review platforms, and more. In light of these challenges, we first curate a large-scale network-based spoiler detection dataset LCS and a comprehensive and up-to-date movie knowledge base UKM. We then propose MVSD, a novel Multi-View Spoiler Detection framework that takes into account the external knowledge about movies and user activities on movie review platforms. Specifically, MVSD constructs three interconnecting heterogeneous information networks to model diverse data sources and their multi-view attributes, and we design and employ a novel heterogeneous graph neural network architecture for spoiler detection as node-level classification. Extensive experiments demonstrate that MVSD advances the state-of-the-art on two spoiler detection datasets, while the introduction of external knowledge and user interactions helps ground robust spoiler detection. Our data and code are available at https://github.com/Arthur-Heng/Spoiler-Detection


Introduction
Movie review websites such as IMDb and Rotten Tomatoes have become popular avenues for movie commentary, discussion, and recommendation (Cao et al., 2019). Among user-generated movie reviews, some contain spoilers, which reveal major plot twists and thus negatively affect people's enjoyment (Loewenstein, 1994). As a result, automatic spoiler detection has become an important task to safeguard users from unwanted exposure to potential spoilers.

[Figure 1 caption: ... Freeman, which are the names of the actors. Guided by external movie knowledge, the names can be recognized as the roles in the movie. Moreover, by incorporating user networks, it is discovered that User 1 likes to post spoilers on some specific genres of movies such as drama and comedy. Thus the review is more likely to be a spoiler.]
Existing spoiler detection models mostly focus on the textual content of the movie review. Chang et al. (2018) propose the first automatic spoiler detection approach by jointly encoding the review text and the movie genre. Wan et al. (2019) extend the hierarchical attention network with item (i.e., the subject of the review) information and introduce user bias and item bias. Chang et al. (2021) propose a relation-aware attention mechanism to incorporate the dependency relations between context words in movie reviews. Combined with several open-source datasets (Boyd-Graber et al., 2013; Wan et al., 2019), these works have made important progress toward curbing the negative impact of movie spoilers.
However, robust spoiler detection requires more than just the textual content of movie reviews, and we argue that two additional information sources are among the most helpful for reliable and well-grounded spoiler detection. Firstly, external knowledge of films and movies (e.g., director, cast members, genre, plot summary) is essential in putting the review into the movie context. Without knowing what the movie is all about, it is hard, if not impossible, to accurately assess whether a review gives away major plot points or surprises and thus contains spoilers. Secondly, user activities on online movie review platforms help incorporate user- and movie-based spoiler biases. For example, certain users might be more inclined to share spoilers, and different movie genres disproportionately suffer from spoiler reviews, while existing approaches simply assume the uniformity of the spoiler distribution. As a result, robust spoiler detection should be guided by external film knowledge and user interactions on movie review platforms, putting the review content into context and promoting reliable predictions. We demonstrate how these two information sources can help spoiler detection in Figure 1.
In light of these challenges, this work greatly advances spoiler detection research through both resource curation and method innovation. We first propose a large-scale spoiler detection dataset LCS and an extensive movie knowledge base (KB) UKM. LCS is 114 times larger than existing datasets (Boyd-Graber et al., 2013) and is the first to provide user interactions on movie review platforms, while UKM presents an up-to-date movie KB with entries of modern movies compared to existing resources (Misra, 2019). In addition to resource contributions, we propose MVSD, a graph-based spoiler detection framework that incorporates external knowledge and user interaction networks. Specifically, MVSD constructs heterogeneous information networks (HINs) to jointly model diverse information sources and their multi-view features while proposing a novel heterogeneous graph neural network (GNN) architecture for robust spoiler detection.
We compare MVSD against three types of baseline methods on two spoiler detection datasets. Extensive experiments demonstrate that MVSD significantly outperforms all baseline models by at least 2.01 and 3.22 in F1-score on the Kaggle (Misra, 2019) and LCS (ours) datasets, respectively. Further analyses demonstrate that MVSD leverages external movie KBs and user networks on movie review platforms to produce accurate, reliable, and well-grounded spoiler predictions.

Resource Curation
We first curate a large-scale spoiler detection dataset LCS based on IMDB, providing rich information such as review text, movie metadata, user activities, and more. Motivated by the success of external knowledge in related tasks (Hu et al., 2021; Yao et al., 2021; Li and Xiong, 2022), we construct a comprehensive movie knowledge base UKM with important movie information and up-to-date entries.

The LCS Dataset
We first collect the user IDs of 259,705 users from a user list presented in the Kaggle dataset (Misra, 2019). We then retrieve the most recent 300 movie reviews by each user and collect the information of users, movies, and cast members from the IMDB website. Since IMDB allows users to self-report whether their reviews contain spoilers, we adopt these labels provided by IMDB as annotations. We provide a comparison of our dataset to the Kaggle dataset in Table 1. As illustrated in Table 1, the LCS dataset has a much larger scale, more up-to-date information, and more comprehensive data.

The UKM Knowledge Base
Based on the LCS dataset, we then curate UKM, a comprehensive knowledge base of movie knowledge. We first assign each movie in the LCS dataset as an entity in the KB. We then collect all cast members and directors of these movies, de-duplicating them, representing each individual as an entity, and connecting movie entities with cast member entities based on their roles in the movie. After that, we further represent years, genres, and ratings as entities, connecting them to movie and cast member entities according to the information in the dataset. We compare UKM against two existing movie knowledge bases (RippleNet (Wang et al., 2018) and MovieLens-1M (Cao et al., 2019)) and present the results in Table 2, which demonstrates that UKM presents the largest and most up-to-date collection of movie and film knowledge to the best of our knowledge. UKM has great potential for numerous related tasks such as spoiler detection, movie recommender systems, and more.
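The entity-and-relation construction described above can be sketched as follows; the record fields and relation names here are illustrative stand-ins, not the actual LCS schema.

```python
# Sketch of turning movie records into (head, relation, tail) KB triples.
# Field names ("title", "cast", ...) are hypothetical, not the real LCS schema.

def build_triples(movies):
    """movies: list of dicts with illustrative keys title/cast/director/genre/year/rating."""
    triples = []
    for m in movies:
        movie = ("movie", m["title"])
        for person in m.get("cast", []):
            triples.append((movie, "movie-cast", ("cast", person)))
        for person in m.get("director", []):
            triples.append((movie, "movie-cast", ("cast", person)))
        for g in m.get("genre", []):
            triples.append((movie, "movie-genre", ("genre", g)))
        triples.append((movie, "movie-year", ("year", m["year"])))
        # Ratings are bucketed into integer entities, mirroring the 1-10 rating nodes.
        triples.append((movie, "movie-rating", ("rating", round(m["rating"]))))
    return triples

example = [{"title": "Example Film", "cast": ["A"], "director": ["B"],
            "genre": ["Drama"], "year": 2020, "rating": 7.4}]
print(len(build_triples(example)))  # 5 triples for this record
```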

Methodology
We propose MVSD, a Multi-View Spoiler Detection framework. The overall architecture of the model is illustrated in Figure 2. To leverage external movie knowledge and user activities that are essential in robust spoiler detection, MVSD constructs heterogeneous information networks to jointly represent diverse information sources. Specifically, we build three subgraphs: the movie-review subgraph, the user-review subgraph, and the knowledge subgraph, each modeling one aspect of the spoiler detection process. MVSD first separately encodes the multi-view features of these subgraphs through heterogeneous GNNs, then fuses the learned representations of the three subgraphs through subgraph interaction. MVSD conducts spoiler detection in a node classification setting based on the learned representations of review nodes.

Heterogeneous Graph Construction
Graphs and graph neural networks have become increasingly involved in NLP tasks such as misinformation detection (Hu et al., 2021) and question answering (Yu et al., 2022). In this paper, we construct heterogeneous graphs to jointly model textual content, metadata, and external knowledge in spoiler detection. Specifically, we first construct three subgraphs modeling different information sources: the movie-review subgraph, the user-review subgraph, and the knowledge subgraph. We mainly explain the composition of the graph in the following and elaborate on the details of all the nodes and relations in Appendix C.

Movie-Review Subgraph
The movie-review subgraph models the bipartite relation between movies and user reviews. We first define the nodes, denoted as V_M, which include movie nodes, rating nodes, and review nodes.

User-Review Subgraph
The user-review subgraph is responsible for modeling the heterogeneity of user behavior on movie review platforms. The nodes in this subgraph, denoted as V_U, include review nodes, user nodes, and year nodes.

Knowledge Subgraph
The knowledge subgraph is responsible for incorporating movie knowledge from external KBs. Nodes in this subgraph, denoted as V_K, include movie nodes, genre nodes, cast nodes, year nodes, and rating nodes.
Note that the most vital nodes, movie nodes and review nodes, both appear in two subgraphs. These shared nodes then serve as bridges for information exchange across subgraphs, which is enabled by the MVSD model architecture in Section 3.3.

Multi-View Feature Extraction
The entities in the heterogeneous information graph have diverse data sources and multi-view attributes. In order to model the rich information of these entities, we propose a taxonomy of the views, dividing them into three categories.

Semantic View
The semantic view reflects the semantics contained in the text. We pass movie review documents, movie plot descriptions, user bios, and cast member bios to pre-trained RoBERTa, average all tokens, and produce node embeddings v_s as the semantic view.
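The mean-pooling step can be sketched as below; in the actual pipeline the token embeddings would come from a frozen pre-trained RoBERTa encoder, which is elided here and replaced with random vectors.

```python
import numpy as np

# Mask-aware mean pooling of contextual token embeddings into one node
# embedding v_s. The encoder call is elided; random vectors stand in for
# RoBERTa's token outputs so the pooling step itself is runnable.

def mean_pool(token_embs, attention_mask):
    """token_embs: (seq_len, dim); attention_mask: (seq_len,) with 1 = real token."""
    mask = attention_mask[:, None].astype(float)
    return (token_embs * mask).sum(axis=0) / mask.sum()

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))      # 6 tokens, 4-dim embeddings (toy sizes)
mask = np.array([1, 1, 1, 1, 0, 0])   # last two positions are padding
v_s = mean_pool(tokens, mask)
print(v_s.shape)  # (4,)
```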

Meta View
The meta view comprises numerical and categorical features. We utilize the metadata of user accounts, movie reviews, movies, and cast members, and calculate z-scores as node embeddings v_m to obtain the meta view. Details about the metadata can be found in Appendix D.2.
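A minimal sketch of the z-score computation over a metadata matrix (the column meanings, e.g. review length or user review count, are illustrative):

```python
import numpy as np

# z-score normalization of numerical metadata into the meta view v_m.

def zscore(features, eps=1e-8):
    """features: (num_nodes, num_feats) raw metadata matrix."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + eps)

meta = np.array([[120.0, 3.0],    # e.g. review length, user review count
                 [80.0, 10.0],
                 [200.0, 1.0]])
v_m = zscore(meta)
print(v_m.mean(axis=0))  # each column is centered at ~0
```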

Knowledge View
The knowledge view captures the external knowledge of movies. Following previous works (Hu et al., 2021; Zhang et al., 2022), we use TransE (Bordes et al., 2013) to train KG embeddings for the UKM knowledge base and use these embeddings as node features v_k for the external knowledge view.
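A toy sketch of the TransE idea, which scores a triple (h, r, t) by how close h + r lands to t; a real training run over UKM would use a margin-based ranking loss with negative sampling, which is omitted here.

```python
import numpy as np

# Minimal TransE sketch: entities and relations live in the same vector
# space, and a triple is scored by ||h + r - t||.

rng = np.random.default_rng(0)
dim = 8
ent = {e: rng.normal(scale=0.1, size=dim) for e in ["Movie", "Drama"]}
rel = {"movie-genre": rng.normal(scale=0.1, size=dim)}

def score(h, r, t):
    return np.linalg.norm(ent[h] + rel[r] - ent[t])

# A few gradient steps pulling the positive triple together
# (gradient of 0.5 * ||h + r - t||^2; negative sampling omitted).
lr = 0.01
for _ in range(200):
    g = ent["Movie"] + rel["movie-genre"] - ent["Drama"]
    ent["Movie"] -= lr * g
    rel["movie-genre"] -= lr * g
    ent["Drama"] += lr * g
print(round(score("Movie", "movie-genre", "Drama"), 3))  # close to 0
```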
Based on these definitions, each subgraph has two feature views, and thus nodes in each subgraph have two sets of feature vectors. Specifically, the knowledge subgraph G_K has the external knowledge view and the semantic view, while the movie-review subgraph G_M and the user-review subgraph G_U have the meta view and the semantic view. We then employ one MLP layer for each feature view to encode the extracted features and obtain the initial node features x_i^s, x_i^m, and x_i^k for the semantic, meta, and knowledge views.
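The per-view encoding can be sketched as a single ReLU MLP layer per view; the dimensions below are placeholders, not the paper's actual hyperparameters.

```python
import numpy as np

# One MLP layer per feature view projects raw view features into a shared
# hidden dimension, yielding the initial node features x^s and x^m.

def mlp_encode(v, W, b):
    return np.maximum(v @ W + b, 0)  # single ReLU layer

rng = np.random.default_rng(0)
v_s = rng.normal(size=(5, 768))  # semantic view (e.g. RoBERTa dim, illustrative)
v_m = rng.normal(size=(5, 12))   # meta view (small metadata vector, illustrative)
Ws = rng.normal(size=(768, 64)) * 0.05
Wm = rng.normal(size=(12, 64)) * 0.3
x_s = mlp_encode(v_s, Ws, np.zeros(64))
x_m = mlp_encode(v_m, Wm, np.zeros(64))
print(x_s.shape, x_m.shape)  # (5, 64) (5, 64)
```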

MVSD Layer
After obtaining the three subgraphs and their initial node features under the semantic, meta, and knowledge views, we employ MVSD layers to conduct representation learning and spoiler detection. Specifically, an MVSD layer first separately encodes the three subgraphs, then adopts hierarchical attention to enable feature interaction and information exchange across the various subgraphs.
Subgraph Modeling We first model each subgraph independently, fusing the two feature views for each node; node embeddings from different subgraphs are then fused to facilitate interaction between the three subgraphs. For simplicity, we adopt relational graph convolutional networks (R-GCN) (Schlichtkrull et al., 2018) to encode each subgraph. For the l-th layer of R-GCN, the message passing is as follows:

x_i^{(l+1)} = \sigma\Big( \Theta_{\mathrm{self}} x_i^{(l)} + \sum_{r \in R} \sum_{j \in \mathcal{N}_i^r} \frac{1}{|\mathcal{N}_i^r|} \Theta_r x_j^{(l)} \Big),

where \Theta_{\mathrm{self}} is the projection matrix for the node itself, \Theta_r is the projection matrix for neighbors under relation r, and \mathcal{N}_i^r denotes the neighbors of node i under relation r. By applying R-GCN, nodes in subgraph G_K obtain features from the knowledge and semantic views, denoted as x_k^K and x_s^K, respectively. Nodes in subgraph G_M obtain features from the semantic and meta views, denoted as x_s^M and x_m^M, while nodes in subgraph G_U obtain the same views of features, denoted as x_s^U and x_m^U.
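A minimal NumPy sketch of this message passing (mean aggregation per relation plus a self-loop projection; the dimensions and edge types are illustrative):

```python
import numpy as np

# Minimal R-GCN layer (Schlichtkrull et al., 2018): a self-loop projection
# plus a per-relation mean over in-neighbors.

def rgcn_layer(x, edges, theta_self, theta_rel):
    """x: (N, d_in); edges: {relation: list of (src, dst)};
    theta_self: (d_in, d_out); theta_rel: {relation: (d_in, d_out)}."""
    out = x @ theta_self
    for r, edge_list in edges.items():
        # In-degree per node under relation r, for mean aggregation.
        deg = np.zeros(x.shape[0])
        for src, dst in edge_list:
            deg[dst] += 1
        for src, dst in edge_list:
            out[dst] += (x[src] @ theta_rel[r]) / deg[dst]
    return np.maximum(out, 0)  # ReLU nonlinearity

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))  # 4 nodes, 3-dim features (toy sizes)
edges = {"review-movie": [(0, 1), (2, 1)], "review-user": [(0, 3)]}
theta_self = rng.normal(size=(3, 3))
theta_rel = {r: rng.normal(size=(3, 3)) for r in edges}
h = rgcn_layer(x, edges, theta_self, theta_rel)
print(h.shape)  # (4, 3)
```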
Aggregation and Interaction Given the representations of nodes from different feature views, we adopt hierarchical attention layers to aggregate and mix the representations learned from different subgraphs. Our hierarchical attention contains two parts: view-level attention and subgraph-level attention. Since movie nodes and review nodes are shared across subgraphs and are of the most significance, we utilize these two kinds of nodes to implement our hierarchical attention. We first conduct view-level attention to aggregate the multi-view information for each type of node. Each node in a specific subgraph has embeddings learned from two feature views, and our proposed view-level attention fuses the information learned from the different views by learning a weight for each view. Specifically, the learned weights for the two views in a subgraph G, (\alpha_G^{v_1}, \alpha_G^{v_2}), can be formulated as

(\alpha_G^{v_1}, \alpha_G^{v_2}) = \mathrm{attn}_v(X_G^{v_1}, X_G^{v_2}),

where \mathrm{attn}_v denotes the layer that implements the view-level attention and X_G^{v_i} is the matrix of node embeddings from view v_i in subgraph G. To learn the importance of each view, we first transform the view-specific embeddings through a fully connected layer, then calculate the similarity between the transformed embeddings and a view-level attention vector q_G. We then take the average importance over all view-specific node embeddings as the importance of each view. The importance of each view, denoted as w_{v_i}, can be formulated as

w_{v_i} = \frac{1}{|V_G|} \sum_{j \in V_G} q_G^{\top} \tanh\big(W \cdot x_j^{G, v_i} + b\big),

where q_G is the view-level attention vector, V_G is the node set of subgraph G, and x_j^{G, v_i} is the embedding of node j in subgraph G from view v_i. The weight of each view in subgraph G can then be calculated by

\alpha_G^{v_i} = \frac{\exp(w_{v_i})}{\sum_{k} \exp(w_{v_k})}.
It reflects the importance of each view in our spoiler detection task. The fused embedding of the different views can then be written as

X_G = \sum_{i} \alpha_G^{v_i} X_G^{v_i}.

Thus we obtain the subgraph-specific node embeddings, denoted as X_K, X_M, and X_U. We then conduct subgraph-level attention to facilitate the flow of information between the three information sources. Generally, nodes in different subgraphs only contain information from one subgraph. To learn a more comprehensive representation and facilitate the flow of information between subgraphs, we enable information exchange across the subgraphs using the movie nodes and the review nodes, both appearing in two subgraphs, as the information exchange ports. Specifically, we propose a novel subgraph-level attention to automatically learn the weight of each subgraph and fuse the information learned from different subgraphs. To be specific, the learned weights of the subgraphs (\beta_K, \beta_M, \beta_U) can be computed as

(\beta_K, \beta_M, \beta_U) = \mathrm{attn}_g(X_K, X_M, X_U),

where \mathrm{attn}_g denotes the subgraph-level attention layer. To learn the importance of each subgraph, we transform the subgraph-specific embeddings through a feedforward layer and then calculate the similarity between the transformed embeddings and a subgraph-level attention vector q. Furthermore, we take the average importance over all subgraph-specific node embeddings as the importance of each subgraph. Taking G_K and G_M as an example, the shared nodes of these two subgraphs are movie nodes. The importance of each subgraph, denoted as w_K and w_M, can be formulated as

w_V = \frac{1}{|V_{mv}|} \sum_{j \in V_{mv}} q^{\top} \tanh\big(W \cdot x_j^V + b\big), \quad V \in \{K, M\},

where q is the subgraph-level attention vector and V_{mv} denotes the set of shared movie nodes. The weight of each subgraph is then

\beta_V = \frac{\exp(w_V)}{\sum_{U \in \{K, M\}} \exp(w_U)}.

After obtaining the weights, the subgraph-specific embeddings can be fused as

X_{mv} = \beta_K X_K^{mv} + \beta_M X_M^{mv}.

Similarly, for review nodes, we can obtain the fused representation X_{rv}. Our proposed subgraph-level attention enables information to flow across different views and subgraphs.
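The view-level attention can be sketched as follows; the subgraph-level attention follows the same pattern over the shared movie and review nodes. All parameter shapes here are illustrative.

```python
import numpy as np

# Sketch of view-level attention: each view's importance is the average
# similarity between its transformed node embeddings and a learnable query
# vector, softmax-normalized into fusion weights.

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def view_attention(views, W, b, q):
    """views: list of (N, d) embedding matrices, one per feature view."""
    w = np.array([np.mean(np.tanh(X @ W + b) @ q) for X in views])
    alpha = softmax(w)                          # one weight per view
    fused = sum(a * X for a, X in zip(alpha, views))
    return fused, alpha

rng = np.random.default_rng(0)
sem = rng.normal(size=(5, 4))   # semantic-view embeddings (toy sizes)
meta = rng.normal(size=(5, 4))  # meta-view embeddings
W, b, q = rng.normal(size=(4, 4)), rng.normal(size=4), rng.normal(size=4)
fused, alpha = view_attention([sem, meta], W, b, q)
print(fused.shape, round(float(alpha.sum()), 6))  # (5, 4) 1.0
```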

Overall Interaction
One MVSD layer alone, however, cannot enable information interaction between all information sources (e.g., between the user-review subgraph and the knowledge subgraph). In order to further facilitate the interaction of the information provided by each view in each subgraph, we stack multiple MVSD layers for node representation learning. The representation of movie nodes and review nodes is updated after each layer, incorporating information provided by different views and neighboring subgraphs. This process can be formulated as

X^{(l+1)} = \mathrm{MVSDLayer}\big(X^{(l)}; G_K, G_M, G_U\big),

where X^{(l)} denotes the node representations after the l-th MVSD layer.

Table 3: Accuracy, AUC, and binary F1-score of MVSD and three types of baseline methods on two spoiler detection datasets. We run all experiments five times to ensure a consistent evaluation and report the average performance as well as the standard deviation. MVSD consistently outperforms the three types of methods on both benchmarks. * denotes that the results are significantly better than the second-best under the Student's t-test.

Learning and Optimization
After a total of L MVSD layers, we obtain the final movie review node representation, denoted as h^{(L)}. Given a document label a ∈ {SPOILER, NOT SPOILER}, the predicted probabilities are calculated as p(a|d) ∝ exp(MLP_a(h^{(L)})). We then optimize MVSD with the cross-entropy loss function. At inference time, the predicted label is argmax_a p(a|d).
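The classification head and loss can be sketched as below; a single linear layer stands in for MLP_a, and the weights are random placeholders.

```python
import numpy as np

# Classification head sketch: the final review representation h^(L) is
# mapped to two logits; softmax gives p(a|d) and argmax gives the label.

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(h, W, b, labels=("NOT SPOILER", "SPOILER")):
    probs = softmax(W @ h + b)          # p(a|d) ∝ exp(logit_a)
    return labels[int(np.argmax(probs))], probs

def cross_entropy(probs, gold_idx):
    return -np.log(probs[gold_idx])     # training loss for one review

rng = np.random.default_rng(0)
h = rng.normal(size=8)                  # final review node embedding (toy dim)
W, b = rng.normal(size=(2, 8)), np.zeros(2)
label, probs = predict(h, W, b)
print(label, round(float(probs.sum()), 6))  # probabilities sum to 1
```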

Experiment Settings
Datasets. We evaluate MVSD and baselines on two spoiler detection datasets:
• LCS is our proposed large-scale automatic spoiler detection dataset. We randomly create a 7:2:1 split for training, validation, and test sets.
• Kaggle is a publicly available movie review dataset presented in a Kaggle challenge (Misra, 2019). We present more details about this dataset in Appendix D.

Overall Performance
Table 3 presents the performance of MVSD and baseline methods on the two datasets. Bold and underline indicate the best and second-best performance. Table 3 demonstrates that:
• MVSD achieves state-of-the-art performance on both datasets, outperforming all baselines by at least 2.01 in F1-score. This demonstrates that our various technical contributions, such as incorporating external knowledge and user networks, multi-view feature extraction, and the cross-context information exchange mechanism, result in a more accurate and robust spoiler detection system.
• Graph-based models are generally more effective than other types of baselines. This suggests that, in addition to the textual content of reviews, graph-based modeling could bring in additional information sources, such as external knowledge and user interactions, to enable better grounding for spoiler detection.
• Among the two task-specific baselines, SpoilerNet (Wan et al., 2019) outperforms DNSD (Chang et al., 2018), in part attributable to the introduction of user bias. Our method further incorporates external knowledge and user networks while achieving better performance, suggesting that robust spoiler detection requires models and systems to go beyond the mere textual content of movie reviews.

External Knowledge and User Networks
We hypothesize that external movie knowledge and user interactions on movie review websites are essential in spoiler detection, providing more context and grounding in addition to the textual content of movie reviews. To further examine their contributions in MVSD, we randomly remove 20%, 40%, 60%, 80%, or 100% of the edges of the knowledge subgraph and the user-review subgraph, creating settings with reduced knowledge and user information. We evaluate MVSD with these ablated graphs on the Kaggle dataset and present the results in Figure 3(a). The performance drops significantly (by about 10% in F1-score when removing 60% of the edges) as we increase the number of removed edges in the user-review subgraph, suggesting that the user interaction network plays an important role in the spoiler detection task. As for the knowledge subgraph, the F1-score drops by 3.38% if we remove the whole knowledge subgraph, indicating that external knowledge is helpful in identifying spoilers. Moreover, it can be observed in Figure 3(b) that the F1-score and AUC only drop slightly when removing part of the edges in the knowledge subgraph. This illustrates the robustness of MVSD, as it can achieve relatively high performance while utilizing only a subset of movie knowledge.
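The edge-ablation setup can be sketched as a simple random filter over an edge list (the re-evaluation step itself is elided):

```python
import random

# Sketch of the edge-ablation setting: randomly drop a fraction of a
# subgraph's edges before re-evaluating the model on the ablated graph.

def drop_edges(edges, fraction, seed=0):
    """edges: list of (src, dst); keep roughly (1 - fraction) of them at random."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() >= fraction]

edges = [(i, i + 1) for i in range(100)]  # a toy edge list
print(len(drop_edges(edges, 0.0)), len(drop_edges(edges, 1.0)))  # 100 0
```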

Ablation Study
In order to study the effect of different views of data, we remove them individually and evaluate the resulting variants of our proposed model on the Kaggle dataset. We further remove parts of the graph structure to investigate their contributions. Finally, we replace our attention mechanism with simple fusion methods to evaluate the effectiveness of our fusion method.

Multi-View Study
We report the binary F1-score, AUC, and accuracy of the ablation study in Table 4. Among the multi-view data, the semantic view is of great significance, as AUC and F1-score drop dramatically when it is discarded. We can see that discarding the external knowledge view or removing the knowledge subgraph reduces the F1-score by about 3%, indicating that the external knowledge of movies is helpful to the spoiler detection task. However, external knowledge does not show the same importance as the directly related semantic view or meta view. We believe this is because the external knowledge is not directly related to review documents, so it can only provide auxiliary help to the spoiler detection task.
Graph Structure Study As illustrated in Table 4, after removing the user-review subgraph, the reduced model performs poorly, with a drop of 18% in F1-score. This demonstrates that the user interaction network is necessary for spoiler detection.

Aggregation and Interaction Study
In order to study the effectiveness of the hierarchical mechanism that enables the interaction between views and subgraphs, we replace the two components of our hierarchical attention with other operations and evaluate them on the Kaggle dataset. Specifically, we compare our attention module with concatenation, max-pooling, and average-pooling.
In Table 5 we report the binary F1-score, AUC, and accuracy. We can see that our approach beats the eight variants in all metrics. It is evident that our approach can aggregate and fuse multi-view data more effectively than simple fusion methods.

Qualitative Analysis
We conduct a qualitative analysis to investigate the role of external movie knowledge and social networks in spoiler detection. As shown in Table 6, guided by external knowledge and user networks, MVSD successfully makes the correct predictions while baseline models fail. Specifically, in the first case, the user is a fan of Kristen Wiig. Guided by information from the social network, MVSD finds that the user often posted spoilers related to the film star, and thus predicts that the review is a spoiler. In the second case, the user mentioned something done by the director of the movie. With the help of movie knowledge, it can easily be determined that what the director did reveals nothing of the plot.

Related Work
Automatic spoiler detection aims to identify spoiler reviews in domains such as television (Boyd-Graber et al., 2013), books (Wan et al., 2019), and movies (Misra, 2019; Boyd-Graber et al., 2013). Existing spoiler detection models can be mainly categorized into two types: keyword matching and machine learning models. Keyword matching methods utilize predefined keywords to detect spoilers, for instance, the names of sports teams or sports events (Nakamura and Tanaka, 2007), or the names of actors (Golbeck, 2012). This type of method requires keywords defined by humans and cannot be generalized to various application scenarios. Early machine learning spoiler detection models mainly leverage topic models or support vector machines with handcrafted features. Guo and Ramakrishnan (2010) use a bag-of-words representation and an LDA-based model to detect spoilers, Jeon et al. (2013) utilize SVM classification with four extracted features, while Boyd-Graber et al. (2013) incorporate lexical features and metadata of the review subjects (e.g., movies and books) in an SVM classifier. Later approaches are increasingly neural: Chang et al. (2018) focus on modeling external genre information based on GRU and CNN, while Wan et al. (2019) introduce item specificity and bias and utilize bidirectional recurrent neural networks (bi-RNN) with gated recurrent units (GRU). A recent work (Chang et al., 2021) leverages dependency relations between context words in sentences to capture the semantics using graph neural networks.
While existing approaches have made considerable progress for automatic spoiler detection, it was previously underexplored whether review text itself is sufficient for robust spoiler detection, or whether more information sources are required for better task grounding. In this work, we make the case for incorporating external film knowledge and user activities on movie review websites in spoiler detection, advancing the field through both resource curation and method innovation, presenting a large-scale dataset LCS, an up-to-date movie knowledge base UKM, and a state-of-the-art spoiler detection approach MVSD.

Conclusion
We make the case for incorporating external knowledge and user networks on movie review websites for robust and well-grounded spoiler detection. Specifically, we curate LCS, the largest spoiler detection dataset to date; we construct UKM, an up-to-date knowledge base of the film industry; and we propose MVSD, a state-of-the-art spoiler detection system that takes external knowledge and user interactions into account. Extensive experiments demonstrate that MVSD achieves state-of-the-art performance on two datasets while showcasing the benefits of incorporating movie knowledge and user behavior in spoiler detection. We leave it for future work to further check the labels in the LCS dataset.

Ethics Statement
We envision MVSD as a pre-screening tool and not as an ultimate decision-maker. Though achieving the state of the art, MVSD is still imperfect and needs to be used with care, in collaboration with human moderators, to monitor or suspend suspicious movie reviews. Moreover, MVSD may inherit the biases of its constituents, since it is a combination of datasets and models. For instance, pre-trained language models could encode undesirable social biases and stereotypes (Li et al., 2022; Nadeem et al., 2021). We leave it to future work to incorporate the bias detection and mitigation techniques developed in ML research into spoiler detection systems. Given the nature of the task, the dataset contains potentially offensive language, which should be taken into consideration.

A Graph-Based Social Text Analysis
Graphs and heterogeneous information networks are playing an important role in the analysis of texts and documents on news (Mehta et al., 2022) and social media (Hofmann et al., 2022). In these approaches, graphs and graph neural networks are adopted to represent and encode information in addition to textual content, such as social networks (Nguyen et al., 2020), external knowledge graphs (Zhang et al., 2022), social context (Mehta et al., 2022), and dependency relations between context words (Chang et al., 2021). With the help of additional information sources, these graph-based approaches enhance representation quality by capturing the rich social interactions (Nguyen et al., 2020), infusing knowledge reasoning into language representations (Zhang et al., 2022), and reinforcing nodes' representations interactively (Mehta et al., 2022). As a result, graph-based social text analysis approaches have advanced the state-of-the-art on various tasks such as misinformation detection (Zhang et al., 2022), stance detection (Liang et al., 2022), propaganda detection (Vijayaraghavan and Vosoughi, 2022), sentiment analysis (Chen et al., 2022), and fact verification (Arana-Catania et al., 2022). Motivated by the success of existing graph-based models, we propose MVSD to incorporate external knowledge bases and user networks on movie review platforms through graphs and graph neural networks.

B Limitations
We identify two key limitations:
• MVSD utilizes the widely adopted R-GCN to model each subgraph, while there are more up-to-date heterogeneous graph algorithms such as HGT (Hu et al., 2020) and SimpleHGN (Lv et al., 2021). We plan to conduct experiments that replace R-GCN with other heterogeneous graph algorithms. Besides, considering the subgraph structure of MVSD, we will test different heterogeneous graph algorithm settings in each subgraph to find the most effective algorithm for each subgraph.
• LCS is constructed based on IMDB, and the spoiler annotation is based on user self-reports. Hence, it is likely that some labels are false. In the next step of our work, we will check the labels with the help of experts and weakly supervised learning strategies (Zhou, 2018).

C Heterogeneous Graph Construction Details
C.1 Movie-Review Subgraph
N1: movie. The information about movies, especially the plot, is essential in spoiler detection. We use one node to represent each movie.
N2: rating. Rating is an essential part of a movie review. We use ten nodes to represent the numerical ratings ranging from 1 to 10.
N3: review. We use one node to represent each movie review document.
We connect these nodes with three types of edges, denoted as E_M:
R1: review-movie. We connect a review node with a movie node if the review is about the movie.
R2: movie-rating. We connect a movie node with a rating node according to the overall rating of the movie, rounded to the nearest integer.
R3: rating-review. We connect a review node with a rating node based on its numeric score.

C.2 User-Review Subgraph
N4: review. We use one node to represent each review document. Note that review nodes appear both in V_M (as N3) and V_U (as N4). Sharing nodes across subgraphs enables MVSD to model the interaction and exchange across different contexts.
N5: user. We use one node to represent each user.
N6: year. We use one node to represent each year, modeling the temporal distribution of spoilers.
We connect these nodes with three types of edges, denoted as E_U:
R4: review-user. We connect a review node with a user node if the user posted the review.
R5: review-year. We connect a review node with a year node if the review was posted in that year.
R6: user-year. We connect a user node with a year node if the user created the account in that year.

C.3 Knowledge Subgraph
N7: movie. We use one node to represent each movie.
N8: genre. We use one node to represent each movie genre.
N9: cast. We use one node to represent each distinct director and cast member.
N10: year. We use one node to represent each year.
N11: rating. We use ten nodes to represent the numerical ratings ranging from 1 to 10.
We connect these nodes with four types of edges:
R7: movie-genre. We connect a movie node with a genre node according to the genre of the movie.
R8: movie-cast. We connect a movie node with a cast node if the cast member is involved in the movie.
R9: movie-year. We connect a movie node with a year node if the movie was released in that year.
R10: movie-rating. We connect a movie node with a rating node according to the rating of the movie.

D Dataset Details
We adopt two graph-based spoiler detection datasets, namely Kaggle (Misra, 2019) and LCS (ours).

D.1 Data Analysis
We compare LCS with another popular spoiler detection dataset, Kaggle (Misra, 2019), and present our findings in Figure 4. We investigate the correlation between spoilers and individual review scores, overall movie ratings, and the behavior of different users. Firstly, we investigate the correlation between spoilers and review scores. Figure 4(a) shows that whether a review contains spoilers is strongly connected to how well the user regards the movie. Additionally, we find that whether a review contains spoilers is also related to the public opinion of the movie, as illustrated in Figure 4(b). These findings suggest the necessity of leveraging metadata and external knowledge of movies. In addition, we study the fraction of reviews containing spoilers per user. As illustrated in Figure 4(c), the 'spoiler tendency' varies greatly among users. This suggests that it is essential to utilize user information and how users interact with different movies on review websites.

D.2 Metadata
The metadata we collected for both datasets is listed in Table 9.

E KG Details
The relation types, example triples, and their counts are presented in Table 10.

F Experiment Details
Implementation. For pre-trained LMs, we utilize the pre-trained model to obtain embeddings and transform them through MLPs. For DNSD and SpoilerNet, we follow the settings in their corresponding papers. For GNNs, we combine the three subgraphs into a single graph and only utilize the semantic view embedding. We learn a representation for each review, and the representations are passed to an MLP for classification.
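The LM baseline pipeline described here and in Appendix F.1 (mean-pool the token embeddings, then apply a small MLP head) can be sketched in NumPy as follows; the dimensions, random weights, and function names are illustrative stand-ins, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)


def mean_pool(token_embeddings):
    # token_embeddings: (seq_len, hidden) -> pooled (hidden,) review vector
    return token_embeddings.mean(axis=0)


def two_layer_head(x, W1, b1, W2, b2):
    # two fully connected layers: hidden -> inner -> 2 logits
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU
    return h @ W2 + b2                # spoiler vs. non-spoiler logits


hidden, inner = 8, 4
tokens = rng.normal(size=(12, hidden))  # stand-in for LM token embeddings
W1, b1 = rng.normal(size=(hidden, inner)), np.zeros(inner)
W2, b2 = rng.normal(size=(inner, 2)), np.zeros(2)

logits = two_layer_head(mean_pool(tokens), W1, b1, W2, b2)
print(logits.shape)  # (2,)
```

In practice the token embeddings would come from a frozen or fine-tuned LM such as RoBERTa, and the head would be trained with a cross-entropy loss.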

F.1 Baseline Details
We compare MVSD with pre-trained language models, GNN-based models, and task-specific baselines to ensure a holistic evaluation. For pre-trained language models, we pass the review text to the model, average all token embeddings, and apply two fully connected layers to conduct spoiler detection.
For GNN-based models, we pass the review text to RoBERTa and average all token embeddings to obtain the initial node features. We provide a brief description of each baseline method in the following.
• BERT (Devlin et al., 2019) is a language model pre-trained on a large volume of natural language corpora with the masked language modeling and next sentence prediction objectives.
• BART (Lewis et al., 2020) is a transformer encoder-decoder (seq2seq) language model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder.
• DeBERTa (He et al., 2021b) improves existing language models using disentangled attention and enhanced mask decoder.
• GCN (Kipf and Welling, 2016) is short for graph convolutional networks, which enables parameterized message passing between neighbors.
• R-GCN (Schlichtkrull et al., 2018) extends GCN to enable the processing of relational networks.
• DNSD (Chang et al., 2018) is a spoiler detection framework using a CNN-based genre-aware attention mechanism.
• SpoilerNet (Wan et al., 2019) extends the hierarchical attention network (HAN) (Yang et al., 2016) with item-specificity information and item and user bias terms for spoiler detection.

F.2 Hyperparameter Details
We present our hyperparameter settings in Table 11 to facilitate reproduction. The settings are the same for both datasets.

F.3 Computational Resources
Our proposed approach has a total of 0.9M learnable parameters. It takes about 10 GPU hours to train our approach on the Kaggle dataset; the model is trained on a single Tesla V100 GPU. We conduct all experiments on a cluster with 4 Tesla V100 GPUs (32 GB memory each), 16 CPU cores, and 377 GB of CPU memory.

F.4 Experiment Runs
Since both datasets are relatively large, we adopt the neighbor subsampling technique proposed by Hamilton et al. (2017), which has been successfully applied to large graphs (Velickovic et al., 2019). We run our approach and the baselines five times on both datasets and report the average F1-score, AUC, and accuracy with standard deviation in Table 3. For the experiments in Table 4, Table 5, and Figure 3, we only report single-run results on the Kaggle dataset due to limited computational resources.
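The subsampling of Hamilton et al. (2017) draws a fixed-size set of neighbors per node rather than aggregating over the full neighborhood. A minimal sketch, with a hypothetical adjacency dict and `sample_neighbors` helper standing in for a library loader:

```python
import random


def sample_neighbors(adj, node, k, seed=0):
    """GraphSAGE-style fixed-size sampling: at most k neighbors,
    drawn uniformly without replacement."""
    neighbors = adj.get(node, [])
    if len(neighbors) <= k:
        return list(neighbors)
    rng = random.Random(seed)
    return rng.sample(neighbors, k)


# toy adjacency: one review node connected to 100 user nodes
adj = {"review_1": [f"user_{i}" for i in range(100)]}
print(len(sample_neighbors(adj, "review_1", k=10)))  # 10
```

In practice one would use a batched neighbor sampler (e.g., the loaders shipped with common GNN libraries), which applies this per-hop capping recursively to build mini-batch computation graphs.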

F.5 Visualization
To intuitively demonstrate the effectiveness of our representation method, we utilize t-SNE (Van der Maaten and Hinton, 2008) to visualize the representations of movie reviews learned by different models. Specifically, we choose our proposed MVSD and R-GCN (the second-best performer) and evaluate them on the validation set of the small dataset. As observed in Figure 5b, the representations R-GCN learns for the two classes are largely mixed together. In contrast, representations learned by MVSD show clearer grouping for both classes of reviews. This illustrates that MVSD yields improved and more comprehensive representations through the effective use of multi-view data and user interaction networks.
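A t-SNE projection of review representations like Figure 5 can be produced with scikit-learn. The sketch below substitutes synthetic two-class vectors for the actual model outputs; the sample counts and dimensions are arbitrary:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# stand-in for learned review representations from two classes
reps = np.vstack([
    rng.normal(0.0, 1.0, size=(20, 16)),  # non-spoiler reviews
    rng.normal(4.0, 1.0, size=(20, 16)),  # spoiler reviews
])

# project 16-d representations down to 2-d for plotting
coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(reps)
print(coords.shape)  # (40, 2)
```

The 2-d `coords` would then be scatter-plotted with one color per class to inspect how well the two groups separate.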

F.6 Contribution of Views and Subgraphs
We introduce semantic, meta, and external knowledge views and utilize the user-review, movie-review, and knowledge subgraph structures to represent multi-source information. To further study the contribution of different views and subgraphs, we extract the attention weights from the view-level and subgraph-level attention layers and illustrate them with violin plots. We select representative features and present them in Figure 6. The four violin plots demonstrate that our hierarchical attention selects the more important features, as shown by the variation of attention weights between the first and second layers; this indicates that the contributions of the representations vary as the model captures features through the graph structure and attention mechanism.
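View-level attention of this kind typically computes softmax weights over the view embeddings before fusing them; the weights are what the violin plots summarize. A hedged NumPy sketch, where the scoring vector and dimensions are illustrative rather than MVSD's exact parameterization:

```python
import numpy as np


def view_attention(view_embeddings, w):
    """Softmax attention over views.

    view_embeddings: (n_views, d) matrix, one row per view.
    w: (d,) learned scoring vector (illustrative stand-in).
    Returns (attention weights, fused representation).
    """
    scores = view_embeddings @ w
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    fused = weights @ view_embeddings        # weighted sum of views
    return weights, fused


rng = np.random.default_rng(0)
views = rng.normal(size=(3, 8))  # semantic, meta, knowledge views
weights, fused = view_attention(views, rng.normal(size=8))
print(round(weights.sum(), 6))  # 1.0
```

Collecting `weights` for every node and relation and plotting their distributions per view yields violin plots of the kind shown in Figure 6.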

G Significance Testing
To further evaluate MVSD's performance on both datasets, we apply a one-way repeated-measures ANOVA test to the results in Table 3. The result demonstrates that the performance gain of our proposed model over the second-best baseline, R-GCN, is significant on both datasets across all three metrics at a confidence level of 0.05.
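The one-way repeated-measures ANOVA F statistic can be computed directly from a runs-by-models score matrix: the subject (run) variance is removed from the error term before forming the F ratio. The sketch below uses illustrative scores, not the actual Table 3 numbers:

```python
import numpy as np


def rm_anova_f(data):
    """One-way repeated-measures ANOVA F statistic.

    data: (n_subjects, k_conditions) array, e.g. runs x models.
    """
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()   # between conditions
    ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_total = ((data - grand) ** 2).sum()
    ss_err = ss_total - ss_cond - ss_subj                    # residual error
    df_cond, df_err = k - 1, (n - 1) * (k - 1)
    return (ss_cond / df_cond) / (ss_err / df_err)


# five runs (subjects) x two models (conditions), illustrative F1 scores
scores = np.array([
    [0.70, 0.76],
    [0.71, 0.75],
    [0.69, 0.74],
    [0.72, 0.78],
    [0.70, 0.75],
])
print(rm_anova_f(scores))
```

The resulting F is compared against the F distribution with (k-1, (n-1)(k-1)) degrees of freedom at the chosen significance level; libraries such as statsmodels provide this test (with the p-value) off the shelf.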

H Scientific Artifact Usage
The MVSD model is implemented with the help of many widely adopted scientific artifacts, including PyTorch (Paszke et al., 2019), NumPy (Harris et al., 2020), transformers (Wolf et al., 2020), sklearn (Pedregosa et al., 2011), OpenKE (Han et al., 2018), and PyTorch Geometric (Fey and Lenssen, 2019). We utilize data from IMDb and, following IMDb's requirements, acknowledge the source of the data by including the following statement: Information courtesy of IMDb (https://www.imdb.com). Used with permission. Our use of IMDb data is non-commercial, which is permitted by IMDb. We will make our code and data publicly available to facilitate reproduction and further research.

Figure 1 :
Figure 1: An example of a movie review and its context. The review mentions Tim Robbins and Morgan Freeman, which are the names of the actors. Guided by external movie knowledge, the names can be recognized as roles in the movie. Moreover, by incorporating user networks, it is discovered that User 1 likes to post spoilers on specific genres of movies such as drama and comedy. Thus the review is more likely to be a spoiler.

Figure 2 :
Figure 2: The architecture of MVSD, which incorporates external knowledge and social network interactions and enables interaction among the multi-view data.

Figure 3 :
Figure 3: MVSD performance when randomly removing edges in the user interaction network and external knowledge subgraph. Performance declines with gradual edge ablation, indicating the contribution of external knowledge and user networks.

Figure 4 :
Figure 4: (a) The spoiler frequency of reviews with different ratings; (b) the spoiler frequency of reviews related to movies of different ratings; (c) the percentage of spoilers per user, with spoiler review percentage intervals divided every 10 percent.

Figure 5 :
Figure 5: T-SNE visualization of representations of reviews learned by MVSD and R-GCN.

Figure 6 :
Figure 6: Attention weights learned by our hierarchical attention. Subscripts v and r indicate the public movie and review nodes, respectively. T, M, and K refer to the textual view, the meta view, and the external knowledge view, respectively. This violin plot illustrates the different contributions of each view and subgraph and the process of interaction.

Table 1 :
Statistics of LCS and existing dataset Kaggle.

Table 2 :
Statistics of UKM and existing movie KBs.

3 Details and statistics of the LCS dataset are presented in Appendix D.

Table 4 :
Ablation study concerning multi-view data and the graph structure on the Kaggle dataset. The semantic view, knowledge view, and meta view are denoted as S, K, and M, respectively. The knowledge subgraph, movie-review subgraph, and user-review subgraph are denoted as G_K, G_M, and G_U.

Table 5 :
Model performance on Kaggle when our attention mechanism is replaced with simple fusion methods.

Table 7 :
Statistics of our proposed LCS dataset.

Table 8 :
Statistics of the Kaggle Dataset.
The two datasets are both in English. The publicly available Kaggle dataset only provides incomplete information; hence, we retrieved cast information based on the movie ids and collected user metadata based on the user ids. The statistics of Kaggle after this retrieval are listed in Table 8, and the statistics of our LCS are listed in Table 7.

Table 9 :
Details of metadata contained in the dataset.