STINMatch: Semi-Supervised Semantic-Topological Iteration Network for Financial Risk Detection via News Label Diffusion

Commercial news provides rich semantics and timely information for automated financial risk detection. However, the cost of large-scale annotation and the sparseness of training data hinder the full exploitation of commercial news in risk detection. To address this problem, we propose a semi-supervised Semantic-Topological Iteration Network, STINMatch, along with a News-Enterprise Knowledge Graph (NEKG), to enhance risk detection. The proposed model incorporates a label-correlation matrix and interactive consistency regularization into an iterative joint learning framework of text and graph modules. The carefully designed framework takes full advantage of the labeled and unlabeled data as well as their interrelations, enabling deep label diffusion coordination between article-level semantics and label correlations following the topological structure. Extensive experiments demonstrate the superior effectiveness and generalization ability of STINMatch.¹


Introduction
Financial risk detection for enterprises is an essential task for assessing the dynamic fragility of the market. Efforts are needed to probe vulnerable enterprises and enable timely preparedness. Traditional methods often treat each enterprise individually and leverage official information or relevant structured data from government agencies to assess risk (Ozbayoglu et al., 2020). However, these official data are often biased and lagged, making it difficult to identify risks accurately and promptly (Bi et al., 2022). Commercial news mining offers another effective perspective for financial risk detection owing to the massive and timely information embedded in news articles (Walker, 2016; Calomiris and Mamaysky, 2019; Li et al., 2022). Nevertheless, challenges persist in how to efficiently utilize news for detecting financial risks for enterprises.

¹ https://github.com/curryli/Semi-Supervised-Financial-Risk-Detection.git
One key issue is the multi-label diffusion problem. One news document may carry multiple risk labels (e.g., a 'debt risk' can be accompanied by a 'litigation threat'), and it is difficult for conventional methods to handle the mutual influences among these labels given the rapid growth and variety of streaming media. Recently, deep learning methods have achieved great success in natural language processing. Deep multi-label text classification (MLTC) methods can be applied to explore label correlations (Liu et al., 2017; Yang et al., 2018). Unfortunately, purely text-based methods cannot handle risk diffusion in the business ecosystem. To address this problem, we propose a new model, the semantic-topological iteration network (STIN), to estimate 'risk diffusion' on the established news-enterprise knowledge graph (NEKG). Unlike previous graph neural network (GNN) methods for text analysis (Yang et al., 2021b,a; Pang et al., 2022; Zhao et al., 2023), our STIN model focuses on multi-label-correlation guided text-graph joint learning, aiming to capture the dissemination of various types of financial risks along the NEKG topological structure.
Another challenge is the limited annotated financial news, due to the scarcity of domain experts and expensive labor costs. Semi-supervised learning (SSL) is a common approach to data scarcity. Many SSL methods based on entropy minimization, consistency regularization, or generic regularization have been proposed for low-resource scenarios (Berthelot et al., 2019, 2020; Sohn et al., 2020). However, few studies have addressed semi-supervised integration for a text-graph joint learning framework, which could improve the performance of both modules by leveraging unlabeled data more efficiently in scenarios similar to our task.
The contributions are summarized as follows:
• We propose a pioneering semi-supervised text-graph joint learning framework, STINMatch. It fully exploits semantic information and topological associations for risk diffusion with limited annotated data.
• A novel content-label-topology aggregation mechanism is further introduced during the iteration of STIN model to handle the multi-label diffusion issues on text-attributed graphs.
• We release an NEKG dataset annotated with multiple financial risks, which leverages real-world enterprise relatedness and news-enterprise associations for risk detection.
• Extensive experiments demonstrate the detection effectiveness of STINMatch and its good generalization ability on the NEKG dataset as well as two other public datasets.

Related Work
Financial Risk Detection. Classification and regression algorithms, as well as time series forecasting, have been widely used in financial risk detection (Ozbayoglu et al., 2020; Sezer et al., 2020). However, such methods mainly rely on historical, structured data from corporate or government agencies, which lack up-to-date information. Recently, unstructured textual data, such as business management reports and financial news, have been adopted for financial risk detection due to their richer information and better timeliness (Peng and Yan, 2021; Li et al., 2022). However, such methods overlook the interactions between news and enterprises for risk diffusion by simply applying sentiment analysis to each isolated document. A recent work (Bi et al., 2022) leverages financial news as intermediaries between enterprises to exploit their interactions, but the textual contents of the news are neglected.

Label Diffusion. Label correlations (Kurata et al., 2016; Yang et al., 2018; Zhang et al., 2021) are widely employed to improve model performance on MLTC tasks. GNN methods have also been used to deal with label diffusion in text analysis. Most of them apply a GNN on an extracted word/entity-level knowledge graph (Yang et al., 2021b) or a label co-occurrence graph (Pal et al., 2020) to enrich the representation of each independent sample. Other GNN methods utilize text-attributed node relatedness (Kipf and Welling, 2017; Alkhereyf and Rambow, 2020) to enhance node representations, but the text representations are fixed during training. Some recent works combine GNNs with text classifiers to take advantage of both topology and semantic modeling. For example, GLEM (Zhao et al., 2023) proposes a variational expectation-maximization framework to alternately update the text and graph modules. Different from previous works, our STINMatch method focuses on semi-supervised integration for a text-graph joint learning framework, as well as multi-label diffusion over topologies in text-attributed GNNs.

Consistency Regularization for SSL. Consistency regularization is a popular SSL approach that constrains model predictions to be invariant to input noise. MixMatch (Berthelot et al., 2019) applies data augmentation techniques and introduces a unified loss for unlabeled data that seamlessly reduces entropy while maintaining prediction consistency. Modified versions such as ReMixMatch (Berthelot et al., 2020) and UDA (Xie et al., 2020) both use weakly-augmented examples to generate artificial labels and enforce consistency against strongly-augmented examples. FixMatch (Sohn et al., 2020) is a simplified version of ReMixMatch and UDA, which combines pseudo-labeling with consistency regularization while removing many specialized components (e.g., training signal annealing and distribution alignment). However, all these methods focus on SSL within a single text or graph module and cannot be trivially adapted to our joint learning framework for text-attributed graphs. Other related work on classification of subjective texts at different granularities includes (Xiao et al., 2019; Moon et al., 2021; Song et al., 2023).

Preliminary
In this section, we introduce the task goal of STINMatch, detail the NEKG construction, and provide an intuitive example of financial risk diffusion.

Semi-supervised Risk Diffusion. Given a set of news X and risk labels Y ∈ {0, 1}^K, the dataset consists of n labeled news D_L = {(x_i, y_i) | i = 1, 2, . . ., n} and m unlabeled news D_U = {x_i | i = 1, 2, . . ., m}. The risk diffusion model aims to learn the mapping F : X → Y from news to multiple financial risks.

NEKG Construction. The news-enterprise knowledge base is formulated as a graph G = (N, C, E), where N is a vertex set denoting news, C is a vertex set denoting enterprises, and E = {e | e = (p, q), p ∈ C, q ∈ C} ∪ {e | e = (p, q), p ∈ N, q ∈ C} is an undirected edge set. Among N, each news item in D_L is annotated with K binary risk labels, represented by a K-hot vector.

Resource and Statistics. Our NEKG contains 99,666 news nodes and 50,193 enterprise nodes. Each news node is initialized by its title and content, and each enterprise node is initialized by the company name. An edge between a news node and an enterprise node indicates that the enterprise is mentioned in the content of the news. The edges between enterprises belong to five real-world relationship types: subsidiary, investment, share-manager, share-investor, and share-legal-entity. In total, NEKG contains 135,340 news-enterprise edges and 121,938 enterprise-enterprise edges.

Annotation. We sample |D_L| = 15,000 news from all news data, and D_L is annotated by three domain experts. Each news item can be identified as correlating with one or more financial risks from the following labels: Bankruptcy, Liquidation, Business closure, Production halts, Debt, Corruption, Dispute, Counterfeit, Fraud, and Litigation. The annotation standard was established through three preliminary rounds of annotation on 500 pieces of news. After adjustment through the preliminary rounds, each annotator labeled D_L independently over more than a month, and the annotation results achieve a Fleiss's kappa of 0.803.

STINMatch: Methodology

Overview
Architecture. We propose an end-to-end semi-supervised semantic-topological iteration network to support multiple-risk diffusion. As illustrated in Fig. 2, STINMatch is composed of a text classifier (M_t), a GNN-based node classifier (M_g), and a label-correlation matrix R. M_t contains a base text encoder and several classification layers. It takes textual inputs and learns text embeddings through classification objectives on labeled news. The hidden representations and predictions of M_t are used to initialize the node features of M_g at each iteration round. M_g propagates risk information from labeled samples to unlabeled samples through the NEKG topology to achieve risk diffusion and boost risk detection. We train M_t and M_g alternately until convergence, and we integrate the correlation matrix R into the iterative joint learning framework to maximize performance and efficiency. Once the diffusion model converges, the enterprise risk labels are adopted as signals for precisely quantifying risk. Our work thus integrates label correlation effectively into the text-graph iterative learning process, which offers a mutual benefit and significantly improves the overall performance of the system. On one hand, the label correlation, serving as domain-specific knowledge, guides the label diffusion for both the text and graph modules in each iteration; on the other hand, the predictions of the successively enhanced model involve more previously unlabeled samples in the calculation of label correlation, yielding a more generalized label correlation in each iteration round.
Iterative Learning Framework. Below we elaborate on the iterative semi-supervised learning framework and the workflow from one iteration to the next. We omit the subscript denoting the iteration round for readability.
For a given iteration round, we denote the key information layer in M_t as h_t. h_t carries two embeddings: the text semantic embedding x_t, and the graph context embedding x_g from the last iteration. M_t performs data augmentation on both labeled and unlabeled data to compute supervised and unsupervised loss functions for updating model parameters. We detail the module components and semi-supervised learning strategy for M_t in Section 4.2.
After retraining M_t, we take the value of its representation layer ψ_t as the learned text representation to initialize the node representations h_g of M_g in this iteration round. M_t's prediction ℓ_t := (p_1, p_2, . . ., p_K)^T also participates in the graph aggregation process. Similarly, we train M_g with a joint loss function consisting of a supervised loss and an unsupervised loss, and take the values of its last hidden layer ψ_g as the learned node representations. We then use ψ_g to reinitialize the x_g of M_t for the next iteration round. We detail the semi-supervised learning of M_g and the data filtering strategy in Section 4.3.
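The alternating workflow described above can be sketched as a small driver loop. The function names (`train_text`, `train_graph`, `update_R`) are hypothetical stand-ins for the real training routines; only the hand-off pattern between the modules reflects the text.

```python
def stinmatch_iterate(train_text, train_graph, update_R, n_rounds=3):
    """Skeleton of one STINMatch run: alternately retrain the text module M_t
    and the graph module M_g, passing representations between them and
    refreshing the label-correlation matrix R after each round."""
    psi_g = None                 # graph context; randomly initialized in round 0
    R = update_R(extra=0)        # R starts from the labeled set D_L only
    history = []
    for _ in range(n_rounds):
        psi_t, l_t = train_text(psi_g, R)             # M_t uses last round's graph context
        psi_g, filtered = train_graph(psi_t, l_t, R)  # M_g seeded with psi_t and l_t
        R = update_R(extra=filtered)  # confident unlabeled samples enlarge R's support
        history.append(filtered)
    return R, history

# Minimal stand-ins so the skeleton runs end to end.
train_text = lambda psi_g, R: ("psi_t", "l_t")
train_graph = lambda psi_t, l_t, R: ("psi_g", 5)   # pretend 5 samples pass the filter
update_R = lambda extra: {"n_extra": extra}
R, hist = stinmatch_iterate(train_text, train_graph, update_R, n_rounds=2)
```

In the real system each callable would run many epochs with early stopping; the skeleton only shows what flows where between rounds.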
With the retrained M_t and M_g, STINMatch filters ∆n confidently-predicted samples from the m unlabeled samples to update the risk label correlation matrix R ∈ R^{K×K} as R_{i,j} = cos⟨(y_{1i}, y_{2i}, . . ., y_{Ni}), (y_{1j}, y_{2j}, . . ., y_{Nj})⟩, where N = n + ∆n is the number of samples in the union of D_L and the filtered set, and (y_{1i}, y_{2i}, . . ., y_{Ni}) is the vector of the i-th risk labels over the N samples. R is initially calculated from the labeled set D_L and updated in each iteration round. Fig. 3 shows a visualized calculation process for R.
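The cosine formula above can be computed for all label pairs at once by normalizing the label columns. A minimal numpy sketch (the function name is ours):

```python
import numpy as np

def label_correlation_matrix(Y):
    """R[i, j] = cosine similarity between the i-th and j-th label columns
    over the N currently usable samples (labeled set plus filtered set).

    Y: (N, K) binary multi-label matrix. Returns R: (K, K).
    """
    Y = Y.astype(float)
    norms = np.linalg.norm(Y, axis=0)   # per-label column norms
    norms[norms == 0] = 1.0             # guard against labels with no positives
    Yn = Y / norms
    return Yn.T @ Yn                    # cosine of every column pair

# Toy example: labels 0 and 1 always co-occur, label 2 is independent.
Y = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
R = label_correlation_matrix(Y)
```

Here R[0, 1] is 1 (perfect co-occurrence) while R[0, 2] is 0, matching the intuition in Fig. 3.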

Semi-Supervised Text Model
The text semantic embedding x_t for M_t comes from a text encoder (e.g., CNN, RNN, or BERT-based). The graph context embedding x_g is randomly initialized for all nodes and is obtained from ψ_g of the graph model M_g in subsequent iterations.
Similar to (Kurata et al., 2016), we adopt a weight initialization strategy leveraging label co-occurrence to improve model performance on the MLTC task. Let l_t^{-1} denote the second-to-last layer in M_t. For the k-th independent risk label, the supervised loss on a batch of labeled data is computed with p_{i,k} denoting the probability of the i-th sample being predicted with the k-th risk label by M_t. Let U = {u_b : b ∈ (1, . . ., B′)} denote a batch of unlabeled samples. STINMatch applies different augmentation techniques to the unsupervised data U to generate a set of weakly-augmented data U_t^w and a set of strongly-augmented data U_t^s, by manipulating the primary semantic representation x_t and the supplementary graph context representation x_g for M_t. Let l_t^w = (p_1^w, p_2^w, . . ., p_K^w)^T and l_t^s = (p_1^s, p_2^s, . . ., p_K^s)^T denote the predictions of M_t on U_t^w and U_t^s, respectively. To generate l_t^w, the key idea is to disturb only the supplementary graph context embedding x_g for weak augmentation. We apply the Random Perturbation method (Kumar et al., 2019) to x_g and combine it with the original x_t as the input of FC_1. For l_t^s, we apply the relatively strong augmentation method Extrapolation (Kumar et al., 2019) to both x_t and x_g, which utilizes differences from other samples to synthesize new examples. The weak augmentation provides higher accuracy for the pseudo-labels, while the strong augmentation provides better diversity and a larger region of sample perturbation for the consistency regularization, thereby improving the performance of the semi-supervised learning.
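The two embedding-level augmentations can be sketched as follows. The noise scale and the extrapolation coefficient are our assumptions; the paper only names the techniques from (Kumar et al., 2019).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_perturbation(x, scale=0.1):
    """Weak augmentation: add small Gaussian noise to an embedding
    (applied only to the graph context x_g, leaving x_t intact)."""
    return x + scale * rng.standard_normal(x.shape)

def extrapolation(x, x_other, lam=0.5):
    """Strong augmentation: push x away from another sample's embedding,
    synthesizing a harder example: x' = (x - x_other) * lam + x."""
    return (x - x_other) * lam + x

x_t = np.ones(4)    # toy text semantic embedding
x_g = np.zeros(4)   # toy graph context embedding
weak = np.concatenate([x_t, random_perturbation(x_g)])            # x_t untouched
strong = np.concatenate([extrapolation(x_t, x_g),                 # both disturbed
                         extrapolation(x_g, x_t)])
```

The weak view keeps the semantic half exactly as-is, which is why its pseudo-labels are trusted; the strong view perturbs both halves, giving the consistency term something harder to match.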
For each p_k^w, we calculate a pseudo-label with an indicator function p̂_k^w = 1[p_k^w > τ], which returns 1 when p_k^w > τ and 0 otherwise, where τ is a threshold hyperparameter. Let P^w = {p̂_k^w | k = 1, 2, . . ., K}. The unsupervised loss for the k-th risk label on the batch of unlabeled data uses 1[Σ_{k=1}^K p̂_{i,k}^w > 1] as an indicator function verifying the validity of the predictions on the i-th augmented sample, and L_{t,k,i}^u is the cross-entropy loss on the k-th label for the i-th unlabeled sample, regarding p̂_k^w as the label and p_k^s as the prediction. Finally, we merge the supervised and unsupervised losses of all K labels for training M_t, where γ is a balancing hyperparameter.
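A compact sketch of this FixMatch-style masked loss over one unlabeled batch. The minimum number of confident labels required for validity is parameterized, since the extracted text is ambiguous about the exact threshold; the function name is ours.

```python
import numpy as np

def unsupervised_loss(p_weak, p_strong, tau=0.95, min_confident=1):
    """Per-batch consistency loss for the text module.

    p_weak, p_strong: (B', K) sigmoid probabilities from the weak/strong views.
    A sample contributes only if its weak view fires at least `min_confident`
    pseudo-labels. Binary cross-entropy treats the thresholded weak prediction
    as the target for the strong view.
    """
    pseudo = (p_weak > tau).astype(float)       # \hat{p}^w_k per label
    valid = pseudo.sum(axis=1) >= min_confident # validity indicator per sample
    eps = 1e-12
    ce = -(pseudo * np.log(p_strong + eps)
           + (1 - pseudo) * np.log(1 - p_strong + eps))
    if not valid.any():
        return 0.0
    return float(ce[valid].mean())
```

With a confident weak view the strong view is pulled toward the same label set; batches with no confident sample contribute nothing.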

Semi-Supervised Graph Model
Content-Label-Topology Aggregation. The aggregation mechanism in M_g is dominated by a Semantic-representation and Label-distribution guided Attention (SLA) layer. News semantics, label correlations, and risk diffusion can be learned jointly over NEKG by stacking SLA layers. The following introduces the forward calculation from the input node feature set h_g^l of the l-th SLA layer to that of the next layer h_g^{l+1}. Below we omit the subscript denoting the graph model g and the layer number l for readability. For each node μ, the input and output node features of the SLA layer are denoted ⃗h_μ and ⃗h′_μ, respectively. Specifically, we first apply a multi-head semantic-similarity-based attention mechanism similar to (Veličković et al., 2017). It learns J independent semantic attention weights to stabilize the learning process, where the j-th single-head weight between nodes μ and ν is computed with a shared transformation matrix W_j, a shared feed-forward neural network a_j for each layer, and || denoting the concatenation operation. We then introduce a label-similarity-based attention mechanism, since internal relations exist among different labels. However, direct similarity calculation between multi-hot label representations neglects such correlations. To address this issue, we utilize the correlation matrix R to capture the internal relations among different labels and enable cross-label similarity calculation. As shown in Fig. 3, without considering label correlations, the aggregation weight W(n_3 → c_1) is equal to W(n_4 → c_1). When considering that Bankruptcy and Liquidation are correlated risk labels, W(n_3 → c_1) becomes larger than W(n_4 → c_1). The label attention between nodes μ and ν is computed from ℓ_{t,μ} and ℓ_{t,ν}^T, which come from the predictions of M_t; ⊙ represents element-wise product, and ||·||_Frobenius denotes the Frobenius norm of a matrix.
After obtaining both η_{μ,ν} and β_{μ,ν}, we combine them into a merged attention weight α. The j-th merged attention of node μ with respect to node ν is the softmax of a linear combination, where N_μ indicates the neighborhood of node μ including itself, and λ ∈ R^J is a trainable vector. Finally, the SLA layer outputs the feature representation of node μ as the concatenation of J independent transformations, where σ is the sigmoid function. Note that the input features for the first SLA layer are initialized from the learned text representation ψ_t in this iteration round, and we employ averaging instead of concatenation for the J head outputs on the final (prediction) layer, as in (Veličković et al., 2017).
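A single-head numpy sketch of how the semantic and label attentions might be merged. The original equations are not fully reproduced in the extracted text, so the LeakyReLU slope, the Frobenius normalization of the label interaction, and all shapes are our assumptions.

```python
import numpy as np

def sla_attention(h, l_t, R, W, a, lam=1.0):
    """Single-head sketch of SLA aggregation weights.

    h:   (n, d)  input node features      l_t: (n, K) text-module predictions
    R:   (K, K)  label-correlation matrix W:   (d, d') shared transformation
    a:   (2*d',) shared attention vector
    """
    n = h.shape[0]
    Wh = h @ W
    # Semantic attention eta: LeakyReLU of a feed-forward score over the
    # concatenation [W h_mu || W h_nu] for every node pair.
    pair = np.concatenate([np.repeat(Wh, n, axis=0), np.tile(Wh, (n, 1))], axis=1)
    z = pair @ a
    eta = np.where(z > 0, z, 0.2 * z).reshape(n, n)
    # Label attention beta: cross-label similarity routed through R so that
    # correlated labels (e.g. Bankruptcy/Liquidation) reinforce each other.
    beta = np.zeros((n, n))
    for mu in range(n):
        for nu in range(n):
            M = np.outer(l_t[mu], l_t[nu]) * R        # element-wise product with R
            beta[mu, nu] = M.sum() / (np.linalg.norm(M) + 1e-12)  # Frobenius norm
    # Merged weight alpha: row-wise softmax of a linear combination.
    scores = eta + lam * beta
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Tiny worked example with two nodes and two labels.
h = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
alpha = sla_attention(h, l_t=np.array([[1.0, 0.0], [0.0, 1.0]]),
                      R=np.eye(2), W=np.eye(3), a=np.ones(6))
```

With R as the identity, nodes carrying different labels get no label-attention boost; replacing R with a matrix correlating the two labels raises their mutual weight, mirroring the Fig. 3 example.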
Consistency Regularization for M_g. Similar to M_t, the supervised loss for the k-th risk label to train M_g on the batch of labeled data uses q_{i,k}, the probability of the i-th sample being predicted by M_g with the k-th risk label. STINMatch also applies data augmentation to the unsupervised data U to generate a set of weakly-augmented data U_g^w and a set of strongly-augmented data U_g^s for M_g. Let l_g^w = (q_1^w, q_2^w, . . ., q_K^w)^T and l_g^s = (q_1^s, q_2^s, . . ., q_K^s)^T denote the predictions of M_g on U_g^w and U_g^s, respectively. For l_g^w, we apply graph attention dropout on the attention coefficients as in (Veličković et al., 2017) at inference time. The weak augmentation only changes the aggregation pattern without disturbing the input node features. For the strongly-augmented l_g^s, in addition to graph attention dropout, we further apply the Extrapolation augmentation method (Kumar et al., 2019) to the input node feature set h_g of the graph model.
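The two graph-side augmentations can be sketched as follows. The dropout rate, the per-row renormalization after dropout, and the extrapolation coefficient are our assumptions; the paper only names the underlying techniques.

```python
import numpy as np

rng = np.random.default_rng(7)

def attention_dropout(alpha, p=0.3):
    """Weak graph augmentation: randomly zero attention coefficients at
    inference time, changing only the aggregation pattern; surviving
    coefficients are renormalized per row."""
    keep = rng.random(alpha.shape) > p
    dropped = alpha * keep
    sums = dropped.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0                     # guard rows that lost every edge
    return dropped / sums

def extrapolate_features(h, lam=0.5):
    """Strong graph augmentation: Extrapolation on the node features,
    pushing each node away from a randomly paired partner node."""
    partner = h[rng.permutation(h.shape[0])]
    return (h - partner) * lam + h

alpha = np.full((3, 3), 1.0 / 3.0)            # uniform attention over 3 nodes
weak_alpha = attention_dropout(alpha)         # aggregation pattern changes only
strong_h = extrapolate_features(np.eye(3))    # node features disturbed as well
```

The weak view leaves node features untouched, so its pseudo-labels stay reliable; the strong view perturbs both the aggregation and the features, giving the graph-side consistency term a harder target.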
For each q_k^w, we also calculate a pseudo-label q̂_k^w = 1[q_k^w > τ], and let Q^w = {q̂_k^w | k = 1, 2, . . ., K}. To ensure the validity of the predictions from M_g, we introduce an Elevated Constraint: the additional information from neighbors must not reduce the risk labels obtained from the text model M_t for the node itself. Namely, only the samples whose pseudo-labels predicted by M_t form a subset of those from the graph module M_g participate in the loss calculation. In the unsupervised loss for the k-th risk label on the batch of unlabeled data, ∧ represents the simultaneous satisfaction of both conditions, and L_{g,k,i}^u is the cross-entropy loss for the i-th unlabeled sample, regarding q̂_k^w as the label and q_k^s as the prediction. Finally, we merge the supervised and unsupervised losses of all K labels for training M_g.

Experiments

Experimental Setting
Datasets. We validate our model on three datasets. The main dataset, NEKG, is described in Section 3. We fix 5,000 of the originally labeled news as the test set and take a random subset of the remaining labeled news as our labeled set D_L for each experimental setting; labels outside D_L are removed during the training of STINMatch. The default size of D_L is 1,000 unless otherwise stated. The enterprise-related information comes from a subset of data integrated by our data center team, collected from various government-backed open data providers such as the National Enterprise Credit Information Public System of China; it enables the designed web crawler to collect the corresponding news for annotation and evaluation.
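The NEKG schema described in Section 3 can be assembled as a plain adjacency structure. A toy sketch with hypothetical identifiers (`AcmeCo`, `AcmeSub` and the node-prefix convention are ours); a production version would use a heterogeneous graph library such as DGL instead.

```python
def build_nekg(news_mentions, enterprise_edges):
    """Toy NEKG assembly: undirected adjacency over news ('n:...') and
    enterprise ('c:...') vertices. Edge types mirror the paper's schema:
    news-enterprise mentions plus typed enterprise-enterprise relations
    (subsidiary, investment, share-manager, share-investor, share-legal-entity).
    """
    adj = {}
    def add_edge(u, v, etype):
        adj.setdefault(u, []).append((v, etype))
        adj.setdefault(v, []).append((u, etype))
    for news_id, enterprise in news_mentions:
        add_edge(f"n:{news_id}", f"c:{enterprise}", "mention")
    for p, q, rel in enterprise_edges:
        add_edge(f"c:{p}", f"c:{q}", rel)
    return adj

g = build_nekg([(1, "AcmeCo")], [("AcmeCo", "AcmeSub", "subsidiary")])
```

On the real data this structure would hold 99,666 news nodes, 50,193 enterprise nodes, and the 135,340 + 121,938 edges reported in Section 3.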
The other two public datasets, RentTheRunWay and Goodreads-Spoilers, are described in the algorithm generalization part of Section 5.4.

Implementation. We implemented all models in PyTorch. The graph model is implemented using the Deep Graph Library (DGL). All models are trained on an NVIDIA Tesla A100 80GB GPU. Hyperparameter details are given in Appendix A.

Baselines
As shown in Table 1, the STINMatch model is compared with baselines from four main categories. Category 1 contains classical text classification methods; RoBERTa-sw represents RoBERTa trained with the sliding-window method (Wang et al., 2019). To fairly validate the performance of different frameworks at the pure text level, most of the text encoders in the following categories are based on a TextCNN over pre-trained BERT embedding layers (denoted TCB). Category 2 contains state-of-the-art text models specialized for MLTC tasks, such as LCNNI (Kurata et al., 2016), SGM (Yang et al., 2018), and CORE (Zhang et al., 2021). Focal and DB losses (Huang et al., 2021), designed to handle the label distribution for MLTC tasks at the loss level, are also tested. Category 3 contains typical semi-supervised learning methods for pure text models. Category 4 applies typical GNN-based methods over TCB representations. General GNN models such as GCN (Kipf and Welling, 2017), GraphSAGE (Hamilton et al., 2017), and GAT (Veličković et al., 2017) are used. GNN models specialized for MLTC tasks, such as MAGNET (Pal et al., 2020) and LC-GAT (Xu et al., 2020), are also included for comparison. GLEM (Zhao et al., 2023), a recent text-graph co-training method, has also been tested; some details of GLEM were modified to adapt it to our scenario for fair comparison (e.g., using TCB as the base language module and replacing the binary cross-entropy loss with a multi-label loss).

Effectiveness for STINMatch
News Risk Evaluation. Experimental results are reported in Table 1 with the labeled size n set to 1,000. From Category 2 of Table 1, one can observe slight improvements from some multi-label methods, indicating the usefulness of correlations among labels. Text-based semi-supervised methods also slightly enhance performance. GNN methods achieve clear improvements over pure text-based methods, showing the effectiveness of the additional NEKG. The proposed STINMatch model shows convincing superiority over all baselines, owing to the multi-label-correlation guided text-graph joint learning, as well as the interactive semi-supervised learning across the text and graph models. Note that we ran each experiment 5 times and report the average performance. The t-test results show that the proposed model significantly outperforms the best baseline by 4.7%. Detailed ablation results for STINMatch are described in Section 5.4.

Enterprise Risk Evaluation. We also conducted comparative experiments evaluating enterprise risk detection based on the label-diffusion results from NEKG. Each enterprise node has a credit risk rating label provided by an authoritative rating agency. The credit risk rating labels can be divided into two levels according to the rating scores: high-risk (1) and low-risk, including those with no risk (0). The GNN-based models also offer risk predictions for each enterprise node upon convergence. To make the two label systems comparable, we consider enterprise nodes with more than one predicted risk label from the graph models as high-risk (1), and others as low-risk (0). We take the credit risk rating labels as ground truth and calculate classifier metrics (accuracy and F1 for the low-risk/high-risk classes) for the predictions from the different GNN-based models.
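The mapping from a node's multi-label prediction to the binary credit scale can be stated in one line. Reading "more than one predicted risk label" as a count strictly greater than one is our interpretation of the text; the function name is ours.

```python
def enterprise_risk_level(pred_labels, threshold=1):
    """Map a node's K-dimensional binary risk prediction to the binary
    credit-rating scale used for evaluation: strictly more than `threshold`
    predicted risks -> high-risk (1), otherwise low-risk (0)."""
    return int(sum(pred_labels) > threshold)
```

Standard accuracy and per-class F1 can then be computed against the agency-provided credit labels exactly as for any binary classifier.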
Comparing the results in Tables 1 and 2, we find that the performance of the enterprise risk classifier is nearly positively correlated with that on the MLTC task for news risk detection. The STINMatch method exceeds all other GNN-based baselines on the enterprise risk evaluation task.

Analysis for STINMatch
Ablation Study. To investigate the contribution of each component in STINMatch, we conducted six ablation studies. The first was trained with patience epochs for the text and graph modules respectively, using early stopping in a single round without iteration. The second initialized the label correlation matrix R only once, without the updating mechanism. The third considered only the traditional semantic attention, ignoring the label attention β for M_g. The remaining studies explored different components of the semi-supervised learning.
From Table 3 we find that every removed component leads to a decrease in the performance of the STINMatch model. This indicates that the well-trained semantic representation and the graph-based diffusion model gain important reciprocal advantages from each other by leveraging the multi-label correlations. The entries "w/o unsupervised loss for M_t" and "w/o unsupervised loss for M_g" show the results after removing the semi-supervised techniques from the text model and the graph model, respectively. They indicate that the interactive semi-supervised learning across the text and graph modules indeed makes better use of unlabeled data at different training stages. Moreover, the superiority of the text-graph joint learning can be seen by comparing with the results of Category 3 in Table 1, which represent semi-supervised text-only methods.

Algorithm Generalization. We also processed two public datasets, RentTheRunWay (Misra et al., 2018) and Goodreads-Spoilers (Wan et al., 2019), to demonstrate the generalization ability of our STINMatch method when adapted to other domains. We elaborate on the two review datasets in Appendix C. For these two datasets, we fixed half of the samples as the test set and randomly selected 10% of the remaining samples as labeled data for the semi-supervised learning comparison. We selected four typical baselines described in Section 5.2. From Ta-

Conclusion
In this paper, we introduce the NEKG for detecting financial risks from commercial news. The proposed STINMatch method outperforms existing state-of-the-art models on the news risk detection task, as well as the downstream enterprise risk evaluation task. The improvements mainly come from: 1) the additional NEKG topology, which enables article-level risk diffusion; 2) the carefully designed STIN model, which brings deep interactions among the semantic module, the topological module, and the label-correlation matrix, enhancing sound diffusion over NEKG.
3) The innovative semi-supervised joint learning framework enables the text and graph modules to gain reciprocal advantages from each other, making both modules utilize unlabeled data more effectively. The utility for real-world applications in scenarios similar to our financial risk detection task is substantial, ranging from multi-label document classification based on citation or social networks to multi-label product classification on e-commerce platforms. We also apply the STINMatch method to two public graph-enhanced MLTC datasets and validate its good generalization ability.

Limitations
STINMatch leverages semi-supervised learning techniques on a semantic-topological iteration network. Our research demonstrates that both the joint iterative learning and the interactive consistency regularization of the text and graph models benefit risk diffusion. However, we acknowledge that the proposed mechanisms increase the computational cost. For example, with the same training epochs and batch size, the computational cost of STINMatch is nearly three times that of the simplest baseline (i.e., TCB). This higher cost is justified by the significant benefits it brings: STINMatch outperforms TCB by 9.5% in terms of Macro F1, a substantial improvement. Moreover, for TCB to achieve an evenly-matched performance, it needs almost 10 times more labeled data than STINMatch. Overall, STINMatch offers better results at a moderate cost in training time.

A Hyperparameters
For all experiments, hyperparameters are determined by grid search, and the total maximum epochs are set to the same order with early stopping for fairness. The learning rate is 2e-4 for the text model and 5e-4 for the graph model. The batch size B is 8, and B′ is 7 times B, the same as (Sohn et al., 2020). The number of layers for the graph model is 3. The confidence threshold τ is set to 0.95. The balance weight γ between the unsupervised and supervised losses is set to 0.5; details can be seen in Table 5. The maximum content length in the review datasets is 512, and the maximum length for each news item in NEKG is set to 2,000. For most experimental settings, we use a CNN-based text encoder with a pre-trained BERT embedding layer, considering the length limitation and efficiency for fair comparison. Each text-graph iteration includes 100 text epochs and 100 graph epochs, and the maximum number of iterations is set to 10. All indicators are averaged over the last five epochs to ensure stability.
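For convenience, the values quoted above can be consolidated into a single configuration; the dictionary keys are our naming, not the paper's.

```python
# Hyperparameters from Appendix A (values quoted from the text; key names ours).
CONFIG = {
    "lr_text": 2e-4,
    "lr_graph": 5e-4,
    "batch_size_labeled": 8,     # B
    "unlabeled_ratio": 7,        # B' = 7 * B, following FixMatch (Sohn et al., 2020)
    "graph_layers": 3,
    "confidence_tau": 0.95,
    "loss_weight_gamma": 0.5,
    "max_review_length": 512,
    "max_news_length": 2000,
    "epochs_per_module": 100,    # per text/graph phase within one iteration
    "max_iterations": 10,
}
batch_size_unlabeled = CONFIG["batch_size_labeled"] * CONFIG["unlabeled_ratio"]
```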

B NEKG Datasets
The distribution of each risk label among all annotated news is shown in Table 6. We also report the multi-label annotation statistics of the news data in Table 7. Among all 15,000 annotated news, 5,855 news are associated with 1 risk label, 6,986 news are associated with 2 risk labels, and more than two thousand news have 3 or more labels.

C Review Datasets
The RentTheRunWay (Misra et al., 2018) dataset contains self-reported fit feedback from customers as well as other side information such as reviews, ratings, and product categories. It contains 105,508 users, 5,850 items, and 192,544 reviews. We construct a bipartite graph of item nodes and review nodes. Item nodes are medium nodes, similar to the enterprise nodes in our enterprise graph, and the review nodes are text-attributed nodes, similar to our news nodes. One item is connected to another by an edge if they are commented on by the same user. We transformed the three-class fit-feedback tags and ten-class rating tags into multi-labels. The Goodreads-Spoilers (Wan et al., 2019) dataset contains reviews from the Goodreads book review website. It contains 25,475 items, 18,892 users, and 1,378,033 reviews. The graph construction is similar to that of RentTheRunWay. We transformed the six-class rating tags and the "has-spoiler" tag into multi-labels.
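The item-item linking rule above ("connected if commented on by the same user") can be sketched as follows; the function name and the pair representation are ours.

```python
from collections import defaultdict
from itertools import combinations

def item_item_edges(reviews):
    """Build the item-item edges for the review datasets: two items are
    linked when at least one user reviewed both.

    reviews: list of (user, item) pairs. Returns a set of undirected
    item pairs, each stored in sorted order to avoid duplicates.
    """
    items_by_user = defaultdict(set)
    for user, item in reviews:
        items_by_user[user].add(item)
    edges = set()
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            edges.add((a, b))
    return edges

edges = item_item_edges([("u1", "dress_A"), ("u1", "dress_B"), ("u2", "dress_C")])
```

Only the two items sharing reviewer `u1` are linked; a user who reviewed a single item contributes no edge.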

D Annotation Table
Finally, we provide a detailed annotation table for interpreting all symbols in Table 8.

Figure 2 :
Figure 2: (a) shows the overall semi-supervised learning framework of STINMatch. (b) shows the deep interactions between the text module, graph module, and label correlations during the iteration of the STIN model.
where R is the risk label correlation matrix, ⊙ represents element-wise product, and W_t and b_t are learnable parameters. Let X = {(x_b, y_b) : b ∈ (1, . . ., B)} denote a batch of labeled samples, where x_b are training examples and y_b are multi-hot labels.

Figure 3 :
Figure 3: (a) shows the calculation process and a visualization of the label correlation matrix R. (b) is an example showing how R affects risk label diffusion.

Figure 4 :
Figure 4: The left insets (a) and (b) show the variation of time cost and model performance with increasing training epochs. The right insets (c) and (d) show the variation of the best Macro F1 with increasing labeled data size.

Table 1 :
Comparisons on the news risk detection task between our STINMatch model and other methods. † indicates that a pre-trained BERT embedding layer serves as the first layer of the model.

Table 2 :
Evaluation for enterprise risk detection based on risk propagation results from different GNN models.

Table 3 :
Ablation results for the STINMatch method. "w/o" means without the corresponding component.

Table 5 :
Performance under the balance weight parameter γ at different scales for NEKG dataset.

Table 6 :
Distribution of risk labels for NEKG dataset.

Table 7 :
Risk label count for news.