GRENADE: Graph-Centric Language Model for Self-Supervised Representation Learning on Text-Attributed Graphs

Self-supervised representation learning on text-attributed graphs, which aims to create expressive and generalizable representations for various downstream tasks, has received increasing research attention lately. However, existing methods either struggle to capture the full extent of structural context information or rely on task-specific training labels, which largely hampers their effectiveness and generalizability in practice. To address these limitations, we develop a novel Graph-Centric Language model for self-supervised representation learning on text-attributed graphs -- GRENADE. Specifically, GRENADE exploits the synergistic effect of pre-trained language models and graph neural networks by optimizing two specialized self-supervised learning algorithms: graph-centric contrastive learning and graph-centric knowledge alignment. These graph-centric self-supervised learning algorithms effectively help GRENADE capture informative textual semantics as well as structural context information on text-attributed graphs. Through extensive experiments, GRENADE shows its superiority over state-of-the-art methods. Implementation is available at \url{https://github.com/bigheiniu/GRENADE}.


Introduction
Text-Attributed Graph (TAG) (Yang et al., 2021) (a.k.a. Textual Graph) has been widely used for modeling a variety of real-world applications, such as information retrieval (Cohan et al., 2020; Yang et al., 2021), product recommendation (Zhu et al., 2021), and many more. In a TAG, each node represents a text document, while the relationships among these text nodes are depicted by the edges. For instance, in citation networks, text nodes represent academic papers, and edges represent the citation relationships between different papers. To conduct different analytics tasks on TAGs, the key is to learn expressive node representations for the text nodes.
Recent research has demonstrated that self-supervised learning (SSL) can substantially improve the effectiveness of representation learning on text data (Reimers and Gurevych, 2019; Gao et al., 2021; Wu et al., 2020) without using human supervision. These methods are commonly learned under the assumption that text documents are independently and identically distributed (i.i.d.), which neglects the structural interdependencies among text nodes on a TAG. However, the interdependencies between different text documents can provide valuable insights for understanding their semantic relationships. Taking citation networks as an example, academic papers (text nodes) that have citation relationships often share similar topics. Hence, it is necessary for SSL models to account for not only textual semantics but also structural context information.
In fact, self-supervised representation learning on TAGs remains in its infancy: (i) Though recent research efforts (Zhao et al., 2022; Chien et al., 2021; Cohan et al., 2020; Yasunaga et al., 2022) try to empower pre-trained language models (PLM) with structural context information, most of them remain superficial, designing only local structure-dependent SSL objectives. For example, both GIANT (Chien et al., 2021) and SPECTER (Cohan et al., 2020) train the language model by inferring the local neighborhood based on representations of text nodes. However, simply relying on those SSL objectives cannot help the PLM fully understand complex graph structures, especially compared to models like graph neural networks (GNN) (Kipf and Welling, 2017; Velickovic et al., 2018; Hamilton et al., 2017; Ding et al., 2022a). (ii) Meanwhile, another line of research (Mavromatis et al., 2023; Zhao et al., 2022) tries to combine the advantages of both PLM and GNN by distilling knowledge from one to the other (Hinton et al., 2015) and has shown promising results. Nonetheless, one major issue is that those methods are task-specific (e.g., semi-supervised node classification) and require human-annotated labels to enable knowledge distillation. Such an inherent limitation jeopardizes the versatility of these models for handling different and even unseen downstream tasks, which runs counter to the goal of SSL.
To go beyond the existing learning paradigms and capture informative textual semantics and graph structure information, we develop a new model for self-supervised learning on TAGs, namely Grenade (Graph-Centric Language Model). Grenade is built with a PLM encoder along with an adjuvant GNN encoder that provides complementary knowledge for it. More importantly, Grenade is learned through two new self-supervised learning algorithms: Graph-Centric Contrastive Learning (GC-CL), a structure-aware and augmentation-free contrastive learning algorithm that improves representation expressiveness by leveraging the inherent graph neighborhood information; and Graph-Centric Knowledge Alignment (GC-KA), which enables the PLM and GNN modules to reinforce each other by aligning their learned knowledge encoded in the text node representations. Specifically, GC-CL enforces neighboring nodes to share similar semantics in the latent space by considering them as positive pairs. Even without using data augmentation, GC-CL performs node-wise contrastive learning to elicit the structural context information from the TAG. In the meantime, GC-KA bridges the knowledge gap between PLM and GNN by performing dual-level knowledge alignment on the computed representations: at the node level, we minimize the distance between the representations learned from the two encoders that focus on different modalities; at the neighborhood level, we minimize the discrepancy between two neighborhood similarity distributions computed from the PLM and the GNN. By virtue of the two proposed graph-centric self-supervised learning algorithms, we are able to learn Grenade such that it can generate expressive and generalizable representations for various downstream tasks without using any human supervision. In summary, our work has the following contributions:
• We develop Grenade, a graph-centric language model that addresses the underexplored problem of self-supervised learning on TAGs.
• We propose two new self-supervised learning algorithms for TAGs, which allow us to perform contrastive learning and knowledge alignment in a graph-centric way.
• We conduct extensive experiments to show that our model Grenade significantly and consistently outperforms state-of-the-art methods on a wide spectrum of downstream tasks.

Problem Definition
Notations. We use bold lowercase letters such as d to represent vectors, bold capital letters like W to denote matrices, and calligraphic capital letters like W to represent sets. Let G = (A, D) denote a text-attributed graph with adjacency matrix A ∈ {0, 1}^{|D|×|D|} and text set D, where A_{ij} = 1 when there is a connection between nodes i and j. Each node i represents a text document D_i, which consists of a sequence of tokens.

Problem 1. Given an input text-attributed graph (TAG) denoted as G = (A, D), our goal is to learn a graph-centric language model PLM(·) that can generate an expressive and generalizable representation for an arbitrary node i on G. Note that the whole learning process is performed solely on the input graph G without the utilization of human-annotated labels.
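As a toy illustration of this formalism, the sketch below builds the pair G = (A, D) from an edge list. The helper name `make_tag` and the treatment of citation edges as undirected are our own assumptions, not part of the paper.

```python
# Minimal sketch of a text-attributed graph G = (A, D): a symmetric 0/1
# adjacency matrix A and a parallel list D of text documents. Node indices
# into D double as row/column indices into A.

def make_tag(edges, documents):
    """Build the adjacency matrix A for |D| text nodes from an edge list."""
    n = len(documents)
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # citation links treated as undirected here (an assumption)
    return A, documents

# Toy citation network: paper 0 cites papers 1 and 2.
A, D = make_tag(edges=[(0, 1), (0, 2)],
                documents=["paper on GNNs", "paper on BERT", "paper on SSL"])
```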

Proposed Approach: Graph-Centric Language Model (Grenade)
To learn expressive representations from a TAG in a self-supervised manner, we propose our Graph-Centric Language Model Grenade, which bridges the knowledge gap between a Pre-trained Language Model (PLM) and a Graph Neural Network (GNN). By optimizing the two distinct encoders with a set of novel self-supervised learning algorithms, the PLM encoder and GNN encoder mutually reinforce each other, and we can finally derive our Graph-Centric Language Model (Grenade). The overall framework is shown in Fig. 1.

Model Architecture
Our proposed model Grenade is composed of a Pre-trained Language Model (PLM) along with a Graph Neural Network (GNN), which are optimized by a set of novel self-supervised learning algorithms. We first introduce the details of these two essential components as follows:

PLM Encoder. The primary component PLM(·) is a BERT (Devlin et al., 2018) based text encoder that projects a sequence of tokens D_i into a vectorized text node representation d_i = PLM(D_i), where d_i is the hidden representation of the [CLS] token computed from the last layer of the PLM encoder.
GNN Encoder. As an adjuvant component, the GNN encoder GNN(·) is built with a stack of message-passing based GNN layers, which compute node i's representation by iteratively aggregating and transforming the feature information from its neighborhood (Hamilton et al., 2017). For each node i, the representation learned from an L-layer GNN encoder can be denoted as e_i, the i-th row of E^L = GNN(A, E^0), where the input node feature matrix E^0 is obtained from the hidden representations of the [CLS] token from the last layer of a pre-trained BERT model.
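To make the message-passing step concrete, here is a minimal plain-Python sketch of one round of GraphSAGE-style mean aggregation. The learned weight matrices and nonlinearities of a real GNN layer are deliberately omitted so the aggregation itself stays visible, and the function name is ours.

```python
# One propagation step: each node averages its own feature vector with its
# neighbors' vectors (mean aggregation). A is a 0/1 adjacency matrix, E a
# list of feature vectors; a full layer would follow this with a learned
# linear transform and a nonlinearity.

def mean_aggregate(A, E):
    """Return the mean-aggregated feature vectors after one propagation round."""
    n, dim = len(E), len(E[0])
    out = []
    for i in range(n):
        neighborhood = [E[i]] + [E[j] for j in range(n) if A[i][j] == 1]
        out.append([sum(v[k] for v in neighborhood) / len(neighborhood)
                    for k in range(dim)])
    return out

# Two connected nodes: each ends up with the mean of both feature vectors.
E1 = mean_aggregate([[0, 1], [1, 0]], E=[[1.0, 0.0], [0.0, 1.0]])
# → [[0.5, 0.5], [0.5, 0.5]]
```

Stacking L such rounds gives each node a view of its L-hop neighborhood, which is what the L-layer encoder above exploits.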

Graph-Centric Contrastive Learning
To improve the learning capability of these two encoders without using any human-annotated labels, one prevailing way is to conduct contrastive learning from either the text perspective (Gao et al., 2021) or the graph perspective (Ding et al., 2022c). However, most existing contrastive learning methods have the following two limitations: (1) conventional instance-level contrastive learning methods merely encourage instance-wise discrimination (Li et al., 2021b; Ding et al., 2023), which neglects a key property of TAGs, i.e., the relational information among text nodes. Hence, instances that share similar semantics may be undesirably pushed away in the latent space. (2) Existing methods commonly rely on arbitrary augmentation functions to generate different augmented views for contrastive learning, while those augmentations may unexpectedly disturb the semantic meaning of the original instance (Lee et al., 2022).
To counter the aforementioned issues, we propose a new graph-centric contrastive learning (GC-CL) algorithm, which is structure-aware and augmentation-free. GC-CL exploits inherent graph knowledge from the TAG and can be applied to both the PLM encoder and the GNN encoder. As suggested by the homophily principle (McPherson et al., 2001), neighboring nodes commonly share similar semantics, meaning that their representations should also be close to each other in the latent space. Based on the PLM representation of node i, its K-hop neighboring nodes N(i), and the mini-batch instances excluding node i, B(i), the GC-CL objective for the PLM can be defined as follows:

$$\mathcal{L}_{\text{GC-CL}}^{\text{PLM}} = -\frac{1}{|\mathcal{N}(i)|} \sum_{p \in \mathcal{N}(i)} \log \frac{e^{\mathrm{sim}(\mathbf{d}_i, \mathbf{d}_p)/\tau}}{\sum_{j \in \mathcal{C}(i)} e^{\mathrm{sim}(\mathbf{d}_i, \mathbf{d}_j)/\tau}}, \quad (3)$$

where τ denotes the temperature, sim(·, ·) represents the cosine similarity function, and C(i) = N(i) ∪ B(i). Note that for node i, we consider its PLM representation d_i as the query instance. The positive instances are the representations of node i's K-hop neighboring nodes {d_p | p ∈ N(i)}. Meanwhile, the negative instances are the representations of the other text nodes within the same mini-batch, {d_j | j ∈ B(i)}.
Similar to the PLM encoder, we also apply our GC-CL algorithm to the GNN encoder GNN(·). Specifically, the objective function is defined as follows:

$$\mathcal{L}_{\text{GC-CL}}^{\text{GNN}} = -\frac{1}{|\mathcal{N}(i)|} \sum_{p \in \mathcal{N}(i)} \log \frac{e^{\mathrm{sim}(\mathbf{e}_i, \mathbf{e}_p)/\tau}}{\sum_{j \in \mathcal{C}(i)} e^{\mathrm{sim}(\mathbf{e}_i, \mathbf{e}_j)/\tau}}, \quad (4)$$

where e_i is the query instance. The positive instances are {e_p | p ∈ N(i)} and the negative instances are {e_j | j ∈ B(i)}.
In contrast to conventional instance-level contrastive learning counterparts, our graph-centric contrastive learning also enforces neighboring nodes to share similar representations. In a sense, this self-supervised learning algorithm is analogous to performing a link prediction task based on the representations learned from the PLM encoder, which inherently elicits informative graph knowledge during the learning process.

Graph-Centric Knowledge Alignment
In this work, our ultimate goal is to learn expressive and generalizable representations that encode the informative textual semantics within each text node as well as the relational information among nodes. However, conducting graph-centric contrastive learning individually on either the PLM or the GNN is not enough due to the lack of knowledge exchange between them. To better align and enhance the knowledge captured by the PLM and GNN encoders, we propose a dual-level graph-centric knowledge alignment algorithm for TAGs, which includes Node-Level Knowledge Alignment (ND-KA) and Neighborhood-Level Knowledge Alignment (NBH-KA).
Node-Level Knowledge Alignment. Different from the previously introduced graph-centric contrastive learning, which only performs single-modal contrasting, ND-KA aligns the knowledge across the two encoders by performing graph-centric contrastive learning in a cross-modal form. For each node i, based on its representations learned from the PLM encoder and the GNN encoder (i.e., d_i and e_i, respectively), we formulate the objective of ND-KA as follows:

$$\mathcal{L}_{\text{ND-KA}} = -\frac{1}{|\widetilde{\mathcal{N}}(i)|} \sum_{p \in \widetilde{\mathcal{N}}(i)} \left[ \log \frac{e^{\mathrm{sim}(\mathbf{e}_i, \mathbf{d}_p)/\tau}}{\sum_{j \in \widetilde{\mathcal{C}}(i)} e^{\mathrm{sim}(\mathbf{e}_i, \mathbf{d}_j)/\tau}} + \log \frac{e^{\mathrm{sim}(\mathbf{d}_i, \mathbf{e}_p)/\tau}}{\sum_{j \in \widetilde{\mathcal{C}}(i)} e^{\mathrm{sim}(\mathbf{d}_i, \mathbf{e}_j)/\tau}} \right] \Big/ 2, \quad (5)$$

where $\widetilde{\mathcal{N}}(i) = \mathcal{N}(i) \cup \{i\}$ and $\widetilde{\mathcal{C}}(i) = \widetilde{\mathcal{N}}(i) \cup \mathcal{B}(i)$. Note that for node i, we first consider e_i, learned from the GNN encoder, as the query, and construct the positive and negative instances from the representations learned from the PLM encoder. Specifically, the positive instances include both the representation of node i and the representations of i's K-hop neighboring nodes (i.e., {d_p | p ∈ N(i)}), and the negative instances are the representations of the other instances within the same mini-batch, {d_j | j ∈ B(i)}. In the meantime, we also consider d_i as the query and construct its corresponding positive and negative instances in the same way; we omit this illustration for simplicity.
By virtue of the proposed ND-KA algorithm, the representations of the same node learned from the two separate encoders will be pulled together in the latent space. In the meantime, ND-KA also encourages neighboring nodes to have similar representations across different modalities.
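The cross-modal alignment above can be sketched as a symmetric InfoNCE loss between the two encoders' outputs. This is a plain-Python illustration under our reading of the objective (positives for a GNN query are the PLM vectors of the node itself plus its neighbors, and the two directions are averaged); the function and argument names are ours.

```python
import math

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(query, positives, negatives, tau=0.05):
    """Multi-positive InfoNCE averaged over the positives."""
    denom = sum(math.exp(cos(query, c) / tau) for c in positives + negatives)
    return -sum(math.log(math.exp(cos(query, p) / tau) / denom)
                for p in positives) / len(positives)

def nd_ka_loss(e_i, d_i, d_neighbors, d_batch, e_neighbors, e_batch, tau=0.05):
    """Symmetric cross-modal alignment between the GNN (e) and PLM (d) views of node i."""
    gnn_to_plm = info_nce(e_i, [d_i] + d_neighbors, d_batch, tau)  # e_i as query
    plm_to_gnn = info_nce(d_i, [e_i] + e_neighbors, e_batch, tau)  # d_i as query
    return (gnn_to_plm + plm_to_gnn) / 2
```

When the two encoders agree on a node and its neighborhood, both directions of the loss shrink together, which is exactly the pulling-together effect described above.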

Neighborhood-Level Knowledge Alignment.
To further facilitate knowledge alignment between the PLM and the GNN, we propose Neighborhood-Level Knowledge Alignment (NBH-KA) to align the neighborhood similarity distributions learned from the two encoders. Specifically, for each encoder, NBH-KA first computes the neighborhood similarity distribution between the query node i and its K-hop neighboring nodes N(i) as well as the rest of the nodes within the same mini-batch B(i). Then we minimize the KL-divergence between the two distributions to align the knowledge of the two encoders. The corresponding learning objective is:

$$\mathcal{L}_{\text{NBH-KA}} = \mathrm{KL}\big(P_{\text{PLM}}(i) \,\|\, P_{\text{GNN}}(i)\big) + \mathrm{KL}\big(P_{\text{GNN}}(i) \,\|\, P_{\text{PLM}}(i)\big), \quad (6)$$

where P_PLM(i) and P_GNN(i) are the neighborhood similarity distributions for the PLM encoder and the GNN encoder, respectively. From a certain perspective, our NBH-KA algorithm can be considered a self-supervised form of knowledge distillation. Specifically, NBH-KA leverages the neighborhood information as self-supervision to guide the knowledge alignment process. Moreover, unlike original knowledge distillation, we conduct two-way knowledge alignment across the two encoders.
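A minimal sketch of this alignment, assuming each distribution is a temperature-scaled softmax over the query's similarities to the same ordered candidate set (similarities are passed in directly here, where the paper derives them from the PLM and GNN representations; only a single KL direction is shown, and the names are ours):

```python
import math

def neighborhood_distribution(similarities, tau=0.05):
    """Softmax over similarity scores to the candidate set (neighbors + in-batch nodes)."""
    exps = [math.exp(s / tau) for s in similarities]
    z = sum(exps)
    return [e / z for e in exps]

def nbh_ka_loss(plm_sims, gnn_sims, tau=0.05):
    """KL(P_PLM || P_GNN) over the same candidate ordering; the paper aligns both ways."""
    p = neighborhood_distribution(plm_sims, tau)
    q = neighborhood_distribution(gnn_sims, tau)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero exactly when the two encoders rank the candidate set identically, which is the self-supervised distillation signal described above.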

Model Learning
To learn our graph-centric language model Grenade on a TAG without using human-annotated labels, we jointly optimize the proposed graph-centric contrastive learning and knowledge alignment objectives. For the sake of simplicity, we define the overall training loss as the sum of the individual terms:

$$\mathcal{L} = \mathcal{L}_{\text{GC-CL}}^{\text{PLM}} + \mathcal{L}_{\text{GC-CL}}^{\text{GNN}} + \mathcal{L}_{\text{ND-KA}} + \mathcal{L}_{\text{NBH-KA}}. \quad (7)$$

Once training is finished, we freeze the parameters of the PLM encoder and use it to compute the representation of each text node with a forward pass. The computed representations can then be used for different downstream tasks.

Experiment
To evaluate the effectiveness of our approach Grenade, we conduct comprehensive experiments on different datasets and various downstream tasks.

Experimental Setup
Evaluation Datasets. We evaluate the generalizability of the representations computed by different methods on three Open Graph Benchmark (OGB) (Hu et al., 2020) datasets: ogbn-arxiv, ogbn-products, and ogbl-citation2. These datasets are used to evaluate performance on few-shot and full-data node classification, node clustering, and link prediction tasks. It should be noted that the ogbn-arxiv and ogbn-products datasets are not originally designed for link prediction evaluation; therefore, we create a link prediction task based on each of these two datasets. Furthermore, we incorporate ogbl-citation2 into our node classification experiment. The statistics of the datasets are shown in Tab. 1. More comprehensive information regarding the dataset extension can be found in Appendix A.

Implementation Details. To ensure a fair comparison, we implement all baseline methods and Grenade using the same language model, specifically bert-base-uncased. For our proposed method Grenade, we set the number of hops K to 1 and set the temperature parameter τ to 0.05 in all loss functions. The optimal hyperparameter |N(i)| is discussed in § 4.5. Please refer to Appendix B for additional implementation details.

Experimental Results
Few-shot Node Classification. To assess the generalizability of the learned representations to new tasks under low-data scenarios, we conduct experiments on few-shot node classification. Under this setting, the classification models are trained with varying numbers of labeled instances per class (k = {2, 4, 8, 16}). We repeat each experiment 10 times and report the average results along with the standard deviation. The classification models used in this evaluation are a multilayer perceptron (MLP) and GraphSAGE (Hamilton et al., 2017); their hyperparameters can be found in Appendix B. From the results shown in Tab. 2, several observations can be made: (1) In most cases, SSL-based methods achieve better performance than non-SSL methods (BERT+MLM, SPECTER, and GIANT > GLEM), which indicates the significance of SSL in enhancing model transferability to new tasks with limited labels.
(2) Among state-of-the-art TAG representation models, Grenade achieves the best performance on these datasets, which indicates the superior generalization ability of the representations extracted by Grenade. The designed knowledge alignment allows Grenade to integrate the pre-trained knowledge from the PLM encoder with the structural inductive bias learned by the GNN encoder. These expressive representations can be easily and efficiently generalized to few-shot learning tasks.
Full-Data Node Classification. We also conduct node classification experiments with the full training set, using MLP, GraphSAGE (Hamilton et al., 2017), and RevGAT-KD (Li et al., 2021a). From the results shown in Tab. 4, we can observe that: (1) Grenade achieves the best performance across all the baseline methods.
(2) The performance gap between Grenade and some baseline methods like GIANT and GLEM becomes smaller as more labeled data is provided, but Grenade is consistently better than these methods.
Node Clustering. In the node clustering task, we use the learned text node representations to train a K-means++ model for clustering instances. We apply the default hyperparameters of K-means++ as provided by scikit-learn (Pedregosa et al., 2011).
The number of clusters is set to the number of classes in the dataset, and we assign each cluster the most common ground-truth label among its members. Following the evaluation protocol described in (Ding et al., 2022b), we report three clustering evaluation metrics: accuracy (ACC), normalized mutual information (NMI), and adjusted rand index (ARI). We exclude the GLEM model from this evaluation since it requires labels during representation learning. To ensure robustness, we perform 10 runs of K-means++ with different random seeds and report the average results. As shown in Tab. 3, we observe that the structure-augmented SSL methods outperform the text-only self-supervised representation learning methods (Grenade, GIANT, SPECTER > BERT+MLM, SimCSE). This indicates that structure-augmented SSL methods can understand the context within the graph structure, which leads to more accurate node representations and in turn to better clustering. Additionally, our proposed method Grenade consistently outperforms all baseline methods. The improvement demonstrates that Grenade can better preserve neighborhood information, which informs the clustering methods of how data points are interconnected or related to each other.

Table 4: Supervised node classification performance comparison on benchmark datasets. Boldfaced numbers indicate the best performance of downstream models. The ⋆ represents experiment results adopted from (Chien et al., 2021), while † denotes experiment results adopted from (Zhao et al., 2022).

Link Prediction. Next, we evaluate the learned representations in predicting missing connections given the existing connections of a TAG. We aim to rank the positive candidates (1 or 2 positive instances) higher than the negative candidates (1,000 negative instances) for each query node. The evaluation metric used for this task is the mean reciprocal rank (MRR), which measures the reciprocal rank of the positive instance among the negative instances for each query instance, averaged over all query instances. As shown in Fig. 2, we observe that Grenade significantly outperforms other approaches. In fact, Grenade achieves at least a 4% performance improvement compared to methods that utilize structure-augmented self-supervised learning losses (SPECTER and GIANT) across all datasets. This demonstrates that Grenade can better preserve neighborhood information, which is consistent with the findings from § 4.2.
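The MRR metric used above can be sketched as follows. Candidate scores here are arbitrary stand-ins for similarities between learned representations, and the helper names are ours.

```python
# MRR: for each query, rank the positive candidate's score against the
# negatives' scores, take the reciprocal of that rank, and average.

def reciprocal_rank(positive_score, negative_scores):
    """Rank = 1 + number of negatives scoring strictly higher than the positive."""
    rank = 1 + sum(1 for s in negative_scores if s > positive_score)
    return 1.0 / rank

def mean_reciprocal_rank(queries):
    """queries: list of (positive_score, negative_scores) pairs, one per query node."""
    return sum(reciprocal_rank(p, negs) for p, negs in queries) / len(queries)

# Two queries: positive ranked 1st (RR = 1) and ranked 3rd (RR = 1/3).
mrr = mean_reciprocal_rank([(0.9, [0.1, 0.2]), (0.5, [0.8, 0.7, 0.1])])  # → 2/3
```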

Representation Visualization
To visually demonstrate the quality of the learned representations, we apply t-distributed stochastic neighbor embedding (t-SNE) (Van der Maaten and Hinton, 2008) for representation visualization. We compare Grenade with the two best-performing baseline methods, SPECTER and GIANT, on the ogbn-arxiv dataset. In Fig. 3, we present the t-SNE visualization of the embeddings for 10 randomly sampled classes comprising 5,000 subsampled instances, with colors corresponding to the labels of these instances. From Fig. 3, we observe that Grenade exhibits denser clusters and more explicit boundaries among different classes compared to the baseline methods. This observation confirms that Grenade can learn compact intra-class and distinct inter-class representations.

Ablation Study
To validate the effectiveness of graph-centric contrastive learning and graph-centric knowledge alignment, we conduct an ablation study on Grenade. In this study, we respectively remove GC-CL, ND-KA, and NBH-KA from the full model and report these model variants' performance in Tab. 5. In general, the full model Grenade has the best performance in most cases, and we notice a performance decline when any of the components is removed or replaced, underscoring the significance of each component in Grenade. Remarkably, we observe a performance improvement in link prediction after removing graph-centric contrastive learning (w/o GC-CL > Grenade in terms of MRR). Considering the task similarity between GC-CL and link prediction, one possible explanation is that removing GC-CL helps the model mitigate overfitting and further improves performance on the link prediction task. Meanwhile, this observation in turn shows that the dual-level graph-centric knowledge alignment (ND-KA and NBH-KA) is effective for capturing structural context information from the TAG.

Table 5: Ablation study of graph-centric contrastive learning (GC-CL) and knowledge alignment on the ogbn-arxiv dataset. "w/o" is the abbreviation of "without".

Hyperparameter Analysis
K-hop Neighbors. We investigate the impact of the K-hop neighbor selection on Grenade's performance. The choice of K directly affects the formulation of positive pairs in graph-centric contrastive learning (Eq. 3 and Eq. 4) and the knowledge alignment between the graph neural network and the language model (Eq. 5 and Eq. 6). Based on the results presented in Fig. 4, it is evident that increasing the hop distance adversely affects performance on full-data node classification (ACC of MLP), node clustering (ACC), and link prediction (MRR). This suggests that 1-hop neighbors optimally capture structural knowledge within our algorithm, whereas extending to 2-hop or 3-hop neighbors heightens the risk of integrating noisy data. This insight aligns with the conclusions drawn from related research, specifically SPECTER (Cohan et al., 2020).
We contend that our methodology strikes a balance between assimilating structural information and filtering out extraneous noise, thereby ensuring consistent performance in our assessments.

1-Hop Neighbor Size. One crucial aspect of Grenade's SSL objectives is the hyperparameter |N(i)|, which controls the number of 1-hop neighbors considered for representation learning. To investigate the impact of the subsampled neighbor size in Grenade, we conduct a hyperparameter analysis on the full-data node classification, node clustering, and link prediction tasks. As shown in Fig. 5, we observe that Grenade achieves its best performance with a practical, small number of neighbors (|N(i)| = 2 for ogbn-arxiv and |N(i)| = 1 for ogbn-products). This finding is particularly advantageous as it reduces the computational burden of the PLM encoder in graph-centric contrastive learning and in the knowledge alignment between the PLM and GNN encoders.
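The neighbor-size cap |N(i)| amounts to subsampling each node's 1-hop neighborhood before the SSL objectives are computed. The following is a minimal sketch; the function name, adjacency-list format, and fixed seed are our assumptions.

```python
import random

def sample_neighbors(adj_list, i, size, seed=0):
    """Return at most `size` 1-hop neighbors of node i, sampled without replacement."""
    rng = random.Random(seed)
    neighbors = sorted(adj_list[i])
    if len(neighbors) <= size:
        return neighbors
    return sorted(rng.sample(neighbors, size))

# Node 0 has three neighbors; with |N(i)| = 2, only two are kept per step.
kept = sample_neighbors({0: [3, 1, 2]}, 0, size=2)
```

Because only the kept neighbors enter the contrastive and alignment losses, a small |N(i)| directly caps the number of PLM forward passes needed per query node.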

Related Work
Learning with TAG. This problem involves learning text node representations that encode both textual semantics and structural context information.
Contrastive Learning. Contrastive learning is a self-supervised learning paradigm that aims to learn representations by distinguishing between positive and negative instances (Jaiswal et al., 2020). A key practice in contrastive learning is to use augmented versions of the same instance as positive instances and other instances as negative instances (Gao et al., 2021; He et al.; Radford et al., 2021). For example, SimCSE creates augmented views for each instance based on dropout (Srivastava et al., 2014). However, conventional instance-level contrastive learning only encourages instance-wise discrimination (Li et al., 2021b) and commonly assumes that different instances are i.i.d., which neglects the relationships among instances on a TAG. Hence, conventional contrastive learning methods are ineffective for learning expressive representations on TAGs. To address these limitations, many recent methods extend the design of positive pair construction by considering local neighborhood information (Cohan et al., 2020; Ostendorff et al., 2022). However, those methods cannot fully capture complex graph structures. In contrast, our proposed method Grenade leverages graph-centric contrastive learning and graph-centric knowledge alignment to fully exploit the structural context information of TAGs.

Conclusion
In this paper, we introduce a self-supervised graph-centric language model, Grenade, for learning expressive and generalizable representations from text-attributed graphs (TAG). Grenade is learned through two self-supervised learning algorithms: (1) Graph-Centric Contrastive Learning, which enables Grenade to harness intrinsic graph knowledge through relation-aware and augmentation-free contrastive learning; and (2) Graph-Centric Knowledge Alignment, which facilitates the exchange and strengthening of knowledge between the pre-trained language model encoder and the graph neural network encoder, thereby enhancing the capture of relational information from TAGs. We conduct experiments on four benchmark datasets under few-shot and full-data node classification, node clustering, and link prediction tasks, and find that Grenade significantly and consistently outperforms baseline methods.

Limitations
In this section, we acknowledge the following constraints in our study: (1) Constraints on the Choice of Backbone Model. Our choice of backbone model was restricted to the "bert-base-uncased" initialization in training Grenade, necessitated by the limitations of the computational resources available to us. Exploration of alternative PLM backbones such as GPT-2 (Radford et al., 2019) and RoBERTa (Liu et al., 2019) has not been carried out and represents a promising direction for subsequent studies.
(2) Comparison with Large Language Models. The natural language processing domain has recently seen breakthroughs with state-of-the-art large language models like LLaMA (Touvron et al., 2023) and ChatGPT (OpenAI, 2023), which have demonstrated exceptional performance in language understanding tasks. Our experiments confirm that Grenade surpasses existing representation learning techniques on TAGs, but the performance of Grenade relative to these cutting-edge language models remains to be ascertained.
(3) Breadth of Evaluation. In this work, we evaluate Grenade primarily through node classification, node clustering, and link prediction tasks. However, there are other relevant evaluation dimensions, such as retrieval, reranking, co-view, and others (Cohan et al., 2020). Future work will investigate the applicability and capacity of Grenade on broader tasks.

A Dataset Details
We extend ogbn-arxiv and ogbn-products for link prediction and evaluate models on the test split of these two node classification datasets. Specifically, for each source node i, we randomly choose one of its neighbors as the positive candidate and 1,000 non-neighbors as negative candidates, and the model should rank the positive candidate above the negative candidates. The negative candidates are randomly sampled from all nodes in the TAG that are not connected to i.
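The candidate construction described above can be sketched as follows. The helper name, adjacency-list inputs, and fixed seed are our own assumptions for illustration.

```python
import random

def build_candidates(i, neighbors, num_nodes, n_neg=1000, seed=0):
    """For source node i, pick one true neighbor as the positive candidate and
    sample n_neg non-neighbors (excluding i itself) as negative candidates."""
    rng = random.Random(seed)
    positive = rng.choice(sorted(neighbors))
    forbidden = set(neighbors) | {i}
    pool = [v for v in range(num_nodes) if v not in forbidden]
    negatives = rng.sample(pool, min(n_neg, len(pool)))
    return positive, negatives
```

The evaluation then scores the positive against the negatives per query node and aggregates the reciprocal ranks into MRR.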
For ogbl-citation2, we also extend it for node classification. Here the task is to predict the subject areas of the subset of nodes/papers that are published on arXiv, as in ogbn-papers100M. We borrow the labels for ogbl-citation2 from ogbn-papers100M, aligning the nodes of ogbl-citation2 and ogbn-papers100M through their Microsoft Academic Graph paper IDs. We split the data into training/validation/test sets by publication year: papers published before 2017 form the training set, papers published in 2017-2018 form the validation set, and papers published after 2018 form the test set.

B Implementation Details
Our proposed methodology is implemented using PyTorch version 1.13.1 (Paszke et al., 2019) and HuggingFace Transformers version 4.24.0 (Wolf et al., 2019). The experiments were executed on A5000, A6000, and RTX 4090 GPUs. For Grenade, the learning rate is set to 5e-5, and the AdamW optimizer is employed for training over 3 epochs. The search space for the number of GNN layers L is {1, 2, 3, 4}, and further hyperparameter analysis is provided in Fig. 6. The hyperparameters for few-shot node classification are shown in Tab. 6 and Tab. 7, respectively. It should be noted that "-1" means using all the training data for batch_size and all the neighbors for neighbor sampling.

C Additional Experimental Results
Full Data Node Classification.

Additional Ablation Study.
We conduct an ablation study on the ogbn-arxiv and ogbn-products datasets with 7 different model variations. In the second row of Tab. 9, the term ICL refers to instance-wise cross-modality contrastive learning: the positive pairs are formed between identically indexed document and node representations, while the negative pairs consist of the other document and node representations within the mini-batch. From Tab. 9, we make observations consistent with Tab. 5: each component of Grenade contributes to the performance improvement.
In-Depth Hyperparameter Analysis. To evaluate the performance of the graph neural network (GNN) encoder, we analyze the hyperparameter L as depicted in Fig. 6. Our observations indicate that optimal performance is achieved when L = 2.
Inference Time Complexity Analysis. When provided with an arbitrary node, Grenade is capable of generating the node's textual representation without requiring graph information. This leads

Figure 1 :
Figure 1: Illustration of Grenade. Given a text-attributed graph (TAG), Grenade is jointly optimized by two graph-centric self-supervised learning algorithms: graph-centric contrastive learning (GC-CL) and a dual-level graph-centric knowledge alignment, which comprises node-level alignment (ND-KA) and neighborhood-level alignment (NBH-KA).

Figure 4 :
Figure 4: Hyperparameter evaluation for K-hop neighbors on the ogbn-arxiv dataset. Dashed lines indicate the peak performance of Grenade with K = 1.

Table 1 :
Description of the datasets (adopted from (Zhao et al., 2022)).

token from frozen bert-base-uncased as the text node representation. As for OGB, we utilize the default features from the benchmark datasets, such as averaged word embeddings and bag-of-words representations. This category includes SPECTER (Cohan et al., 2020), GIANT (Chien et al., 2021), and GLEM (Zhao et al., 2022). SPECTER applies graph-centric contrastive learning on the language model, and GIANT employs extreme multi-label classification to train the language model for neighborhood prediction. It is noteworthy that GLEM utilizes task-specific labels to alternately guide the pre-trained language model (PLM) and graph neural network (GNN) through self-knowledge distillation. Compared with GLEM, our proposed method Grenade is fully self-supervised and does not rely on any human-annotated labels. The learned text node representations can be efficiently and effectively generalized to downstream tasks.

Table 2 :
Experiment results of few-shot node classification.
⋆ indicates that the text node representations are obtained from their official release. − indicates no result for GLEM; this is because in the representation learning stage, GLEM utilizes the labeled dataset to train the GNN.

Table 3 :
Experiment results of node clustering.

Table 6 :
Hyperparameter setting for MLP model in node classification.

Table 7 :
Hyperparameter setting for GraphSAGE model in node classification.

Besides MLP and GraphSAGE, we incorporated the recent Graph Transformer Network, NAGphormer (Chen et al., 2023), into our node classification evaluation. As evidenced by the results in Tab. 8, our proposed approach consistently surpasses the baseline techniques (BERT+MLM, SPECTER, GIANT) in few-shot and full-data node classification. This performance is consistent with the observations made using MLP and GraphSAGE.