Multi-Modal Knowledge Graph Transformer Framework for Multi-Modal Entity Alignment

Multi-Modal Entity Alignment (MMEA) is a critical task that aims to identify equivalent entity pairs across multi-modal knowledge graphs (MMKGs). However, this task is challenging due to the presence of different types of information, including neighboring entities, multi-modal attributes, and entity types. Directly incorporating this information (e.g., via concatenation or attention) can produce an unaligned information space. To address these challenges, we propose a novel MMEA transformer, called MoAlign, that hierarchically introduces neighbor features, multi-modal attributes, and entity types to enhance the alignment task. Taking advantage of the transformer's ability to integrate multiple types of information, we design a hierarchical modifiable self-attention block in the transformer encoder to preserve the unique semantics of different information. Furthermore, we design two entity-type prefix injection methods to integrate entity-type information using type prefixes, which help to restrict the global information of entities not present in the MMKGs. Our extensive experiments on benchmark datasets demonstrate that our approach outperforms strong competitors and achieves excellent entity alignment performance.


Introduction
Multi-modal entity alignment (MMEA) is a challenging task that aims to identify equivalent entity pairs across multiple knowledge graphs that feature different modalities of attributes, such as text and images. To accomplish this task, sophisticated models are required to effectively leverage information from different modalities and accurately align entities. This task is essential for various applications, such as cross-lingual information retrieval, question answering (Antol et al., 2015; Shih et al., 2016), and recommendation systems (Sun et al., 2020; Xu et al., 2021). MMEA (Liu et al., 2019; Li et al., 2023b; Liu et al., 2021; Lin et al., 2022) is challenging due to the heterogeneity of MMKGs (e.g., different neighbors, multi-modal attributes, distinct types), which makes it difficult to learn rich knowledge representations. Previous approaches such as PoE (Liu et al., 2019) concatenated all modality features to create composite entity representations but failed to capture interactions among heterogeneous modalities. More recent works (Chen et al., 2020; Guo et al., 2021) designed multi-modal fusion modules to better integrate attributes and entities, but still did not fully exploit the potential interactions among modalities. These methods also ignored inter-modality dependencies between entity pairs, which can lead to incorrect alignment. Generally speaking, although MMKGs offer rich attributes and neighboring entities that could be useful for multi-modal entity alignment, current methods are limited in that (i) they ignore differentiated and personalized aggregation of heterogeneous neighbors and modalities, leading to misalignment of cross-modal semantics, and (ii) they fail to exploit entity heterogeneity, resulting in non-discriminative representations for entities with different meanings/types.
Therefore, the major challenge of the MMEA task is how to perform differentiated and personalized aggregation of the heterogeneous information of neighbors, modalities, and types. Although such information is beneficial to entity alignment, directly fusing it leads to misalignment of the information space, as illustrated in Figure 1. Firstly, notable disparities between different modalities make direct alignment challenging. For example, both the visual attribute of the entity Ruby in MMKG1 and the neighbor information of the entity Ruby in MMKG2 contain similar semantics of programming, but data heterogeneity may impede effective utilization of this information. Secondly, complex relationships between entities require a thorough understanding and modeling of contextual information and semantic associations. Entities such as Ruby, Perl, and Larry Wall possess unique attributes, and their inter-relationships are non-trivial, necessitating accurate modeling based on contextual information and semantic associations. Furthermore, the existence of multiple meanings for entities further exacerbates the challenge of distinguishing between two entities, as in the case of Ruby, which has different meanings in MMKG1 and MMKG3, where it may be categorized as a jewelry entity or a programming language entity, respectively.
To overcome the aforementioned challenges, we propose a novel Multi-Modal Entity Alignment Transformer named MoAlign. Our framework hierarchically introduces neighbor, multi-modal attribute, and entity-type information to enhance the alignment task. We leverage the transformer architecture, which is known for its ability to process heterogeneous data, to handle this complex task. Moreover, to enable targeted learning on different modalities, we design a hierarchical modifiable self-attention block in the Transformer encoder, which builds associations of task-related intra-modal features through layered introduction. Additionally, we introduce positional encoding to model entity representation from both structure and semantics simultaneously. Furthermore, we integrate entity-type information using an entity-type prefix, which helps to restrict the global information of entities that are not present in the multi-modal knowledge graphs. This prefix enables better filtering of unsuitable candidates and further enriches entity representations. To comprehensively evaluate the effectiveness of our proposed approach, we design training objectives for both entity and context evaluation. Our extensive experiments on benchmark datasets demonstrate that our approach outperforms strong competitors and achieves excellent entity alignment performance. Our contributions can be summarized as follows.
• We propose a novel MMEA framework named MoAlign, which effectively integrates heterogeneous information through the multi-modal KG Transformer.
• We design a hierarchical modifiable self-attention block to build associations of task-related intra-modal features through layered introduction, and design an entity-type prefix to further enrich entity representations.
• Experimental results indicate that the framework achieves state-of-the-art performance on the public multi-modal entity alignment datasets.

Preliminaries
Multi-Modal Knowledge Graph.
Multi-Modal Entity Alignment Task. Multi-modal entity alignment (Chen et al., 2020; Guo et al., 2021; Liu et al., 2021; Chen et al., 2022) aims to determine whether two entities from different multi-modal knowledge graphs refer to the same real-world entity. This involves calculating the similarity between pairs of entities, known as alignment seeds. The goal is to learn entity representations from two multi-modal knowledge graphs (MKG_1 and MKG_2) and calculate the similarity between a pair of entity alignment seeds (e, e′) taken from these KGs. The set of entity alignment seeds is denoted as S = {(e, e′) | e ∈ E, e′ ∈ E′, e ≡ e′}.

Framework
This section introduces our proposed framework MoAlign. As shown in Figure 2, we introduce positional encoding to simultaneously model entity representation from both modality and structure. To hierarchically introduce neighbor and multi-modal attributes, we design a hierarchical modifiable self-attention block. This block builds associations of task-related intra-modal features through layered introduction. Furthermore, to integrate entity-type information, we design a prefix-injected self-attention mechanism, which helps to restrict the global information of entities not present in the MMKGs. Additionally, MoAlign designs training objectives for both entity and context evaluation to comprehensively assess effectiveness.

Multi-Modality Initialization
The textual attribute is initialized by BERT (Devlin et al., 2019). For the visual attributes, to enable direct processing of images by a standard transformer, each image is split into a sequence of patches (Dosovitskiy et al., 2021). We then flatten each patch matrix into a one-dimensional vector, analogous to word embeddings in BERT (Devlin et al., 2019), and concatenate the patch vectors to form the image embedding, denoted as v_v. However, the initialization of word embeddings follows a specific distribution, whereas the distribution of image embeddings generated in this way is not consistent with the pre-trained model's initial distribution. Therefore, to standardize the distribution of image embeddings to be consistent with the distribution of the vocabulary used by the pre-trained model, we perform the following operation:

v_v ← (v_v − mean(v_v)) / std(v_v) · λ,

where mean(v_v) and std(v_v) are the mean and standard deviation of v_v, respectively, and λ is the standard deviation of the truncated normal function.
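As a minimal sketch, this standardization step can be written as follows (NumPy; the default value of λ is illustrative, since the paper does not fix it here):

```python
import numpy as np

def standardize_image_embedding(v_v, lam=0.02):
    # Shift and rescale the flattened patch embedding so that it has
    # zero mean and standard deviation lam, matching a truncated-normal
    # word-embedding initialization. lam=0.02 is a hypothetical value.
    return (v_v - v_v.mean()) / v_v.std() * lam
```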

Positional Encoding
For the transformer input, we additionally supply two types of positional encoding to maintain the structure information of the multi-modal KG, as follows.
Modality Positional Encoding. To enable the model to effectively distinguish between entities, textual attributes, image attributes, and introduced entity types, we incorporate a unique position code for each modality. These position codes are passed through the encoding layers of the model so that it can differentiate between modalities more accurately and learn their respective features more effectively. Specifically, we assign position codes of 1, 2, 3, and 4 to entities, textual attributes, image attributes, and introduced entity types, respectively. By incorporating this modality positional encoding into the encoding process, our model is able to incorporate modality-specific features into the alignment task.
Structure Positional Encoding. To capture the positional information of neighbor nodes, we introduce a structure positional encoding that assigns a unique position code to each neighbor. This allows the model to distinguish between them and effectively incorporate the structure information of the knowledge graph into the alignment process. Specifically, we assign a position code of 1 to the current entity and its attributes. For first-order neighbors, we randomly initialize a reference order and use it to assign position codes to the n-th neighbor and its corresponding relation as 2n and 2n + 1, respectively. Additionally, attributes share the structure positional encoding of their corresponding entities. By doing so, the model can differentiate between different neighbor nodes and effectively capture their positional information.
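The structure-code assignment can be sketched in a few lines (pure Python; the neighbor order is the randomly initialized reference order described above):

```python
def structure_position_codes(n_neighbors, n_attributes):
    # Current entity and its attributes share code 1; the n-th neighbor
    # and its relation receive codes 2n and 2n+1.
    codes = [1]                      # current entity
    for n in range(1, n_neighbors + 1):
        codes += [2 * n, 2 * n + 1]  # neighbor_n, relation_n
    codes += [1] * n_attributes      # attributes reuse the entity code
    return codes
```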
To fully utilize the available information, we extract relation triplets and multi-modal attribute triplets for each entity. The input sequence of an entity is formed by combining multi-modal sequences as {e, (e_1, r_1), …, (e_n, r_n), (a_1, v_1), …, (a_m, v_m), e_T}, where (e_i, r_i) represents the i-th neighbor of entity e and its relation, (a_j, v_j) represents the j-th attribute of entity e and its value v_j (covering both textual and visual attributes), n and m are the numbers of neighbors and attributes, and e_T is the type embedding.

Hierarchical Modifiable Self-Attention
To better capture the dependencies and relationships of entities and multi-modal attributes, we incorporate cross-modal alignment knowledge into the transformer. Specifically, a hierarchical modifiable self-attention block is proposed to better capture complex interactions between modalities and contextual information. The block focuses on inter-modal associations (between modalities, such as textual and visual features) via cross-modal attention. By doing so, the model can capture more informative and discriminative features.

Hierarchical Multi-Head Attention
We propose a novel hierarchical block that incorporates distinct attention mechanisms for different inputs, which facilitates selective attention toward various modalities based on their significance in the alignment task. The transformer encoder comprises three separate attention mechanisms, namely neighbor attention, textual attention, and visual attention. We utilize different sets of queries, keys, and values to learn diverse types of correlations between the input entity, its attributes, and neighbors.
Neighbor Multi-Head Attention. More specifically, in the first layer of the hierarchical block of the l-th transformer layer, we employ the input entity as queries and its neighbors as keys and values to learn relations between the entity and its neighbors:

e^(l.1) = MultiHead(Q = e^(l−1), K = V = {e_1, …, e_n}),

where e_i is a neighbor of entity e and e^(l.1) is the representation of entity e in the first layer.
Textual Multi-Head Attention. In the second layer of the hierarchical block, the input entity and its textual attributes are used as queries and keys/values to learn the correlations between the entity and its textual attributes:

e^(l.2) = MultiHead(Q = e^(l.1), K = V = {(a_t, v_t)}),

where (a_t, v_t) represents a textual attribute of entity e and its value, and e^(l.2) is the representation of entity e in the second layer.
Visual Multi-Head Attention. In the third layer of the hierarchical block, the input entity and its visual attributes are used analogously to the second layer, to learn the correlations between the entity and its visual attributes:

e^(l.3) = MultiHead(Q = e^(l.2), K = V = {(a_v, v_v)}),

where (a_v, v_v) represents a visual attribute of entity e and its value, and e^(l.3) is the representation of entity e in the third layer. By incorporating neighbor attention, textual attribute attention, and visual attribute attention, our model can capture various correlations between the input entity, its attributes, and neighbors more effectively.
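The three stacked stages can be approximated by the following NumPy sketch (single-head, single query vector; the learned MultiHead projection matrices are omitted, so this illustrates only the layering, not the exact parameterization):

```python
import numpy as np

def attend(q, K, V):
    # Scaled dot-product attention for one query vector q against
    # key/value matrices K, V of shape (n, d).
    w = K @ q / np.sqrt(K.shape[1])
    w = np.exp(w - w.max())
    w = w / w.sum()
    return w @ V

def hierarchical_block(e, neighbors, text_attrs, visual_attrs):
    e1 = attend(e, neighbors, neighbors)         # layer l.1: neighbors
    e2 = attend(e1, text_attrs, text_attrs)      # layer l.2: textual attributes
    e3 = attend(e2, visual_attrs, visual_attrs)  # layer l.3: visual attributes
    return e3
```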

Modifiable Self-Attention
To learn a specific attention matrix for building correlations among entities and attributes, we design a modifiable self-attention mechanism. We automatically and adaptively generate an information fusion mask matrix based on sequence features. The length of the sequence and the types of information it contains may vary across entities, as some entities may lack certain attribute information or images.
Modifiable Self-Attention Mechanism. To adapt to the characteristics of different sequences, we assign "labels" to the different kinds of information when generating the sequence, such as using [E] to represent entities and [R] to represent relations. In this way, the positions of the different kinds of information in the sequence can be located, and the mask matrix can be generated accordingly. After the input is fed to the model and the mask matrix has been generated, these labels must be removed by the model's tokenizer so that they do not affect the subsequent generation of embeddings; we modify the vocabulary so that the tokenizer can recognize these label tokens and strip them. The mask matrix is generated based on the positions of the various kinds of information, represented by the labels: M_ij is kept (set to 0) if the type pair (T(x_i), T(x_j)) is permitted to attend, and set to a large negative value otherwise, where T(·) is the type mapping function of entities and attributes. Subsequently, residual connections and layer normalization are utilized to merge the output of the hierarchical modifiable self-attention e^(l), and a position-wise feed-forward network is applied to each element of the self-attention output sequence:

e^(l) = LayerNorm(e^(l−1) + FFN(e^(l))), (6)

where e^(l) is the output of the hierarchical modifiable self-attention in the l-th layer of the transformer. The use of residual connections and layer normalization helps stabilize training and improve performance. It enables our model to better understand and attend to different modalities, leading to more accurate alignment results.
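A plain-Python sketch of deriving the additive mask from per-position labels (the concrete set of permitted type pairs is an illustrative assumption, not the paper's exact policy):

```python
NEG_INF = -1e9  # large negative value added to blocked attention logits

def build_mask(labels, allowed_pairs):
    # labels: per-position type labels, e.g. ['E', 'R', 'A'];
    # allowed_pairs: set of (type_i, type_j) pairs permitted to attend.
    # Returns an additive mask: 0.0 keeps a position, NEG_INF blocks it.
    n = len(labels)
    return [[0.0 if (labels[i], labels[j]) in allowed_pairs else NEG_INF
             for j in range(n)] for i in range(n)]
```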

Entity-Type Prefix Injection
Prefix methods can help improve alignment accuracy by providing targeted prompts to the model to improve generalization, including entity-type information (Liu et al., 2023). Entity-type injection takes as input a pair of aligned seed entities from different MMKGs and uses entity types as prompts. The goal is to inject entity-type information into the multi-head attention and feed-forward neural networks.
Prefix-Injected Self-Attention Mechanism. Specifically, we create two sets of prefix vectors for entities, p_k, p_v ∈ R^(n_t×d), for keys and values separately, where n_t is the number of entity types. These prefix vectors are concatenated with the original key K and value V. The multi-head attention in Eq. (2) is then performed on the newly formed prefixed keys and values:

Attn(Q, [p_k; K], [p_v; V]) = softmax(Q [p_k; K]^T / √d_N) [p_v; V],

where d_N is typically set to d/N, and N is the number of heads.
In this paper, we use the type embedding e T as the prefix.
In order to effectively incorporate alignment information, we utilize the mask token to influence the attention weights. The mask token serves as a means to capture effective neighbors and reduce computational complexity. To achieve this, we set the attention weights m to a large negative value based on the mask matrix M_ij in the modifiable self-attention mechanism. Consequently, the attention weights for the mask token are modified as follows:

Attn(Q, K_t, V_t) = softmax(Q K_t^T / √d_k + m) V_t,

where Q, K_t, and V_t represent the query of the entity, the key containing entity type, and the value containing entity type, respectively. d_k refers to the dimensionality of the key vectors, and m represents the attention weights.
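Putting the two pieces together, a NumPy sketch of prefix-injected attention with an additive mask (single head, no learned projections; the shapes and mask handling are assumptions for illustration):

```python
import numpy as np

def prefix_attention(Q, K, V, p_k, p_v, mask=None):
    # Prepend type-prefix vectors to keys/values, then apply scaled
    # dot-product attention; `mask` is additive (0 keeps a position,
    # a large negative value blocks it) over the non-prefix positions.
    K2 = np.concatenate([p_k, K], axis=0)
    V2 = np.concatenate([p_v, V], axis=0)
    logits = Q @ K2.T / np.sqrt(Q.shape[1])
    if mask is not None:
        logits[:, p_k.shape[0]:] += mask
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V2
```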
Prefix-Injected Feed-Forward Network. Recent research suggests that the feed-forward layers within the transformer architecture store factual knowledge and can be regarded as unnormalized key-value memories (Yao et al., 2022). Inspired by this, we inject the entity type into each feed-forward layer to enhance the model's ability to capture information specific to the entity type. Similar to prefix injection in attention, we first repeat the entity type e_T to create two sets of vectors, Φ_k, Φ_v ∈ R^(n_t×d). These vectors are then concatenated with the original parameter matrices of the first and second linear layers.
FFN(E) = f(E · [W_f^k; Φ_k]) · [W_f^v; Φ_v],

where f denotes the non-linearity function, E represents the output of the hierarchical modifiable self-attention, and W_f^k, W_f^v are the parameters of the two linear layers.
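One plausible reading of this construction, sketched in NumPy (the repeated type vectors widen the two linear maps, acting as extra key/value memory slots; the tanh non-linearity is an assumption):

```python
import numpy as np

def prefix_ffn(E, W1, W2, phi_k, phi_v, f=np.tanh):
    # E: (n, d) self-attention output; W1: (d, d_ff); W2: (d_ff, d);
    # phi_k, phi_v: (n_t, d) repeated entity-type vectors.
    W1p = np.concatenate([W1, phi_k.T], axis=1)  # (d, d_ff + n_t)
    W2p = np.concatenate([W2, phi_v], axis=0)    # (d_ff + n_t, d)
    return f(E @ W1p) @ W2p
```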

Training Objective
In order to effectively evaluate MMEA from multiple perspectives, we propose a training objective function that incorporates both aligned entity similarity and context similarity. This objective assesses the quality of entity alignment while taking into consideration both the entities themselves and their surrounding context.
Aligned Entity Similarity. The aligned entity similarity constraint loss is designed to measure the similarity between aligned entity pairs and to ensure that the embeddings of these entities are close to each other in the embedding space.
L_EA = sim(e, e′) − sim(ē, e′) − sim(e, ē′), (10)

where (e, e′) represent the final embeddings of the aligned seed entities from knowledge graphs KG1 and KG2, ē and ē′ denote the negative samples of the seed entities, and sim(·, ·) refers to the cosine distance between the embeddings.
Context Similarity. The context similarity loss L_CS is employed to ensure that the context representations of aligned entities are close to each other in the embedding space. The total alignment loss L is computed as:

L = α L_EA + β L_CS,

where α, β are weighting hyper-parameters.
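A small sketch of the two-part objective (reading sim(·,·) as cosine distance per the text; combining L_EA and L_CS as a weighted sum is our reading of the total loss):

```python
import numpy as np

def cos_dist(a, b):
    # cosine distance between two embedding vectors
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def entity_alignment_loss(e, e_pos, e_neg, e_pos_neg):
    # L_EA = sim(e, e') - sim(e_bar, e') - sim(e, e_bar')
    return cos_dist(e, e_pos) - cos_dist(e_neg, e_pos) - cos_dist(e, e_pos_neg)

def total_loss(l_ea, l_cs, alpha, beta):
    # weighted combination of alignment and context losses
    return alpha * l_ea + beta * l_cs
```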

Dataset
We conduct experiments on two of the most popular datasets, FB15K-DB15K and FB15K-YAGO15K, as described in (Liu et al., 2019). The FB15K-DB15K dataset is an entity alignment dataset built from the FB15K and DB15K MMKGs, while the latter pairs FB15K with YAGO15K. Consistent with prior works (Chen et al., 2020; Guo et al., 2021), we split each dataset into training and testing sets in proportions of 2:8, 5:5, and 8:2, respectively. MRR, Hits@1, and Hits@10 are reported for evaluation on the different proportions of alignment seeds.

Comparison Methods
We compare our method with three EA baselines (TransE (Bordes et al., 2013), GCN-align (Wang et al., 2018a), and AttrGNN (Liu et al., 2020b)), three multi-modal transformer baselines (BERT (Devlin et al., 2019), ViT (Dosovitskiy et al., 2021), and CLIP (Radford et al., 2021)), and four MMEA baselines; details of all compared methods are given in Appendix B.

Implementation Details
We utilized the optimal hyperparameters reported in the literature for all baseline models. Our model was implemented using PyTorch, an open-source deep learning framework. We initialized text and image attributes using bert-base-uncased and VGG16, respectively. To ensure fairness, all baselines were trained on the same dataset partition. The best random dropping rate is 0.35, and the coefficients α, β were set to 5 and 2, respectively. All hyperparameters were tuned on validation data using 5 trials. All experiments were performed on a server with one GPU (Tesla V100).

Main Results
To verify the effectiveness of MoAlign, we report overall average results in Table 1. 2) Compared to EA baselines (1-3), especially for MRR and Hits@1, our model improves by 5% and 9% on average on FB15K-DB15K and FB15K-YAGO15K, respectively, tending to achieve more significant improvements. This demonstrates that the effectiveness of multi-modal context information benefits incorporating alignment knowledge. 3) Compared to multi-modal transformer-based baselines, our model achieves better results, and the transformer-based baselines perform better than the EA baselines, demonstrating that transformer-based structures can learn better MMEA knowledge. 4) Compared to MMEA baselines, for which our model designs a hierarchical block and a modifiable self-attention mechanism, the average gains of our model in MRR, Hits@1, and Hits@10 are 2%, 1.4%, and 1.7%, respectively, because our method incorporates multi-modal attributes and robust context entity information. All the observations demonstrate the effectiveness of MoAlign.

Discussions for Model Variants
To investigate the effectiveness of each module in MoAlign, we conduct variant experiments, showcasing the results in Table 2. The "↓" denotes the performance degradation compared to MoAlign. We can observe that: 1) The impact of the Hierarchical Block tends to be more significant.
We believe the reason is that the adaptive introduction of multi-modal attributes and neighbors captures more clues. 2) By replacing the modifiable self-attention with multi-head self-attention, the performance decreases significantly, demonstrating that the modifiable self-attention captures more effective multi-modal attribute and relation information. 3) When removing image attributes ("w/o image attribute"), our method drops 2.7% and 2.7% on average on FB15K-DB15K and FB15K-YAGO15K, respectively, demonstrating that image attributes can improve model performance and that our method utilizes them effectively by capturing more alignment knowledge.

Table 3: Order Impact on FB15K-DB15K (80%).

Impact of Multi-modal Attributes
To further investigate the impact of multi-modal attributes on all compared methods, we report results obtained by deleting different modalities of attributes, as shown in Figure 3. From the figure, we observe that: 1) The variants without the text or image attributes decline significantly on all evaluation metrics, which demonstrates that the multi-modal attributes are necessary and effective for MMEA. 2) Compared to other baselines, our model derives better results both when using all multi-modal attributes and when abandoning some of them.

Impact of Introduction Order

We also study the order in which neighbor information, textual attributes, and visual attributes are introduced. As shown in Table 3, the introduction order has a significant impact on our model. Specifically, the best performance is achieved when textual attributes, visual attributes, and neighbor information are introduced in that order. This suggests that first aggregating attribute information to learn a good entity representation, and then incorporating neighbor information, can effectively utilize both the attributes and the neighborhood information of nodes.

Impact of Interference Data
We investigate the impact of interference data on our proposed method for MMEA. Specifically, we randomly replace 5%, 10%, 15%, 20%, 25%, and 30% of the neighbor entities and attribute information to test the robustness of our method, as shown in Figure 4. The experimental results demonstrate that our method exhibits better tolerance to interference than the baseline methods. This suggests that our approach, which incorporates hierarchical information from different modalities and introduces a type prefix to entity representations, is capable of handling interference data and improving the robustness of the model.

Conclusion
This paper proposes a novel MMEA framework. It incorporates cross-modal alignment knowledge using a two-stage transformer encoder to better capture complex inter-modality dependencies and semantic relationships. It includes an MMKG transformer encoder that uses self-attention mechanisms to establish associations between intra-modal features relevant to the task. Our experiments show that our approach outperforms strong competitors.

Limitations
Our work hierarchically introduces neighbor, multi-modal attribute, and entity-type information to enhance the alignment task. Empirical experiments demonstrate that our method effectively integrates heterogeneous information through the multi-modal KG Transformer. However, some limitations of our approach remain, summarized as follows: • Due to the limitations of existing MMEA datasets, we only experiment on the entity, text, and image modalities to explore the influence of multi-modal features. We will study more modalities in future work.
• Our approach employs the transformer encoder architecture, which entails a substantial time overhead. In forthcoming investigations, we intend to explore the feasibility of leveraging prompt-based techniques to mitigate the computational burden and expedite model training.

Ethics Statement
In this work, we propose a new MMEA framework that hierarchically introduces neighbor, multi-modal attribute, and entity-type information, and we benchmark our architecture against baseline architectures on the two MMEA datasets.
Data Bias.Our framework is tailored for multimodal entity alignment in the general domain.
Nonetheless, its efficacy may be compromised when confronted with datasets exhibiting dissimilar distributions or novel domains, potentially leading to biased outcomes. The experimental results presented in this paper are predicated on particular benchmark datasets, which are susceptible to such biases. As a result, it is imperative to exercise caution when assessing the model's generalizability and fairness.
Computing Cost/Emission. Our study, which involves the utilization of large-scale language models, entails a substantial computational overhead. We acknowledge that this computational burden has a detrimental environmental impact in terms of carbon emissions. Specifically, our research necessitated a cumulative 588 GPU hours of computation on Tesla V100 GPUs. The total carbon footprint resulting from this computation is estimated at 65.27 kg of CO2 per run, with a total of two runs conducted.

A Related Work
A.1 Multi-Modal Entity Alignment

In the real world, due to the multi-modal nature of KGs, several works (Zhu et al., 2022; Wang et al., 2021; Jiang et al., 2021; Fang et al., 2022) have started to focus on MMEA technology. One popular approach is to use embeddings to represent entities and their associated modalities (Cao et al., 2022; Yang et al., 2022). These embeddings can then be used to measure the similarity between entities and align them across different KGs. HEA (Guo et al., 2021) is a hyperbolic multi-modal entity alignment approach that combines both attribute and entity representations in the hyperbolic space and uses aggregated embeddings to predict alignments. EVA (Liu et al., 2021) combines image, structure, relation, and attribute information for MMEA with learnable weighted attention to learn the importance of each modality's attributes. Despite these advances, existing methods often ignore contextual gaps between entity pairs, which may limit the effectiveness of alignment.

A.2 Knowledge Graph Transformer
The Transformer architecture, originally proposed for natural language processing tasks, has also been applied to various knowledge graph tasks (Liu et al., 2022; Hu et al., 2022; Howard et al., 2022; Wang et al., 2022, 2023; Fang et al., 2023a,b). For example, the KGAT model (Wang et al., 2019) uses a graph attention mechanism to capture the complex relations between entities in a knowledge graph, and a Transformer to learn representations of the entities for downstream tasks such as link prediction and entity recommendation. The K-BERT model (Liu et al., 2020a) extends this approach by pre-training a Transformer on a large corpus of textual data and then fine-tuning it on a knowledge graph to improve entity and relation extraction. The Transformer can model long-range dependencies, where entities and their associated modalities may be distant from each other in the knowledge graph. Furthermore, Transformers utilize attention mechanisms to weigh the importance of different inputs and focus on relevant information, which is particularly useful for aligning entities across different modalities.

B Comparison Methods
We compared our method with three EA baselines that aggregate text attributes and relation information, for which we introduced image attributes initialized by VGG16 using the same aggregation method as for text attributes. The three EA methods are as follows: (1) TransE (Bordes et al., 2013) assumes that the entity embedding v should be close to the attribute embedding a plus their relation r.
(2) GCN-align (Wang et al., 2018a) transfers entities and attributes from each language into a common representation space through a GCN. (3) AttrGNN (Liu et al., 2020b) divides the KG into multiple subgraphs, effectively modeling various types of attributes. We also compared our method with three transformer models that incorporate multi-modal information: (4) BERT (Devlin et al., 2019) is a pre-trained model used to generate representations. (5) ViT (Dosovitskiy et al., 2021) is a visual transformer model that partitions an image into patches and feeds them as an input sequence. (6) CLIP (Radford et al., 2021) is a joint language and vision model architecture that employs contrastive learning.
In addition, we compared our method with four MMEA methods focusing on utilizing multi-modal attributes: (7) PoE (Liu et al., 2019) utilizes image features and measures credibility by matching the semantics of entities. (8) Chen et al. (Chen et al., 2020) design a fusion module to integrate multi-modal attributes. (9) HEA (Guo et al., 2021) characterizes MMKG in hyperbolic space. (10)

Figure 1: Two examples of the MMEA task, where the entity pair in MMKG 1 and MMKG 2 is an entity seed and the pair in MMKG 1 and MMKG 3 is not. The blue, green, and orange circles are entities, textual attributes, and visual attributes, respectively.

Figure 2: The framework of the Multi-Modal KG Transformer, MoAlign. The hierarchical modifiable self-attention block learns the entity representation by hierarchically integrating multi-modal attributes and neighbors. The prefix-injected self-attention mechanism introduces entity-type information for alignment.
Table 1: Main experiments on FB15K-DB15K (top) (%) and FB15K-YAGO15K (bottom) (%) with different proportions of MMEA seeds. The best results are highlighted in bold, and underlined values are the second best. "↑" means the improvement compared to the second best result, and "-" means that results are not available.