Fully Hyperbolic Neural Networks

Hyperbolic neural networks have shown great potential for modeling complex data. However, existing hyperbolic networks are not completely hyperbolic, as they encode features in the hyperbolic space yet formalize most of their operations in the tangent space (a Euclidean subspace) at the origin of the hyperbolic model. This hybrid method greatly limits the modeling ability of networks. In this paper, we propose a fully hyperbolic framework to build hyperbolic networks based on the Lorentz model by adapting the Lorentz transformations (including boost and rotation) to formalize essential operations of neural networks. Moreover, we also prove that linear transformation in tangent spaces used by existing hyperbolic networks is a relaxation of the Lorentz rotation and does not include the boost, implicitly limiting the capabilities of existing hyperbolic networks. The experimental results on four NLP tasks show that our method has better performance for building both shallow and deep networks. Our code will be released to facilitate follow-up research.

Since for any point in a hyperbolic space the tangent space at this point is a Euclidean subspace, all Euclidean neural operations can be easily adapted to this tangent space. Therefore, existing works [12,30] formalize most of the operations of hyperbolic neural networks in a hybrid way, transforming features between hyperbolic spaces and tangent spaces via the logarithmic and exponential maps and performing the neural operations in tangent spaces. However, the logarithmic and exponential maps require a series of hyperbolic and inverse hyperbolic functions. The compositions of these functions are complicated and their ranges extend to infinity, significantly weakening the stability of models.
To avoid complicated transformations between hyperbolic spaces and tangent spaces, we propose a fully hyperbolic framework that formalizes the operations of neural networks directly in hyperbolic spaces rather than in tangent spaces. Inspired by the special theory of relativity, which uses the Minkowski space (a Lorentz model) to describe spacetime and formalizes linear transformations in spacetime as the Lorentz transformations, our hyperbolic framework selects the Lorentz model as its feature space. Based on the Lorentz model, we formalize operations via a relaxation of the Lorentz transformations to build hyperbolic neural networks, including the linear layer, the attention layer, etc. We also prove that performing a linear transformation in the tangent space at the origin of hyperbolic spaces [12,30] is equivalent to performing a Lorentz rotation with relaxed restrictions, i.e., existing hyperbolic networks do not include the Lorentz boost, implicitly limiting their modeling capabilities.
To verify our framework, we build fully hyperbolic neural networks for several representative scenarios, including knowledge graph embeddings, network embeddings, machine translation, and dependency tree probing. The experimental results show that our fully hyperbolic networks can outperform Euclidean baselines with fewer parameters. Compared with existing hyperbolic networks that rely on tangent spaces, our fully hyperbolic networks also achieve better or comparable results.

Preliminaries
Hyperbolic geometry is a non-Euclidean geometry with constant negative curvature K. Several hyperbolic geometric models have been applied in previous studies: the Poincaré ball (Poincaré disk) model [12], the Poincaré half-plane model [37], the Klein model [14] and the Lorentz (hyperboloid) model [30]. All these hyperbolic models are isometrically equivalent, i.e., any point in one of these models can be transformed to a point of the others with distance-preserving transformations [33]. We select the Lorentz model as the cornerstone of our framework, considering the numerical stability and the simplicity of its exponential/logarithmic maps and distance function.

The Lorentz Model
Formally, an $n$-dimensional Lorentz model is the Riemannian manifold $\mathbb{L}^n_K = (\mathcal{L}^n, \mathfrak{g}^K_{\mathbf{x}})$, where $K$ is the constant negative curvature and $\mathfrak{g}^K_{\mathbf{x}} = \mathrm{diag}(-1, 1, \cdots, 1)$ is the Riemannian metric tensor. Each point in $\mathbb{L}^n_K$ has the form $\mathbf{x} = \begin{bmatrix} x_t \\ \mathbf{x}_s \end{bmatrix}$, with $x_t \in \mathbb{R}$ and $\mathbf{x}_s \in \mathbb{R}^n$. $\mathcal{L}^n$ is the point set $\mathcal{L}^n := \{\mathbf{x} \in \mathbb{R}^{n+1} \mid \langle\mathbf{x}, \mathbf{x}\rangle_{\mathcal{L}} = \tfrac{1}{K},\; x_t > 0\}$, where $\langle\mathbf{x}, \mathbf{y}\rangle_{\mathcal{L}} = -x_t y_t + \mathbf{x}_s^\top\mathbf{y}_s = \mathbf{x}^\top \mathrm{diag}(-1, 1, \cdots, 1)\,\mathbf{y}$ is the Lorentzian inner product.
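To make the constraint concrete, the following minimal snippet (an illustration of ours, not code from the paper; the function names are our own) builds points satisfying $\langle\mathbf{x},\mathbf{x}\rangle_{\mathcal{L}} = 1/K$ and checks the Lorentzian inner product:

```python
import torch

K = -1.0  # constant negative curvature

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x_t * y_t + <x_s, y_s> along the last dimension."""
    return -x[..., 0] * y[..., 0] + (x[..., 1:] * y[..., 1:]).sum(dim=-1)

def lift_to_hyperboloid(x_s):
    """Given spatial coordinates x_s, choose the time coordinate x_t = sqrt(||x_s||^2 - 1/K)
    so that the resulting point satisfies <x, x>_L = 1/K."""
    x_t = torch.sqrt((x_s * x_s).sum(dim=-1, keepdim=True) - 1.0 / K)
    return torch.cat([x_t, x_s], dim=-1)

x = lift_to_hyperboloid(torch.randn(4, 5))   # 4 points on the 5-dimensional Lorentz model
print(lorentz_inner(x, x))                   # all entries ~ 1/K = -1
```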
As shown in Figure 1a, $\mathcal{L}^n$ is a hyperboloid (hyper-surface) in an $(n+1)$-dimensional Minkowski space with origin $(\sqrt{-1/K}, 0, \cdots, 0)$. For simplicity, we denote a point in the Lorentz model as $\mathbf{x} \in \mathbb{L}^n_K$. Given $\mathbf{x}, \mathbf{y} \in \mathbb{L}^n_K$, the distance between them is $d_{\mathcal{L}}(\mathbf{x}, \mathbf{y}) = \sqrt{1/(-K)}\,\operatorname{arcosh}\big(K\langle\mathbf{x}, \mathbf{y}\rangle_{\mathcal{L}}\big)$.

Figure 1: (a) The conventional hyperbolic linear layer: a point A is mapped to B in the tangent space at the origin $\mathcal{T}_{\mathbf{0}}\mathbb{L}^n_K$ through the logarithmic map, a Euclidean linear transformation is performed to obtain C, and C is finally mapped back to the hyperbolic space through the exponential map. (b), (c) Visualization of the Lorentz boost and the Lorentz rotation, where points on the intersection of a plane and the hyperboloid remain coplanar after the Lorentz boost. (d) The pseudo-rotation of §3.1, where a point is first transformed and then projected onto the hyperboloid.
The exponential map $\exp^K_{\mathbf{x}}(\mathbf{z}): \mathcal{T}_{\mathbf{x}}\mathbb{L}^n_K \to \mathbb{L}^n_K$ maps any tangent vector $\mathbf{z} \in \mathcal{T}_{\mathbf{x}}\mathbb{L}^n_K$ to $\mathbb{L}^n_K$ by moving along the geodesic $\gamma$ satisfying $\gamma(0) = \mathbf{x}$ and $\gamma'(0) = \mathbf{z}$. More specifically,
$$\exp^K_{\mathbf{x}}(\mathbf{z}) = \cosh(\alpha)\,\mathbf{x} + \sinh(\alpha)\,\frac{\mathbf{z}}{\alpha}, \qquad \alpha = \sqrt{-K}\,\|\mathbf{z}\|_{\mathcal{L}}, \quad \|\mathbf{z}\|_{\mathcal{L}} = \sqrt{\langle\mathbf{z},\mathbf{z}\rangle_{\mathcal{L}}}.$$
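A small numerical illustration of this map (our own sketch; `lorentz_inner` is redefined here for self-containment and the code assumes $K = -1$):

```python
import torch

K = -1.0

def lorentz_inner(x, y, keepdim=False):
    time = -(x[..., :1] * y[..., :1])
    space = (x[..., 1:] * y[..., 1:]).sum(dim=-1, keepdim=True)
    res = time + space
    return res if keepdim else res.squeeze(-1)

def exp_map(x, z, eps=1e-8):
    """exp_x^K(z): move from x along the geodesic with initial velocity z (a tangent vector at x)."""
    alpha = (-K) ** 0.5 * torch.sqrt(torch.clamp(lorentz_inner(z, z, keepdim=True), min=eps))
    return torch.cosh(alpha) * x + torch.sinh(alpha) * z / alpha

o = torch.tensor([1.0, 0.0, 0.0, 0.0])        # origin of L^3 for K = -1
z = torch.tensor([0.0, 0.3, -0.2, 0.1])       # a tangent vector at the origin (zero time component)
y = exp_map(o, z)
print(lorentz_inner(y, y))                    # ~ 1/K = -1, so y lies on the hyperboloid
```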

The Lorentz Transformations
In special relativity, the Lorentz transformations are a family of linear transformations from a coordinate frame in spacetime to another frame moving at a constant velocity relative to the former. Any Lorentz transformation can be decomposed into a combination of a Lorentz boost and a Lorentz rotation by polar decomposition [27].
Definition 1 (Lorentz Boost). Lorentz boost describes relative motion with constant velocity and without rotation of the spatial coordinate axes. Given a velocity $\mathbf{v} \in \mathbb{R}^n$ (ratio to the speed of light) with $\|\mathbf{v}\| < 1$ and $\gamma = \frac{1}{\sqrt{1 - \|\mathbf{v}\|^2}}$, the Lorentz boost matrices are given by
$$\mathbf{B} = \begin{bmatrix} \gamma & -\gamma\mathbf{v}^\top \\ -\gamma\mathbf{v} & \mathbf{I} + \frac{\gamma^2}{1+\gamma}\mathbf{v}\mathbf{v}^\top \end{bmatrix}.$$

Definition 2 (Lorentz Rotation). Lorentz rotation is the rotation of the spatial coordinates. The Lorentz rotation matrices are given by
$$\mathbf{R} = \begin{bmatrix} 1 & \mathbf{0}^\top \\ \mathbf{0} & \tilde{\mathbf{R}} \end{bmatrix}, \qquad \text{where } \tilde{\mathbf{R}}^\top\tilde{\mathbf{R}} = \mathbf{I} \text{ and } \det(\tilde{\mathbf{R}}) = 1,$$
i.e., $\tilde{\mathbf{R}} \in \mathrm{SO}(n)$ is a special orthogonal matrix.
Both the Lorentz boost and the Lorentz rotation are linear transformations defined directly in the Lorentz model, i.e., $\forall \mathbf{x} \in \mathbb{L}^n_K$, $\mathbf{B}\mathbf{x} \in \mathbb{L}^n_K$ and $\mathbf{R}\mathbf{x} \in \mathbb{L}^n_K$. Hence, we build fully hyperbolic neural networks on the basis of these two types of transformations in this paper.
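As a sanity check of this property, the snippet below (our own illustration; the boost matrix is the standard special-relativity form from Definition 1) applies a Lorentz boost to a point on the hyperboloid and verifies that the Lorentzian constraint is preserved:

```python
import torch

K = -1.0

def lorentz_boost(v):
    """Lorentz boost matrix B for a velocity v with ||v|| < 1 (Definition 1)."""
    n = v.numel()
    gamma = 1.0 / torch.sqrt(1.0 - v @ v)
    B = torch.empty(n + 1, n + 1)
    B[0, 0] = gamma
    B[0, 1:] = -gamma * v
    B[1:, 0] = -gamma * v
    B[1:, 1:] = torch.eye(n) + (gamma ** 2 / (1.0 + gamma)) * torch.outer(v, v)
    return B

xs = torch.tensor([0.5, 0.4, -0.3])                              # spatial part of a point
x = torch.cat([torch.sqrt(xs @ xs - 1.0 / K).reshape(1), xs])    # lift onto the hyperboloid
y = lorentz_boost(torch.tensor([0.3, -0.2, 0.1])) @ x
print(-y[0] ** 2 + y[1:] @ y[1:])                                # stays ~ 1/K = -1
```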

Fully Hyperbolic Linear Layer
We first introduce our hyperbolic linear layer in the Lorentz model, considering it is the most essential block for neural networks. Although the Lorentz transformations in §2.2 are linear transformations in the Lorentz model, they cannot be directly used for neural networks. On the one hand, the Lorentz transformations transform coordinate frames without changing the number of dimensions. On the other hand, complicated requirements of the Lorentz transformations (e.g., special orthogonal matrices for the Lorentz rotation) make computation and optimization problematic.
To this end, instead of directly learning a matrix $\mathbf{M}$ satisfying $\forall \mathbf{x} \in \mathbb{L}^n_K, \mathbf{M}\mathbf{x} \in \mathbb{L}^m_K$, we re-formalize our hyperbolic linear layer to learn a function $f_{\mathbf{x}}(\cdot)$ that maps any matrix to a suitable one for the hyperbolic linear layer. Specifically, given $\mathbf{x} \in \mathbb{L}^n_K$ and $\mathbf{M} = \begin{bmatrix} \mathbf{v}^\top \\ \mathbf{W} \end{bmatrix}$ with $\mathbf{v} \in \mathbb{R}^{n+1}$ and $\mathbf{W} \in \mathbb{R}^{m\times(n+1)}$, $f_{\mathbf{x}}(\mathbf{M})$ is given as
$$f_{\mathbf{x}}(\mathbf{M}) = \begin{bmatrix} \frac{\sqrt{\|\mathbf{W}\mathbf{x}\|^2 - 1/K}}{\mathbf{v}^\top\mathbf{x}}\,\mathbf{v}^\top \\ \mathbf{W} \end{bmatrix},$$
so that $f_{\mathbf{x}}(\mathbf{M})\mathbf{x} \in \mathbb{L}^m_K$ for any $\mathbf{M}$.

Relations with the Lorentz Transformations In this part, we show in the following lemma that the set of matrices $f_{\mathbf{x}}(\mathbf{M})$ defined as above contains all Lorentz rotation and boost matrices. Lemma 2. In the $n$-dimensional Lorentz model $\mathbb{L}^n_K$, we denote the set of all Lorentz boost matrices as $\mathcal{B}$ and the set of all Lorentz rotation matrices as $\mathcal{R}$. Given $\mathbf{x} \in \mathbb{L}^n_K$, we also denote the range of $f_{\mathbf{x}}(\mathbf{M})$ at $\mathbf{x}$, without changing the number of space dimensions, as $\mathcal{M}_{\mathbf{x}} = \{f_{\mathbf{x}}(\mathbf{M}) \mid \mathbf{M} \in \mathbb{R}^{(n+1)\times(n+1)}\}$; then $\mathcal{B} \cup \mathcal{R} \subseteq \mathcal{M}_{\mathbf{x}}$. According to Lemmas 1 and 2, both the Lorentz boost and rotation can be covered by our linear layer.
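To see why $f_{\mathbf{x}}(\mathbf{M})\mathbf{x}$ always lands back on the manifold, the following one-line check (our own restatement, based on the reconstruction of $f_{\mathbf{x}}(\mathbf{M})$ above) may help:
$$\mathbf{y} = f_{\mathbf{x}}(\mathbf{M})\,\mathbf{x} = \begin{bmatrix} \sqrt{\|\mathbf{W}\mathbf{x}\|^2 - 1/K} \\ \mathbf{W}\mathbf{x} \end{bmatrix} \;\Longrightarrow\; \langle\mathbf{y},\mathbf{y}\rangle_{\mathcal{L}} = -\big(\|\mathbf{W}\mathbf{x}\|^2 - 1/K\big) + \|\mathbf{W}\mathbf{x}\|^2 = \frac{1}{K},$$
i.e., the time coordinate is simply recomputed from the space coordinates, regardless of the choice of $\mathbf{v}$ and $\mathbf{W}$.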
Relations with the Linear Layer Formalized in the Tangent Space In this part, we show that the conventional hyperbolic linear layer formalized in the tangent space at the origin [12,30] can be considered as a Lorentz transformation with only a special rotation but no boost. Figure 1a visualizes the conventional hyperbolic linear layer.
As shown in Figure 1d, we consider a special setting of our hyperbolic linear layer, which we call "pseudo-rotation". Formally, at the point $\mathbf{x} \in \mathbb{L}^n_K$, all matrices for pseudo-rotation are collected in the set
$$\mathcal{P}_{\mathbf{x}} = \left\{ f_{\mathbf{x}}\!\left(\begin{bmatrix} w & \mathbf{0}^\top \\ \mathbf{0} & \mathbf{W} \end{bmatrix}\right) \,\middle|\, w \in \mathbb{R},\ \mathbf{W} \in \mathbb{R}^{n\times n} \right\}.$$
As we no longer require the submatrix $\mathbf{W}$ to be a special orthogonal matrix, this setting is a relaxation of the Lorentz rotation.
Formally, given $\mathbf{x} \in \mathbb{L}^n_K$, the conventional hyperbolic linear layer relies on the logarithmic map to map the point into the tangent space at the origin, a matrix to perform the linear transformation in the tangent space, and the exponential map to map the final result back to $\mathbb{L}^n_K$. The whole process is $\mathbf{y} = \exp^K_{\mathbf{0}}\big(\mathbf{M}\log^K_{\mathbf{0}}(\mathbf{x})\big)$, where $\mathbf{M}$ is the Euclidean transformation matrix applied in the tangent space. Denoting the set of matrices whose effect at $\mathbf{x}$ is equivalent to such a conventional layer as $\mathcal{H}_{\mathbf{x}}$, one can show that $\mathcal{H}_{\mathbf{x}} \subseteq \mathcal{P}_{\mathbf{x}}$; proving $\mathcal{H}_{\mathbf{x}} \cap \mathcal{B} = \{\mathbf{I}\}$ is trivial, so we do not elaborate it here. Therefore, a conventional hyperbolic linear layer can be considered as a special rotation where the time axis is changed according to the space axes to ensure that the output is still in the Lorentz model. Our linear layer is not only fully hyperbolic but also equipped with boost operations, making it more expressive. Moreover, without the complicated logarithmic and exponential maps, our linear layer has better efficiency and stability.
Here, we give a more general formula of our hyperbolic linear layer based on $f_{\mathbf{x}}\big(\begin{bmatrix}\mathbf{v}^\top \\ \mathbf{W}\end{bmatrix}\big)\mathbf{x}$, by adding activation, dropout, bias and normalization:
$$\mathbf{y} = \mathrm{HL}(\mathbf{x}) = \begin{bmatrix} \sqrt{\|\phi(\mathbf{W}\mathbf{x}, \mathbf{v})\|^2 - 1/K} \\ \phi(\mathbf{W}\mathbf{x}, \mathbf{v}) \end{bmatrix}, \qquad (3)$$
where $\mathbf{x} \in \mathbb{L}^n_K$, $\mathbf{v} \in \mathbb{R}^{n+1}$, $\mathbf{W} \in \mathbb{R}^{m\times(n+1)}$, and $\phi$ is an operation function: for the dropout, the function is $\phi(\mathbf{W}\mathbf{x}, \mathbf{v}) = \mathbf{W}\,\mathrm{dropout}(\mathbf{x})$; for the activation and normalization,
$$\phi(\mathbf{W}\mathbf{x}, \mathbf{v}) = \frac{\lambda\sigma(\mathbf{v}^\top\mathbf{x} + b')}{\|\mathbf{W} h(\mathbf{x}) + \mathbf{b}\|}\,\big(\mathbf{W} h(\mathbf{x}) + \mathbf{b}\big),$$
where $\sigma$ is the sigmoid function, $\mathbf{b}$ and $b'$ are bias terms, $\lambda > 0$ controls the scaling range, and $h$ is the activation function. We elaborate the $\phi(\cdot)$ we use in practice in the appendix.
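For concreteness, a minimal PyTorch sketch of such a layer is given below (our own illustration following Eq. (3) as reconstructed above; the class name, hyper-parameter choices, and the ReLU activation are assumptions rather than the paper's reference implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LorentzLinear(nn.Module):
    """Hyperbolic linear layer: y = [sqrt(||phi(Wx, v)||^2 - 1/K); phi(Wx, v)]."""

    def __init__(self, in_dim, out_dim, K=-1.0, lam=1.0, dropout=0.1, use_activation=True):
        super().__init__()
        self.K = K
        self.lam = lam                                  # lambda: scaling range of the space-like norm
        self.weight = nn.Linear(in_dim + 1, out_dim)    # W and bias b, applied to the full (n+1)-dim input
        self.v = nn.Linear(in_dim + 1, 1)               # v^T x + b', controls the time coordinate
        self.dropout = nn.Dropout(dropout)
        self.use_activation = use_activation

    def forward(self, x):
        # x: (..., in_dim + 1), a point on the Lorentz model
        h = F.relu(x) if self.use_activation else x
        ys = self.weight(self.dropout(h))               # unnormalized space-like part, (..., out_dim)
        scale = self.lam * torch.sigmoid(self.v(x))     # lambda * sigma(v^T x + b'), (..., 1)
        ys = scale * ys / ys.norm(dim=-1, keepdim=True).clamp_min(1e-8)
        yt = torch.sqrt((ys * ys).sum(dim=-1, keepdim=True) - 1.0 / self.K)
        return torch.cat([yt, ys], dim=-1)              # (..., out_dim + 1), lies on L^{out_dim}_K
```

As a quick check, for any point `x` on $\mathbb{L}^{8}_K$ (a tensor whose last dimension has size 9), `y = LorentzLinear(8, 16)(x)` yields a point with `-y[..., 0]**2 + (y[..., 1:]**2).sum(-1)` approximately equal to $1/K$.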

Fully Hyperbolic Attention Layer
Attention layers are also important for building networks, especially for the widely used Transformer architecture in NLP [40]. We propose an attention mechanism in the Lorentz model. Specifically, we consider the weighted aggregation of a point set $\mathbf{P} = \{\mathbf{x}_1, \ldots, \mathbf{x}_{|\mathbf{P}|}\}$ as calculating the centroid $\boldsymbol{\mu}$ of $\mathbf{P}$, whose expected (squared) distance to $\mathbf{P}$ is minimal, i.e.,
$$\min_{\boldsymbol{\mu}\in\mathbb{L}^n_K} \sum_{i=1}^{|\mathbf{P}|} \nu_i \, d^2_{\mathcal{L}}(\mathbf{x}_i, \boldsymbol{\mu}),$$
where $\nu_i$ is the weight of the $i$-th point. Law et al. [22] prove that, with the squared Lorentzian distance defined as $d^2_{\mathcal{L}}(\mathbf{a}, \mathbf{b}) = 2/K - 2\langle\mathbf{a}, \mathbf{b}\rangle_{\mathcal{L}}$, the centroid w.r.t. the squared Lorentzian distance is given as
$$\boldsymbol{\mu} = \frac{\sum_{i} \nu_i \mathbf{x}_i}{\sqrt{-K}\,\big|\,\|\sum_{i} \nu_i \mathbf{x}_i\|_{\mathcal{L}}\big|}, \qquad \text{where } \|\mathbf{a}\|_{\mathcal{L}} = \sqrt{|\langle\mathbf{a},\mathbf{a}\rangle_{\mathcal{L}}|}.$$
Given the query set $\mathbf{Q} = \{\mathbf{q}_1, \ldots, \mathbf{q}_{|\mathbf{Q}|}\}$, key set $\mathbf{K} = \{\mathbf{k}_1, \ldots, \mathbf{k}_{|\mathbf{K}|}\}$, and value set $\mathbf{V} = \{\mathbf{v}_1, \ldots, \mathbf{v}_{|\mathbf{V}|}\}$, where $|\mathbf{K}| = |\mathbf{V}|$, we exploit the squared Lorentzian distance between points to calculate the weights, and the attention is defined as $\mathrm{ATT}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \{\boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_{|\mathbf{Q}|}\}$, calculated as
$$\boldsymbol{\mu}_i = \frac{\sum_{j} \nu_{ij}\mathbf{v}_j}{\sqrt{-K}\,\big|\,\|\sum_{j} \nu_{ij}\mathbf{v}_j\|_{\mathcal{L}}\big|}, \qquad \nu_{ij} = \frac{\exp\!\big(-d^2_{\mathcal{L}}(\mathbf{q}_i, \mathbf{k}_j)/\sqrt{n}\big)}{\sum_{t=1}^{|\mathbf{K}|}\exp\!\big(-d^2_{\mathcal{L}}(\mathbf{q}_i, \mathbf{k}_t)/\sqrt{n}\big)}.$$
Furthermore, multi-headed attention is defined as $\mathrm{MHATT}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \{\boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_{|\mathbf{Q}|}\}$, where
$$\boldsymbol{\mu}_i = \big[\mathrm{ATT}_1(\mathrm{HL}^1_Q(\mathbf{Q}), \mathrm{HL}^1_K(\mathbf{K}), \mathrm{HL}^1_V(\mathbf{V}))_i \,\big|\, \cdots \,\big|\, \mathrm{ATT}_H(\mathrm{HL}^H_Q(\mathbf{Q}), \mathrm{HL}^H_K(\mathbf{K}), \mathrm{HL}^H_V(\mathbf{V}))_i\big],$$
$H$ is the head number, $[\cdot|\ldots|\cdot]$ is the concatenation of multiple vectors, $\mathrm{ATT}_i(\cdot,\cdot,\cdot)$ is the $i$-th attention head, and $\mathrm{HL}^i_Q(\cdot), \mathrm{HL}^i_K(\cdot), \mathrm{HL}^i_V(\cdot)$ are the hyperbolic linear layers of the $i$-th attention head.
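The following sketch (our own single-head illustration; the $\sqrt{d}$ temperature mirrors Euclidean attention and is an assumption, as are the helper names) shows how the centroid-based attention above can be computed:

```python
import torch

def lorentz_inner(x, y):
    """Lorentzian inner product along the last dimension (supports broadcasting)."""
    return -x[..., 0] * y[..., 0] + (x[..., 1:] * y[..., 1:]).sum(-1)

def lorentz_centroid(weights, points, K=-1.0):
    """Weighted Lorentzian centroid (Law et al. [22]): the weighted sum, renormalized onto the hyperboloid."""
    s = weights @ points                                        # (q_len, d+1) weighted sums
    denom = torch.sqrt(torch.clamp(-lorentz_inner(s, s), min=1e-8)).unsqueeze(-1)
    return s / ((-K) ** 0.5 * denom)

def lorentz_attention(q, k, v, K=-1.0):
    """Single-head attention; q: (q_len, d+1), k, v: (kv_len, d+1), all points on the Lorentz model."""
    d = q.size(-1) - 1
    # pairwise squared Lorentzian distances d^2_L(q_i, k_j) = 2/K - 2 <q_i, k_j>_L
    dist2 = 2.0 / K - 2.0 * lorentz_inner(q.unsqueeze(1), k.unsqueeze(0))
    weights = torch.softmax(-dist2 / d ** 0.5, dim=-1)          # (q_len, kv_len)
    return lorentz_centroid(weights, v, K)
```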

Fully Hyperbolic Residual Layer and Position Encoding Layer
Lorentz Residual The residual layer is crucial for building deep neural networks. Since there is no well-defined vector addition in the Lorentz model, we assume that each residual layer is preceded by a computational block whose last layer is a Lorentz linear layer, and we perform the residual-like operation within that preceding Lorentz linear layer as a compromise. Given the input $\mathbf{x}$ of the computational block and the output $\mathbf{o} = f(\mathbf{x})$ before the last Lorentz linear layer of the block, we take $\mathbf{x}$ as the bias of that Lorentz linear layer. Concretely, the final output of the block is the Lorentz linear layer of Eq. (3) applied to $\mathbf{o}$ with $\mathbf{x}$ serving as its bias term, where the symbols have the same meaning as those in Eq. (3).

Lorentz Position Encoding Some neural networks require positional encoding in their embedding layers, especially models for NLP tasks. Previous works generally incorporate positional information by adding position embeddings to word embeddings. Given a word embedding $\mathbf{x}$ and its corresponding learnable position embedding $\mathbf{p}$, we add a Lorentz linear layer to transform the word embedding $\mathbf{x}$, taking the position embedding $\mathbf{p}$ as the bias. The overall process is the same as described above for the Lorentz residual. Note that the transformation matrix in the Lorentz linear layer is shared across positions. This modification gives us one more $d \times d$ matrix than the Euclidean Transformer; the increase in the number of parameters is acceptable compared to the huge number of parameters of the whole model.
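To make this concrete, here is a small sketch of how the residual-as-bias trick might look in code (our own illustration; exactly where the bias enters relative to the rescaling of Eq. (3) is an assumption, and `scale` corresponds to the $\lambda\sigma(\mathbf{v}^\top\mathbf{o} + b')$ term):

```python
import torch

def lorentz_linear_with_residual(ys_unnormalized, scale, x_prev, K=-1.0):
    """Fuse a residual connection into the final Lorentz linear layer of a block.

    ys_unnormalized: W h(o) for the block output o (space-like part before rescaling)
    scale:           lambda * sigmoid(v^T o + b'), the learned control of the space-like norm
    x_prev:          the block input, a point on the Lorentz model, used as the bias
    """
    ys = ys_unnormalized + x_prev[..., 1:]                      # residual enters as a bias on the space part
    ys = scale * ys / ys.norm(dim=-1, keepdim=True).clamp_min(1e-8)
    yt = torch.sqrt((ys * ys).sum(dim=-1, keepdim=True) - 1.0 / K)
    return torch.cat([yt, ys], dim=-1)                          # output again lies on the hyperboloid
```

The same pattern would apply for position encoding, with the learnable position embedding playing the role of `x_prev`.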

Experiments
To verify our proposed framework, we conduct experiments on both shallow and deep neural networks. For shallow neural networks, we conduct experiments on knowledge graph completion and network embedding. For deep neural networks, we propose a Lorentz Transformer and perform experiments on machine translation. Furthermore, dependency tree probing is also done on both Lorentz and Euclidean Transformers to compare their capabilities of representing structured information.
In the following sections, we denote the models built with our proposed framework as HYBONET.
We demonstrate that HyboNet not only outperforms Euclidean and Poincaré models on the majority of tasks, but also converges better than its Poincaré counterpart. All models in §4.1 are trained on one NVIDIA 32GB V100 GPU, and models in §4.2 are trained on four NVIDIA 40GB A100 GPUs. For the pre-processing and hyper-parameters of each experiment, please refer to our appendix.

Experiments on Shallow Networks
In this part, we leverage our Lorentz embedding and linear layers to build shallow neural networks. We show that HyboNet outperforms previous knowledge graph completion models and graph neural networks (GNNs) on several popular benchmarks.

Knowledge Graph Completion Models
A knowledge graph contains a collection of factual triplets; each triplet (h, r, t) indicates the existence of a relation r between the head entity h and the tail entity t. Since knowledge graphs are generally incomplete, predicting missing triplets is a fundamental research problem. Concretely, the task aims to answer queries of the form (h, r, ?) and (?, r, t). Two popular knowledge graph completion benchmarks, FB15k-237 [38] and WN18RR [11], are used in our experiments. We report two popular evaluation metrics: MRR (mean reciprocal rank), the average of the inverse of the rank of the true entity in the prediction; and H@K, the percentage of correct entities appearing within the top K positions of the predicted ranking.
Setup Similar to Balazevic et al. [2], we design a distance-based score function $s(h, r, t)$ for each triplet that measures how close $f_r(\mathbf{e}_h)$ is to $\mathbf{e}_t$ under the squared Lorentzian distance, where $\mathbf{e}_h, \mathbf{e}_t \in \mathbb{L}^n_K$ are the Lorentz embeddings of the head entity $h$ and the tail entity $t$, $f_r(\cdot)$ is a Lorentz linear transformation of the relation $r$, and $\delta$ is a margin hyper-parameter. For each triplet, we randomly corrupt its head or tail entity with $k$ entities, calculate the probabilities for triplets as $p = \sigma(s(h, r, t))$ with the sigmoid function, and minimize the binary cross-entropy loss
$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\Big(\log p^{(i)} + \sum_{j=1}^{k}\log\big(1 - \tilde{p}^{(i,j)}\big)\Big),$$
where $p^{(i)}$ and $\tilde{p}^{(i,j)}$ are the probabilities for correct and corrupted triplets respectively, and $N$ is the sample number.
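As an illustrative sketch of this training objective (our own code; the concrete score form below, margin minus squared Lorentzian distance without entity biases, is an assumption, and `relation_transform` stands for the Lorentz linear layer $f_r$):

```python
import torch
import torch.nn.functional as F

def squared_lorentz_dist(a, b, K=-1.0):
    """Squared Lorentzian distance d^2_L(a, b) = 2/K - 2 <a, b>_L."""
    inner = -a[..., 0] * b[..., 0] + (a[..., 1:] * b[..., 1:]).sum(-1)
    return 2.0 / K - 2.0 * inner

def triplet_score(head_emb, tail_emb, relation_transform, delta=2.0):
    """Assumed score: margin minus squared Lorentzian distance between f_r(e_h) and e_t."""
    return delta - squared_lorentz_dist(relation_transform(head_emb), tail_emb)

def kg_loss(pos_score, neg_score):
    """Binary cross-entropy: push correct triplets toward p=1 and corrupted ones toward p=0.
    binary_cross_entropy_with_logits applies the sigmoid internally, matching p = sigma(s)."""
    pos = F.binary_cross_entropy_with_logits(pos_score, torch.ones_like(pos_score))
    neg = F.binary_cross_entropy_with_logits(neg_score, torch.zeros_like(neg_score))
    return pos + neg
```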
Results Table 1 shows the results on both datasets. As expected, low dimensional hyperbolic networks achieve comparable or even better results than Euclidean baselines. When the dimensionality of hyperbolic networks is raised to a maximum of 500, HYBONET outperforms all other baselines on MRR, H@3, and H@1 by a large margin. We also compare our HYBONET with other hyperbolic networks. As shown in Figures 2a and 2b, HYBONET converges better than other hyperbolic networks on both datasets and has a higher ceiling, demonstrating the superiority of our Lorentz linear layer over conventional linear layer formalized in tangent space.

Graph Neural Networks
Previous works have shown that, when equipped with hyperbolic geometry, GNNs demonstrate impressive improvements compared with their Euclidean counterparts [6,23]. In this part, we extend GCNs with our proposed hyperbolic framework. Following Chami et al. [6], we evaluate HYBONET on link prediction and node classification on four network embedding datasets, and observe better or comparable results compared to previous methods.
Setup The architecture of GCNs can be summarized into three parts: feature transformation, neighborhood aggregation and non-linear activation. We use a Lorentz linear layer for the feature transformation, and use the centroid of neighboring node features as the aggregation result. The non-linear activation is integrated into the Lorentz linear layer as elaborated in §3.1. The overall operation of the $l$-th network layer can be formulated as
$$\mathbf{x}^{l+1}_i = \mathrm{Centroid}\big(\{\mathrm{HL}(\mathbf{x}^{l}_j) \mid j \in \mathcal{N}(i)\}\big),$$
where $\mathbf{x}^l_i$ refers to the representation of the $i$-th node at layer $l$, $\mathcal{N}(i)$ denotes the neighboring nodes of the $i$-th node, and $\mathrm{Centroid}(\cdot)$ is the Lorentzian centroid of §3.2 with uniform weights. Note that we do not apply attention operations when performing aggregation; all the neighboring nodes are simply aggregated uniformly. With the node representations, we can easily conduct link prediction and node classification. For both tasks, we train HYBONET by minimizing a margin ranking loss $\mathcal{L} = \max(d - d' + \delta, 0)$, where $\delta$ is the margin hyper-parameter. For link prediction, $d$ is the distance between nodes where a link exists and $d'$ is the distance for negative samples. For node classification, $d$ is the distance between the node representation and the correct class, and $d'$ is the distance between the node and a wrong class.

Table 2: Test ROC AUC results (%) for Link Prediction (LP) and F1 scores (%) for Node Classification (NC) on Disease (δ = 0), Airport (δ = 1), PubMed (δ = 3.5) and Cora (δ = 11). HGCN and HYBONET are hyperbolic models. δ refers to Gromov's δ-hyperbolicity, as given by Chami et al. [6]; the lower the δ, the more hyperbolic the graph.
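A sketch of one such layer's aggregation step (our own code; it assumes the adjacency matrix already includes self-loops so every row is non-empty, and that node features have already passed through a Lorentz linear layer):

```python
import torch

def lorentz_aggregate(node_feats, adj, K=-1.0):
    """Uniform neighborhood aggregation via the Lorentzian centroid.

    node_feats: (num_nodes, d+1) points on the Lorentz model, e.g. outputs of a Lorentz linear layer
    adj:        (num_nodes, num_nodes) binary adjacency matrix (assumed to contain self-loops)
    """
    s = adj @ node_feats                                       # unweighted sum over each node's neighbors
    inner = -s[..., 0] ** 2 + (s[..., 1:] ** 2).sum(-1)        # Lorentzian inner product <s, s>_L
    denom = torch.sqrt(torch.clamp(-inner, min=1e-8)).unsqueeze(-1)
    return s / ((-K) ** 0.5 * denom)                           # centroids lie back on the hyperboloid
```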
Results Following Chami et al. [6], we report ROC AUC results for link prediction and F1 scores for node classification on four different network embedding datasets. The description of the datasets can be found in our appendix. Chami et al. [6] compute Gromov's δ-hyperbolicity [17,1,28] for these four datasets; the lower the δ is, the more hyperbolic the graph is.
The results are reported in Table 2. HYBONET outperforms the other baselines by a remarkable margin on the highly hyperbolic datasets. On the Disease dataset, HYBONET even achieves a 20% (absolute) improvement on node classification and a 5.5% improvement on link prediction over previous hyperbolic GCNs. We plot the expected validation curves with 95% confidence intervals shaded for the Disease and Airport datasets in Figures 2c and 2d. For link prediction, HYBONET converges faster and is more stable across different runs. For node classification, HYBONET has a comparable convergence speed and, as shown in Figure 2d, is much more stable on the validation set of the Airport dataset compared with HGCN. On the less hyperbolic datasets such as PubMed and Cora, HYBONET still performs well on link prediction and remains highly competitive for node classification.

Experiments on Deep Networks
In this part, we replace all components in the Transformer [40] with our Lorentz counterparts introduced in §3. We discard layer normalization due to the difficulty of defining the hyperbolic mean and variance, but it is still kept in our Euclidean Transformer baseline. In fact, λ in Eq. (3) controls the scaling range of our hyperbolic linear layers, which can play a similar role to layer normalization.

Machine Translation
For machine translation, we report the results on two widely-used machine translation benchmarks: IWSLT'14 English-German and WMT'17 English-German.
Setup We use OpenNMT [19] to build the Euclidean Transformer and our Lorentz one. Following the settings used in previous hyperbolic works [14,34], we conduct experiments in different dimensional settings. The dimension of the input embeddings is chosen from {64, 128, 256}, and the dimension of the inner layers is always four times the input dimension. Other hyper-parameters are detailed in the appendix.

Results
The BLEU scores on the test set of IWSLT'14 and the newstest2013 test set of WMT'17 are shown in Table 3. Both HYBONET and HATT, the two Transformer-based hyperbolic models, outperform the Euclidean Transformer. However, HATT only adapts the attention module to the hyperbolic space, leaving the remaining computational blocks in the Euclidean space; as a result, the advantage of the hyperbolic space is not fully utilized. As a fully hyperbolic Transformer, HYBONET performs all of its operations in the hyperbolic space, making better use of the hyperbolic geometry and achieving a significant improvement over both the Euclidean and the Euclidean-hyperbolic-mixed Transformer.

Dependency Tree probing
Previous works have shown that neural networks implicitly embed syntax trees in their intermediate context representations [16,32]. Given the results shown in §4.2.1, we assume that an important reason why our model works better than its Euclidean counterpart is that our model better captures structured information in the sentences. To validate our assumption, we perform the probing task on both Euclidean and Lorentz Transformers obtained in §4.2.1. We use the dependency tree parsing result of stanza [31] on IWSLT'14 English corpus as our dataset. The data partition is kept the same.
Setup For a fair comparison, we probe both Euclidean and Lorentz Transformer in hyperbolic space following Chen et al. [10]. Please refer to the original paper [10] or appendix for more details of the experiment setup.

Results
The probing results on IWSLT'14 are shown in Table 3. UUAS refers to the undirected unlabeled attachment score, the percentage of undirected edges placed correctly against the gold tree. Root% refers to the precision of the model in predicting the root of the syntactic tree. Dspr. and Nspr. are the Spearman correlations between the true and predicted distances for each word pair in each sentence, and between the true depth ordering and the predicted ordering, respectively.
HYBONET outperforms the other baselines by a large margin. Clearly, syntax trees can be better reconstructed from the intermediate representations of HYBONET's encoder, which shows that HYBONET is indeed better at learning syntactic structure. Also, probing HATT (the Euclidean-hyperbolic-mixed Transformer) yields better results than probing the Euclidean Transformer, but worse than probing HYBONET, indicating that as the model becomes more hyperbolic, its ability to learn structured information becomes stronger.

Related Work
Hyperbolic geometry has been widely investigated in representation learning in recent years, due to its great capacity for modeling complex data with non-Euclidean properties. Nickel & Kiela [29] first propose to use hyperbolic space to encode the transitive closure of the WordNet noun hierarchy. They indicate that hyperbolic space is superior to Euclidean space in terms of both representation capacity and generalization ability, especially in low dimensions. Moreover, Ganea et al. [12] and Nickel & Kiela [30] introduce the basic operations of neural networks in the Poincaré ball and the Lorentz model respectively. After that, researchers further introduce various types of neural models in hyperbolic space, including hyperbolic attention networks [14], hyperbolic graph neural networks [23,6], hyperbolic prototypical networks [26] and hyperbolic capsule networks [9]. Recently, with the rapid development of hyperbolic neural networks, researchers have attempted to utilize them in various downstream tasks such as word embeddings [37], knowledge graph embeddings [7], entity typing [24], text classification [44], question answering [36] and machine translation [14,34], to handle data with non-Euclidean properties, and have achieved significant and consistent improvements compared to traditional neural models in Euclidean space.

Conclusion and Future Work
In this work, we propose a novel fully hyperbolic framework based on the Lorentz transformations to overcome the problem that the hybrid architectures of existing hyperbolic neural networks, which rely on the tangent space, limit network capabilities. The experimental results on four representative NLP tasks show that hyperbolic neural networks built on our framework have faster speed, better convergence, and higher performance, and can even achieve better performance with fewer parameters. This is of great importance for reducing the computational resources required for training models and can contribute to the reduction of carbon emissions. Our proposed method does not introduce extra negative societal impacts. In addition, we also observe that some challenging problems require further efforts: (1) Though we verify the effectiveness of fully hyperbolic models in NLP, exploring their applications in computer vision is still a valuable direction. (2) It is also worthwhile to continue improving our framework so that the model exceeds its Euclidean counterpart even in deeper and higher-dimensional settings, for example by exploring large-scale hyperbolic pre-trained language models.

A Data Description and Preprocessing Methods
We briefly introduce the datasets we used and describe the data preprocessing methods for each experiment in this section.

A.1 Knowledge Graph Completion
The statistics of WN18RR and FB15k-237 are listed in Table 4. WN18RR is a subset of WordNet; it contains 11 lexical relations between 40,943 word senses. FB15k-237 is a subset of Freebase containing 237 relations between 14,541 entities. We keep our data preprocessing method for knowledge graph completion the same as Balazevic et al. [2]. Concretely, we augment both WN18RR and FB15k-237 by adding reciprocal relations for every triplet, i.e., for every $(h, r, t)$ in the dataset, we add an additional triplet $(t, r^{-1}, h)$.
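For illustration, this augmentation can be written as follows (a sketch of ours; the naming convention for the inverse relation is an assumption):

```python
def add_reciprocal(triplets):
    """Augment (h, r, t) triplets with reciprocal triplets (t, r^-1, h).
    The inverse relation is encoded here as r + "_reverse" (our own convention)."""
    return triplets + [(t, r + "_reverse", h) for (h, r, t) in triplets]

print(add_reciprocal([("Q1", "part_of", "Q2")]))
# [('Q1', 'part_of', 'Q2'), ('Q2', 'part_of_reverse', 'Q1')]
```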

A.2 Network Embedding
We use four datasets, referred to as Disease, Airport, PubMed and Cora. The four datasets are preprocessed by Chami et al. [6] and published in their code repository. We refer the readers to Chami et al. [6] for further information about the datasets.

A.3 Machine Translation
For IWSLT'14, we use the preprocessing script provided by FairSeq. For WMT'17, we use the preprocessing script provided by HyperNN++ [34].

B Experiment Details
All of our experiments use 32-bit floating-point numbers, not the 64-bit floating-point numbers used in most previous work. We use PyTorch as the neural network framework. The negative curvature K of the Lorentz model in our experiments is −1.

B.1 Lorentz Linear in Practice
We take the function $\phi$ in the Lorentz linear layer (Eq. (3)) to be such that the output $\mathbf{y}$ has the form
$$\mathbf{y} = \begin{bmatrix} \lambda\sigma(\mathbf{v}^\top\mathbf{x} + b) + \epsilon \\ \frac{\sqrt{(\lambda\sigma(\mathbf{v}^\top\mathbf{x} + b) + \epsilon)^2 + 1/K}}{\|\mathbf{W}h(\mathrm{dropout}(\mathbf{x}))\|}\,\mathbf{W}h(\mathrm{dropout}(\mathbf{x})) \end{bmatrix}. \qquad (8)$$
To see what this means, we first compute $y_0 = \lambda\sigma(\mathbf{v}^\top\mathbf{x} + b) + \epsilon$ as the 0-th dimension of the output $\mathbf{y}$, where $\sigma$ is the sigmoid function, $\lambda$ controls the 0-th dimension's range and can be either learnable or fixed, $b$ is a learnable bias term, and $\epsilon > \sqrt{-1/K}$ is a constant preventing the 0-th dimension from being smaller than $\sqrt{-1/K}$. According to the definition of the Lorentz model, $\mathbf{y}$ should satisfy $\|\mathbf{y}_{1:n}\|^2 - y_0^2 = 1/K$, that is, $\|\mathbf{y}_{1:n}\| = \sqrt{y_0^2 + 1/K} = \sqrt{(\lambda\sigma(\mathbf{v}^\top\mathbf{x} + b) + \epsilon)^2 + 1/K}$. Then Eq. (8) can be seen as first calculating $\tilde{\mathbf{y}}_{1:n} = \mathbf{W}h(\mathrm{dropout}(\mathbf{x}))$, then scaling $\tilde{\mathbf{y}}_{1:n}$ to have vector norm $\|\mathbf{y}_{1:n}\|$ to obtain $\mathbf{y}_{1:n}$. Finally, we concatenate $y_0$ with $\mathbf{y}_{1:n}$ as the output.
For residual and position embedding addition, we also use Eq.(8).

B.2 Initialization
We use different initialization methods for different parameters, see Table 5. Geoopt [20] initializes parameters with a Gaussian distribution in the tangent space and maps them to hyperbolic space with the exponential map. For the hyperbolic embeddings in knowledge graph completion, we use a Gaussian distribution with standard deviation equal to $1/\sqrt{\mathrm{dim}}$ in the tangent space; for other tasks, we use a standard normal distribution.

B.4 Network Embedding
The experiment setting is the same as Chami et al. [6]. We list the hyper-parameters for the four datasets in Table 7.

B.5 Machine Translation
Our code is based on OpenNMT's Transformer [19]. The hyper-parameters are listed in Table 8.

B.6 Dependency Tree Probing

The probing for the Euclidean Transformer is done by first applying a Euclidean linear map $f_P: \mathbb{R}^n \to \mathbb{R}^{m+1}$ followed by a projection, mapping the Transformer's intermediate context-aware representation $\mathbf{c}_i$ to a point $\tilde{\mathbf{h}}_i$ in the tangent space at the origin of the Lorentz model, and then using the exponential map to map $\tilde{\mathbf{h}}_i$ into hyperbolic space as $\mathbf{p}_i$. In the hyperbolic space, we construct the Lorentz syntactic subspace via a Lorentz linear layer $f_Q: \mathbb{L}^m_K \to \mathbb{L}^m_K$:
$$\mathbf{p}_i = \exp^K_{\mathbf{0}}\big(f_P(\mathbf{c}_i)\big), \qquad \mathbf{q}_i = f_Q(\mathbf{p}_i).$$
We use the squared Lorentzian distance between $\mathbf{q}_i$ and $\mathbf{q}_j$ to recreate the tree distance between word pairs $w_i$ and $w_j$, and the squared Lorentzian distance between $\mathbf{q}_i$ and the origin $\mathbf{o}$ to recreate the depth of word $w_i$. We minimize a structural-probe loss of the form
$$\mathcal{L} = \frac{1}{l^2}\sum_{i,j}\big|\,d_T(w_i, w_j) - d^2_{\mathcal{L}}(\mathbf{q}_i, \mathbf{q}_j)\,\big|,$$
together with an analogous term for depths, where $d_T(w_i, w_j)$ is the edge number of the shortest path from $w_i$ to $w_j$ in the dependency tree and $l$ is the sentence length. For the probing of the Lorentz Transformer, we only substitute $f_P$ with a Lorentz one and discard the exponential map. We probe every layer of both models and report the results of the best layer.
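A sketch of the distance part of this probe loss (our own code; the $1/l^2$ normalization above is an assumption, and the depth term would be handled analogously with distances to the origin):

```python
import torch

def squared_lorentz_dist(a, b, K=-1.0):
    """Squared Lorentzian distance d^2_L(a, b) = 2/K - 2 <a, b>_L (supports broadcasting)."""
    inner = -a[..., 0] * b[..., 0] + (a[..., 1:] * b[..., 1:]).sum(-1)
    return 2.0 / K - 2.0 * inner

def distance_probe_loss(q, tree_dist):
    """q: (l, m+1) probed points for one sentence; tree_dist: (l, l) gold tree distances."""
    l = q.size(0)
    pred = squared_lorentz_dist(q.unsqueeze(1), q.unsqueeze(0))   # (l, l) pairwise predicted distances
    return (pred - tree_dist).abs().sum() / (l * l)
```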