SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation

Named geographic entities (geo-entities for short) are the building blocks of many geographic datasets. Characterizing geo-entities is integral to various application domains, such as geo-intelligence and map comprehension, while a key challenge is to capture the spatially varying context of an entity. We hypothesize that we shall know the characteristics of a geo-entity by its surrounding entities, similar to knowing word meanings by their linguistic context. Accordingly, we propose a novel spatial language model, SpaBERT, which provides a general-purpose geo-entity representation based on neighboring entities in geospatial data. SpaBERT extends BERT to capture linearized spatial context, while incorporating a spatial coordinate embedding mechanism to preserve spatial relations of entities in the 2-dimensional space. SpaBERT is pretrained with masked language modeling and masked entity prediction tasks to learn spatial dependencies. We apply SpaBERT to two downstream tasks: geo-entity typing and geo-entity linking. Compared with the existing language models that do not use spatial context, SpaBERT shows significant performance improvement on both tasks. We also analyze the entity representation from SpaBERT in various settings and the effect of spatial coordinate embedding.


Introduction
Interpreting human behaviors requires considering human activities and their surrounding environment. Looking at a stopping location, [ Speedway , ],1 from a person's trajectory, we might assume that this person needs to use the location's amenities if Speedway implies a gas station and is near a highway exit. We might predict a meetup at [ Speedway , ] if the trajectory travels through many other locations of the same name, Speedway, to arrive at [ Speedway , ] in the middle of farmlands. As humans, we are able to make such inferences using the name of a geographic entity (geo-entity) and other entities in a spatial neighborhood. Specifically, we contextualize a geo-entity by a reasonable surrounding neighborhood learned from experience and, from the neighborhood, relate other relevant geo-entities based on their names and spatial relations (e.g., distance) to the geo-entity. This way, even if two gas stations have the same name (e.g., Speedway) and entity type (e.g., 'gas station'), we can still reason about their spatially varying semantics and use the semantics for prediction.

1 A geographic entity name, Speedway, and its location (e.g., latitude and longitude).
Capturing these spatially varying location semantics can help recognize and resolve geospatial concepts (e.g., toponym detection, typing, and linking) and ground geo-entities in documents, scanned historical maps, and a variety of knowledge bases, such as Wikidata, OpenStreetMap, and GeoNames. Also, the location semantics can support effective use of spatial textual information (geo-entity names) in many spatial computing tasks, including moving behavior detection from the visited locations of trajectories (Yue et al., 2021, 2019), point-of-interest recommendation (Yin et al., 2017; Zhao et al., 2022), and air quality (Lin et al., 2017, 2018, 2020; Jiang et al., 2019) and traffic prediction (Yuan and Li, 2021; Gao et al., 2019) using location context.
Recently, the research community has seen a rapid advancement in pretrained language models (PLMs) (Devlin et al., 2019; Liu et al., 2019; Lewis et al., 2020; Sanh et al., 2019), which support strong contextualized language representation abilities (Lan et al., 2020) and serve as the backbones of various NLP systems (Rothe et al., 2020; Yang et al., 2019). The extensions of these PLMs help NLP tasks in different data domains (e.g., biomedicine (Lee et al., 2020; Phan et al., 2021), software engineering (Tabassum et al., 2020), finance (Liu et al., 2021)) and modalities (e.g., tables (Herzig et al., 2020; Yin et al., 2020; Wang et al., 2022) and images (Li et al., 2020a; Su et al., 2019)). However, it is challenging to adopt existing PLMs or their extensions to capture geo-entities' spatially varying semantics. First, geo-entities exist in the physical world. Their spatial relations (i.e., distance and orientation) do not have a fixed structure (e.g., within a table or along a row of a table) that can help contextualization. Second, existing language models (LMs) pretrained on general domain corpora (Devlin et al., 2019; Liu et al., 2019) require fine-tuning for domain adaptation to handle names of geo-entities.

Figure 1: Overview of generating the pivot entity representation, E_p ( : pivot entity; : neighboring geo-entities). SPABERT sorts and concatenates neighboring geo-entities by their distance to the pivot in ascending order to form a pseudo sentence. [CLS] is prepended at the beginning. [SEP] separates entities. SPABERT generates token representations and aggregates the representations of the pivot's tokens to produce E_p.
To tackle these challenges, we present SPABERT ( ), a LM that captures the spatially varying semantics of geo-entity names using large geographic datasets for entity representation. Built upon BERT (Devlin et al., 2019), SPABERT generates the contextualized representation for a geo-entity of interest (referred to as the pivot), based on its geographically nearby geo-entities. Specifically, SPABERT linearizes the 2-dimensional spatial context by forming pseudo sentences that consist of names of the pivot and neighboring geo-entities, ordered by their spatial distance to the pivot. SPABERT also encodes the spatial relations between the pivot and neighboring geo-entities with a continuous spatial coordinate embedding. The spatial coordinate embedding models horizontal and vertical distance relations separately and, in turn, can capture the orientation relations. These techniques make the inputs compatible with the BERT-family structures.
In addition, the backbone LM, BERT, in SPABERT is pretrained with general-purpose corpora and would not work well directly on geo-entity names because of the domain shift. We thus train SPABERT using pseudo sentences generated from large geographic datasets derived from OpenStreetMap (OSM).2 This pretraining process conducts Masked Language Modeling (MLM) and Masked Entity Prediction (MEP), which randomly mask subtokens and full geo-entity names in the pseudo sentences, respectively. MLM and MEP enable SPABERT to learn from pseudo sentences for generating spatially varying contextualized representations for geo-entities.
SPABERT provides a general-purpose representation for geo-entities based on their spatial context. Similar to linguistic context, spatial context refers to the surrounding environment of a geo-entity. For example, [ Speedway , ], [ Speedway , ], and [ Speedway , ] would have different representations since their surrounding environments could vary. We evaluate SPABERT on two tasks: 1) geo-entity typing and 2) geo-entity linking to external knowledge bases. Our analysis includes the performance comparison of SPABERT in various settings arising from the characteristics of geographic data sources (e.g., entity omission).
To summarize, this work has the following contributions. We propose an approach to linearize the 2-dimensional spatial context, encode geo-entity spatial relations, and use a LM to produce spatially varying feature representations of geo-entities. We show that SPABERT is a general-purpose encoder by supporting geo-entity typing and geo-entity linking, which are key to effectively integrating large varieties of geographic data sources, grounding geo-entities, and supporting a broad range of spatial computing applications. The experiments demonstrate that SPABERT is effective for both tasks and outperforms the SOTA LMs.

SPABERT
SPABERT is a LM built upon a pretrained BERT and further trained to produce contextualized geo-entity representations given large geographic datasets. Fig. 1 shows the outline of SPABERT, with the details below. This section first presents the preliminary (§2.1) and then describes the overall approach for learning geo-entities' contextualized representations (§2.2), the pretraining strategies (§2.3), and the inference procedures (§2.4).

Preliminary
We assume that given a geographic dataset (e.g., OSM) with many geo-entities, S = {g_1, g_2, ..., g_l}, each geo-entity g_i (e.g., [ Speedway , ]) has two attributes: name g^name ( Speedway ) and location g^loc ( ). The location attribute, g^loc, is a tuple g^loc = (g^loc_x, g^loc_y, ...) that identifies the location in a coordinate system (e.g., x and y image pixel coordinates, or latitude and longitude geocoordinates with altitude). WLOG, here we assume a 2-dimensional space. SPABERT aims to generate a contextualized representation for each geo-entity g_i in S given its spatial context. We denote a geo-entity that we seek to contextualize as the pivot entity, g_p, or p for short. The spatial context of p is SC(p) = {g_{n_1}, ..., g_{n_k}}, where distance(p, g_{n_k}) < T and T is a spatial distance parameter defining a local area. We call g_{n_1}, ..., g_{n_k} p's neighboring geo-entities and denote them as n_1, ..., n_k when there is no confusion.
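As a minimal sketch of this definition, the spatial context SC(p) amounts to a distance filter plus an ascending sort. The dictionary format and field names below are hypothetical stand-ins for g^name and g^loc:

```python
from math import hypot

def spatial_context(pivot, entities, T):
    """Return the neighbors of `pivot` within distance T, sorted ascending.

    Each entity is a dict with a 'name' and a 2-D 'loc' tuple, a simplified
    stand-in for the paper's g^name / g^loc attributes.
    """
    neighbors = []
    for g in entities:
        if g is pivot:
            continue
        d = hypot(g["loc"][0] - pivot["loc"][0], g["loc"][1] - pivot["loc"][1])
        if d < T:
            neighbors.append((d, g))
    neighbors.sort(key=lambda pair: pair[0])  # nearest first
    return [g for _, g in neighbors]

pivot = {"name": "Speedway", "loc": (0.0, 0.0)}
others = [
    {"name": "Subway", "loc": (3.0, 4.0)},    # distance 5
    {"name": "Arby's", "loc": (1.0, 1.0)},    # distance ~1.41
    {"name": "Motel 6", "loc": (30.0, 40.0)}, # distance 50, outside T
]
sc = spatial_context(pivot, others, T=10.0)
```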

Contextualizing Geo-entities
Linearizing Neighboring Geo-entity Names For a pivot, p, SPABERT first linearizes its neighboring geo-entity names to form a BERT-compatible input sequence, called a pseudo sentence. The pseudo sentence for the example in Fig. 1 starts with the pivot name followed by the names of the pivot's neighboring geo-entities, ordered by their spatial distance to the pivot in ascending order. The idea is that nearby geo-entities are more related (for contextualization) than distant geo-entities. SPABERT tokenizes the pseudo sentences using the original BERT tokenizer, with the special token [SEP] separating entity names. The subtokens of a neighbor n_k are denoted as T^{n_k}_j, as in Fig. 2.

Encoding Spatial Relations SPABERT adopts two types of position embeddings in addition to the token embeddings in the pseudo sentences (Fig. 2). The sequence position embedding represents the token order, the same as the original position embedding in BERT and other Transformer LMs. In addition, SPABERT incorporates a spatial coordinate embedding mechanism, which seeks to represent the spatial relations between the pivot and its neighboring geo-entities. SPABERT's spatial coordinate embeddings encode each location dimension separately to capture the relative distance and orientation between geo-entities. Specifically, for the 2-dimensional space, SPABERT generates normalized distances dist^{n_k}_x and dist^{n_k}_y for each neighboring geo-entity:

dist^{n_k}_x = (g^{loc_x}_{n_k} - g^{loc_x}_p) / Z,    dist^{n_k}_y = (g^{loc_y}_{n_k} - g^{loc_y}_p) / Z,

where Z is a normalization factor, and (g^{loc_x}_p, g^{loc_y}_p) and (g^{loc_x}_{n_k}, g^{loc_y}_{n_k}) are the locations of the pivot p and the neighboring entity n_k. Note that the tokens belonging to the same geo-entity name share the same dist^{n_k}_x and dist^{n_k}_y. Also, SPABERT uses DSEP, a constant numerical value larger than max(dist^{n_k}_x, dist^{n_k}_y) over all neighboring entities, to differentiate special tokens from entity name tokens.
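A simplified illustration of the linearization and per-axis normalization described above. The dictionary format and the exact way Z is applied are our assumptions, not the paper's implementation:

```python
def pseudo_sentence(pivot, neighbors):
    """Linearize: pivot name first, then neighbors in ascending distance
    order (the `neighbors` list is assumed pre-sorted), [SEP]-separated."""
    names = [pivot["name"]] + [n["name"] for n in neighbors]
    return "[CLS] " + " [SEP] ".join(names) + " [SEP]"

def normalized_dists(pivot, neighbor, Z=1.0):
    """Per-axis normalized offsets; dividing the signed offset by Z is one
    plausible reading of the paper's normalization factor."""
    dx = (neighbor["loc"][0] - pivot["loc"][0]) / Z
    dy = (neighbor["loc"][1] - pivot["loc"][1]) / Z
    return dx, dy

p = {"name": "Speedway", "loc": (0.0, 0.0)}
ns = [{"name": "Subway", "loc": (1.0, 2.0)},
      {"name": "Motel 6", "loc": (3.0, 4.0)}]
sent = pseudo_sentence(p, ns)
```

All subtokens of one entity name would share the same (dx, dy) pair, while special tokens receive the constant DSEP instead.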
SPABERT encodes dist^{n_k}_x and dist^{n_k}_y using a continuous spatial coordinate embedding layer that takes real-valued distances as input, preserving the continuity of the output embeddings. Let S^{n_k} be the spatial coordinate embedding of the neighboring entity n_k and M be the embedding dimension, so that S^{n_k} ∈ R^M. S^{n_k} is computed separately from dist^{n_k}_x and dist^{n_k}_y for the spatial coordinate embeddings along the horizontal and vertical directions, respectively.
The token embedding, sequence position embedding, and spatial coordinate embedding are summed and then fed into the encoder. Similar to BERT, the SPABERT encoder calculates an output embedding for each token. SPABERT then averages the pivot's token-level embeddings to produce a fixed-length embedding as the pivot's contextualized representation.
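One common way to realize a continuous embedding of real-valued distances is a sinusoidal encoding in the spirit of Transformer position encodings; the sketch below is an assumption-laden stand-in for SPABERT's actual layer, followed by the embedding summation and pivot-token pooling described above (the Transformer encoder itself is omitted):

```python
import numpy as np

def spatial_coord_embedding(dist, M=8):
    # Sinusoidal encoding of a real-valued distance, in the spirit of
    # Transformer position encodings; the paper's exact formula may differ.
    j = np.arange(M // 2)
    freqs = 1.0 / (10000.0 ** (2.0 * j / M))
    emb = np.empty(M)
    emb[0::2] = np.sin(dist * freqs)
    emb[1::2] = np.cos(dist * freqs)
    return emb

# Sum the three embedding types per token, then mean-pool the pivot's tokens.
# Token embeddings are random stand-ins; distances 20 mark special tokens
# (a DSEP-like constant), 0 the pivot, 1.4 a neighbor subtoken.
tokens = 5
tok_emb = np.random.default_rng(0).normal(size=(tokens, 8))
seq_emb = np.stack([spatial_coord_embedding(i) for i in range(tokens)])
spa_emb = np.stack([spatial_coord_embedding(d) for d in [20, 0, 0, 20, 1.4]])
encoder_input = tok_emb + seq_emb + spa_emb
pivot_repr = encoder_input[1:3].mean(axis=0)  # tokens 1-2 belong to the pivot
```

Because the encoding is a smooth function of the distance, nearby distances map to nearby embedding vectors, which is the continuity property the text emphasizes.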

Pretraining
We train SPABERT with two tasks to adapt the pretrained BERT backbone to geo-entity pseudo sentences. One task is masked language modeling (MLM) (Devlin et al., 2019), for which SPABERT needs to learn how to complete the full names of geo-entities from pseudo sentences with randomly masked subtokens, using the remaining subtokens and their spatial coordinates (i.e., partial names and spatial relations between subtokens). The block below shows an example of the masked input for MLM, where ### are the masked subtokens. In addition, we hypothesize that given common spatial co-occurrence patterns in the real world, one can use neighboring geo-entities to predict the name of a geo-entity. Therefore, we propose and incorporate a masked entity prediction (MEP) task, which randomly masks all subtokens of an entity name in a pseudo sentence. For MEP, SPABERT relies on the masked entity's spatial relations to neighboring entities to recover the masked name. The block below shows an example of the masked input for MEP.
[CLS] University of Minnesota [SEP]

Both MLM and MEP have a masking rate of 15% (without masking [CLS] and [SEP]). The sequence positions and spatial coordinates are not masked.
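A toy version of MEP-style masking is sketched below. The helper names and the per-entity masking granularity are our simplification; the point is that MEP masks all subtokens of a chosen entity, while special tokens are never masked:

```python
import random

def mask_entities(tokens, entity_spans, rate=0.15, rng=None, mask="[MASK]"):
    """MEP-style masking: with probability `rate` per entity, replace *all*
    of that entity's subtokens with [MASK]. [CLS]/[SEP] stay untouched
    because they are outside every entity span."""
    rng = rng or random.Random()
    out = list(tokens)
    for start, end in entity_spans:  # half-open [start, end) subtoken spans
        if rng.random() < rate:
            for i in range(start, end):
                out[i] = mask
    return out

toks = ["[CLS]", "Uni", "##versity", "[SEP]", "Library", "[SEP]"]
spans = [(1, 3), (4, 5)]
masked = mask_entities(toks, spans, rate=1.0)  # rate=1.0 masks every entity
```

MLM differs only in that it masks individual subtokens rather than whole entity spans.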
Pretraining Data We construct a large training set from OSM covering the City of London and the State of California.Since OSM is crowd-sourced, we clean the raw data by removing non-alphanumeric place names and geo-entities that do not have a place name or geocoordinates.We randomly select geo-entities as pivots and construct pseudo sentences as described in §2.2.
To efficiently find the neighboring entities of a pivot sorted by distance, we leverage the Geohash algorithm introduced by Gustavo Niemeyer in 2008. We first generate the Geohash string for geo-entities on OpenStreetMap, which encodes a location into a base-32 string with a maximum length of 20. For example, the city 'Madelia' at geolocation (44.0508°N, 94.4183°W) has the Geohash string '9zufe7nwjefpjksty0m8'. The length of the hash string is associated with the size of the geographic area that the string represents. The first character divides the entire Earth's surface into 32 large grids, and the second character further divides one large grid into 32 grids. Thus, by comparing the leading characters of two hash strings, we can estimate the spatial proximity of two locations without calculating the exact distance between them. We use the hash strings to first filter out the geo-entities guaranteed to be outside the spatial context radius and keep the neighbors within the nearest nine hash grids for distance calculation. The use of Geohash reduces the computation time when constructing the pseudo sentences. Fig. 3 shows an example of Geohash representations. Given the pivot 'Kenneth H Keller Hall', we can compute its hash string at different lengths to quickly retrieve the neighboring entities within a desired range (i.e., the green box).
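The Geohash scheme described above can be sketched as follows. This is a standard textbook encoder (bisect longitude and latitude ranges alternately, pack bits into base-32 characters), not the implementation used in SPABERT:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)

def geohash(lat, lon, length=6):
    """Encode a lat/lon into a Geohash string by bisecting the longitude and
    latitude ranges alternately (longitude first)."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    bits, even = [], True
    while len(bits) < length * 5:
        if even:  # longitude bit
            mid = (lon_lo + lon_hi) / 2
            bits.append(1 if lon >= mid else 0)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:     # latitude bit
            mid = (lat_lo + lat_hi) / 2
            bits.append(1 if lat >= mid else 0)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even = not even
    return "".join(
        BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
        for i in range(0, length * 5, 5)
    )

def shared_prefix_len(h1, h2):
    """Longer shared prefixes imply closer locations (coarse proximity test)."""
    n = 0
    for a, b in zip(h1, h2):
        if a != b:
            break
        n += 1
    return n
```

Comparing prefixes this way lets the preprocessing discard far-away entities without computing exact distances, as the text describes.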
This way, we generate 69,168 and 148,811 pseudo sentences in London and California, respectively, leading to a pretraining corpus of around 150M words.We use all of them for pretraining SPABERT with MLM and MEP.

Inference
The pretrained SPABERT can already generate spatially varying contextualized representations for geo-entities. These representations support various downstream applications, including contextualized geo-entity classification and similarity-based inference tasks, such as geo-entity typing and linking.
Contextualized Geo-entity Classification Classification of a geo-entity is crucial for recognizing and resolving geospatial concepts and grounding geo-entities in various data sources. Here, we use geo-entity typing as an example task to demonstrate the effectiveness of contextualized geo-entity representations. In this task, we aim to predict the geo-entity's semantic type (e.g., transportation and healthcare). We stack a softmax prediction head on top of the final-layer hidden states (i.e., geo-entity embeddings) to perform the classification.
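The softmax prediction head can be sketched as below. This is a NumPy stand-in with illustrative shapes and names; in practice the head is a trainable layer finetuned jointly with the encoder:

```python
import numpy as np

def typing_head(entity_emb, W, b):
    """Softmax prediction head over a pooled geo-entity embedding.
    entity_emb: (hidden,); W: (num_types, hidden); b: (num_types,)."""
    logits = W @ entity_emb + b
    z = np.exp(logits - logits.max())  # numerically stable softmax
    return z / z.sum()

# Toy example: 3 semantic types, hidden size 4.
W = np.eye(3, 4)
b = np.zeros(3)
probs = typing_head(np.array([5.0, 0.0, 0.0, 0.0]), W, b)
```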
Similarity-based Inference Integrating multi-source geographic datasets often involves finding the top K nearest geo-entities in some representation space. One example task, which we refer to as geo-entity linking, is to link geo-entities from a geographic information system (GIS) oriented dataset to graph-based knowledge bases (KBs). In such a task, we can directly use the contextualized geo-entity embeddings from SPABERT and calculate the embedding similarity with some metric, such as the cosine distance, to match corresponding geo-entities from separate data sources.
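A minimal version of similarity-based linking with cosine similarity; the array shapes and function names are illustrative:

```python
import numpy as np

def rank_candidates(query_emb, cand_embs, k=10):
    """Rank candidate geo-entity embeddings by cosine similarity to a query.
    query_emb: (d,); cand_embs: (n, d). Returns indices of the top-k."""
    q = query_emb / np.linalg.norm(query_emb)
    C = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    sims = C @ q                      # cosine similarity per candidate
    return np.argsort(-sims)[:k]      # best match first

query = np.array([1.0, 0.0])
cands = np.array([[0.0, 1.0],   # orthogonal
                  [1.0, 0.1],   # near-duplicate of the query
                  [-1.0, 0.0]]) # opposite direction
top = rank_candidates(query, cands, k=3)
```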

Experiments
We evaluate SPABERT on supervised geo-entity typing and unsupervised geo-entity linking tasks (§3.1-§3.3). We also investigate how SPABERT performs under various common characteristics of geographic data sources (e.g., entity omission from cartographic generalization) and how critical technical components and their parameters affect the performance (§3.4).

Experimental Setup
Datasets OpenStreetMap (OSM)2 is a large-scale geographic database that is widely used in many popular map-related services. For supervised geo-entity typing, we randomly select 25,872 and 7,726 pivot geo-entities, together with their neighboring entities, from point features of nine semantic types on OSM in the State of California and the City of London, respectively (Tab. 1). Each geo-entity is associated with a name and its geocoordinates. We use 80% of the data in both regions for training and 20% for testing. The task is to predict the OSM semantic type for each geo-entity (§3.2).
For unsupervised geo-entity linking, we randomly select 14 scanned historical maps of California from the United States Geological Survey (USGS) Historical Topographic Maps Collection. We manually transcribe the text labels in these maps to generate 154 geo-entities, each containing a name (from the text label) and image coordinates (the text label's pixel location in the scanned map). The task is to find the corresponding Wikidata geo-entity3 for each USGS geo-entity using only their spatial relations from pixel locations. Note that the geocoordinates of geo-entities in the scanned maps are unknown. We select 15K Wikidata geo-entities having an identical name to one of the USGS geo-entities4 and then randomly add another 30K Wikidata geo-entities to construct the final Wikidata dataset of 45K geo-entities. For the ground truth, we manually identify the matched Wikidata geo-entity for each USGS geo-entity.
For the geo-entity linking task, the annotation details are described below. We hire three undergraduate students (not the co-authors) to transcribe text labels on the USGS maps. They draw bounding boxes/polygons around text labels and transcribe

Model Configuration
We create two variants of SPABERT, namely SPABERT Base and SPABERT Large, with weights and tokenizers initialized from the uncased versions of BERT Base and BERT Large, respectively. During pretraining, we mix the masked instances for the MLM and MEP tasks in the same ratio. We primarily use the same optimization parameters as in (Devlin et al., 2019) and train the model with the AdamW optimizer. We save model checkpoints every 2K iterations. The learning rate is 5×10^-5 for SPABERT Base and 1×10^-6 for SPABERT Large for stability. The distance normalization factor Z is 0.0001, and the distance separator DSEP is 20. The batch size is 12 to fit in one NVIDIA RTX A5000 GPU with 24GB memory.

Supervised Geo-Entity Typing
Task Description We tackle geo-entity typing as a supervised classification task using the modified SPABERT architecture in §2.4 and finetune the model end-to-end by optimizing the cross-entropy loss. Specifically, the training set of this task contains {(g_i, SC(g_i), y_i)}_{i=1}^N, where g_i is the pivot entity, y_i is its OSM semantic type label, and SC(g_i) contains its neighboring entities in the nearest nine hash grids. Note that during training, the model does not use the neighboring geo-entities' OSM semantic types. We report the F1 score for each class (i.e., an OSM semantic type) and the micro-F1 over all samples in the test set.

Model Comparison
We compare SPABERT with several strong PLMs, including BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019), SpanBERT (Joshi et al., 2020), LUKE (Yamada et al., 2020), and SimCSE (Gao et al., 2021). For all these models, we use the uncased pretrained weights. For SimCSE, we use both the BERT and RoBERTa versions with weights obtained from unsupervised training. We further finetune these baseline LMs end-to-end on the same pseudo-sentence dataset as SPABERT (Tab. 1), with a softmax prediction head appended after the baseline model. Note that these baseline models only use the token and position embeddings but not the spatial coordinate embeddings.
Result Analysis Tab. 2 summarizes the geo-entity typing results on the OSM test set with the finetuned baselines and SPABERT. The last column shows the micro-F1 scores (per-instance average). The remaining columns are the F1 scores for individual classes. Among the baseline models, BERT Base and SimCSE BERT-Large (with BERT Large a close second) have the highest F1 in the base and large groups, respectively. In addition, SPABERT shows the best performance over the existing context-aware LMs in both the base and large groups. We hypothesize that SPABERT's advantage comes from 1) the pretraining tasks, especially MEP, and 2) the spatial coordinate embeddings. Additionally, compared to its backbone, BERT, SPABERT shows performance improvement in almost all classes for both the base and large models. The results demonstrate that SPABERT can effectively contextualize geo-entities with their spatial context. We can also observe that the Financial class has high F1 scores for all models. The reason is that when entity names are strongly associated with the semantic type (e.g., U.S. Bank), pretrained LMs can produce meaningful entity representations even without the spatial context. However, the typing performance can still be further improved by encoding the spatial context with SPABERT. Fig. 4 visualizes the geo-entity features from SPABERT Base. It shows that Financial and Public_Service have the most separable features from the other classes. The non-separable region in the middle is mainly due to the similar spatial context and semantics of geo-entities (e.g., a "nursing home" can be close to both Facilities and Healthcare).

Unsupervised Geo-Entity Linking
Task Description We define the problem as follows. For a query set Q = {(q_i, SC(q_i))}_{i=1}^{|Q|}, the goal is to find the corresponding geo-entity for each q_i from a candidate set C = {(c_i, SC(c_i))}_{i=1}^{|C|}. Here, Q and C are the USGS and Wikidata geo-entities (§3.1). SC(q_i) contains q_i's nearest 20 geo-entities, and SC(c_i) includes c_i's neighboring geo-entities within a 5 km radius.5 For this task, we use the evaluation metrics Recall@K and Mean Reciprocal Rank (MRR).
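Both metrics can be computed from the 1-based rank at which the true entity appears in each query's candidate ranking; a minimal sketch:

```python
def recall_at_k(ranks, k):
    """Fraction of queries whose true entity appears in the top k.
    `ranks` holds the 1-based rank of the true entity per query."""
    return sum(r <= k for r in ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank of the true entity across queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)

ranks = [1, 2, 10]  # true entity ranked 1st, 2nd, and 10th for three queries
```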

Model Comparison
We use the same baseline models as in the typing task, without finetuning.

5 Because Q is not georeferenced, we use the top K nearest neighbors instead of pixel distances.
Result Analysis Tab. 3 shows the geo-entity linking results. Large models perform better than their base counterparts for both the baseline models and SPABERT. In particular, among the baseline models, SimCSE BERT has the highest MRR and R@1. SPABERT has significantly higher R@5 and R@10 than SimCSE. Also, SPABERT outperforms all baselines on MRR, R@5, and R@10. For both the base and large versions, SPABERT outperforms its backbone model, BERT. The difference between BERT and SPABERT shows the effectiveness of the spatial coordinate embeddings and of domain adaptation using MLM and the proposed MEP. Also, SpanBERT shows the worst performance among all models. This could be because SpanBERT masks random spans during pretraining, which does not work well for pseudo sentences of geo-entity names. In contrast, SPABERT's MEP specifically masks all subtokens from the same geo-entity name.

Spatial Context and Ablation Study
Impact of Entity Omission for Linking We analyze how SPABERT handles different USGS map scales for geo-entity linking with entity omission from cartographic generalization (e.g., some entities only exist at certain scales). SPABERT performs the best for the 1:125K maps (30-CA) (Tab. 4). Our hypothesis is that the 1:125K maps contain denser geo-entities than the test maps at other scales. When selecting the top K nearest neighbors for a pivot in other map scales, some neighbors could be far away and would not provide relevant context. Since the entity omission criteria of the USGS maps are unknown and Wikidata mostly contains important landmark geo-entities, we also simulate omission using the OSM dataset by gradually removing random neighbors of a pivot entity within a fixed neighborhood. We test this scenario for geo-entity linking to see if the same pivot entity would have similar representations under decreasing neighbor densities (i.e., the original set of neighbors vs. after random omission at varying rates). We use the same evaluation metrics as in §3.3, as they reflect the similarity of the learned geo-entity representations (Fig. 5). The linking scores decrease as additional entities are removed, indicating that the similarity between the same pivot entity's representations learned from the original data and the omitted data declines as the omission percentage grows. The elbow point is at about 30%, showing that SPABERT is reasonably robust if the percentage of removed neighbors is within 30%.
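The omission simulation can be sketched as below. The random-drop helper is our simplified reading of the setup, not the paper's script:

```python
import random

def omit_neighbors(neighbors, rate, seed=0):
    """Randomly drop a `rate` fraction of a pivot's neighbors to simulate
    entity omission from cartographic generalization, preserving the
    original (distance-sorted) order of the survivors."""
    rng = random.Random(seed)
    k = round(len(neighbors) * (1 - rate))
    kept = set(rng.sample(range(len(neighbors)), k))
    return [n for i, n in enumerate(neighbors) if i in kept]

# Pivot with 10 distance-sorted neighbors; remove 30% at random.
remaining = omit_neighbors(list(range(10)), rate=0.3)
```

Comparing the pivot's embedding before and after such omission, at varying rates, reproduces the robustness analysis described above.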

Impact of Pseudo Sentence Length
We analyze the impact of pseudo sentence length on geo-entity typing by varying the number of neighboring entities. Tab. 5 shows the results of SPABERT Base and SPABERT Large with an increasing number of neighbors. We observe that as #Neighbor increases, the micro-F1 score increases accordingly, as expected. The increase is more evident at the beginning, illustrating that the spatial context helps contextualize the pivot entity, especially in a local neighborhood. The benefit of adding additional neighboring entities diminishes as distance increases beyond a certain point. This conforms to the well-studied spatial autocorrelation effect.

Effect of Spatial Coordinate Embedding
We train two additional SPABERT variants that do not include the spatial coordinate embedding during MLM and MEP pretraining and use them for geo-entity linking. On MRR, SPABERT Base and SPABERT Large drop from 0.515 to 0.458 and from 0.537 to 0.478, respectively, compared to their original SPABERT versions. The variants also have lower recall scores than their original SPABERT versions. The most significant drop in recall is on R@1, from 0.383 to 0.283 for SPABERT Large. The results demonstrate the effectiveness of the spatial coordinate embedding.
Related Work

To extend the use of PLMs beyond language, much exploration has been conducted to represent other modalities. For example, a significant number of vision-language models (Kim et al., 2021; Zhai et al., 2022; Li et al., 2020a; Lu et al., 2019) have been developed by jointly pretraining on co-occurring vision and language corpora, and support grounding (Zhang et al., 2020; Chen et al., 2020), generation (Cho et al., 2021; Yu et al., 2022), and retrieval tasks (Zhang et al., 2021; Li et al., 2020b) on the vision modality. Other PLMs capture semi-structured tabular data by linearizing table cells (Herzig et al., 2020; Yin et al., 2020; Iida et al., 2021) or incorporating a structural prior in the attention mechanism (Wang et al., 2022; Trabelsi et al., 2022; Eisenschlos et al., 2021). To the best of our knowledge, none of the prior studies have extended pretrained LMs to geo-entity representation, nor do they support a suitable mechanism to capture geo-entities' spatial relations, which have no structural prior. This is exactly the focus of our work.

Domain-specific Language Modeling Another line of studies adapts PLMs to specific domains, typically by pretraining on domain-specific corpora. For example, in the biomedicine domain, a series of models (Lee et al., 2020; Peng et al., 2019; Alsentzer et al., 2019; Phan et al., 2021) have been developed by training PLMs on corpora derived from PubMed. Similarly, PLMs have been trained on corpora specific to the software engineering (Tabassum et al., 2020), finance (Liu et al., 2021), and proteomics (Zhou et al., 2020) domains. In this context, SPABERT represents a pilot study in the geographic domain by allowing the PLM to learn from spatially distributed text.

Conclusion
This paper presented SPABERT ( ), a language model trained on geographic datasets for contextualizing geo-entities. SPABERT utilizes a novel spatial coordinate embedding mechanism to capture spatial relations between geo-entities in 2D space and linearizes the geo-entities into 1D sequences compatible with the BERT-family structures. The experiments show that the general-purpose representations learned by SPABERT achieve better or competitive results on the geo-entity typing and geo-entity linking tasks compared to SOTA pretrained LMs. We plan to evaluate SPABERT on additional related tasks, such as geo-entity to natural language grounding. We also plan to extend SPABERT to support other geo-entity geometry types, including lines and polygons.

Limitations
The current model design only considers point geometries but not polygon and line geometries, which could also provide meaningful spatial relations for contextualizing a geo-entity. Training SPABERT also requires considerable GPU resources, which might produce environmental impacts.

Ethical Consideration
SPABERT was evaluated on English-speaking regions, so the model and results could be biased towards these regions and their commonly used languages. Replacing the backbone of SPABERT with a multilingual model and training SPABERT on diverse regions could mitigate this bias.

Figure 2 :
Figure 2: Embedding modules. Inputs to the token embedding are the tokenized geo-entity names in the pseudo sentence. Inputs to the sequence position embedding are the sequence indices of the tokens. Inputs to the spatial coordinate embedding are the normalized distances from the pivot in each spatial dimension (details in §2.2).

Figure 3 :
Figure 3: Example of the Geohash representations with a pivot entity named 'Kenneth H Keller Hall' and its neighboring entities from OpenStreetMap ( : pivot entity; : neighboring entities; : entities not within the nearest nine grids). The upper hash grids have a string length of six, and the lower ones have a length of seven.
the text string to indicate geo-entity names and their locations. The results are verified and corrected by another undergraduate student and one Ph.D. student. Then, we run an automatic script to find the potential corresponding geo-entity in Wikidata by matching their names. The collected Wikidata URIs are noisy, and we have one Ph.D. student verify the URIs by marking them as True or False linkings. The verification results are further randomly reviewed by a senior digital humanities scholar specializing in the representation and reception of historical places.

Figure 4 :
Figure 4: TSNE visualization of entity features produced by SPABERT Base. The color indicates the ground-truth labels. Financial and Public_Service (pointed to by the arrows) are well separated from the other classes, which conforms to the high F1 scores in Tab. 2 (best viewed in color).

Figure 5 :
Figure 5: Geo-entity feature similarity with decreasing neighbor density. The x-axis indicates the percentage of neighbors removed for the pivot entities.

Table 1 :
OSM Geo-entity typing dataset statistics. The first column shows the nine OSM semantic types, followed by the number of samples for each class in California and London.

Table 2 :
Geo-entity typing with the state-of-the-art LMs and SPABERT. The columns (Edu., Ent., Fac., Fin., Hea., Pub., Sus., Tra., Was.) are the OSM classes; MicroAvg is the micro-average. Bold numbers are the highest scores in each column. Underlined numbers are the highest scores among baselines.

Table 3 :
Geo-entity linking results. Bold and underlined numbers are the highest scores in each column and the highest scores among the baselines, respectively.

Table 4 :
Impact of entity omission for linking (USGS and Wikidata). Results are from SPABERT Large. A scale of 1:62500 means 1 centimeter on the map represents 625 meters in the physical world.