GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding

Zekun Li, Wenxuan Zhou, Yao-Yi Chiang, Muhao Chen


Abstract
Humans subconsciously engage in geospatial reasoning when reading articles. We recognize place names and their spatial relations in text and mentally associate them with their physical locations on Earth. Although pretrained language models can mimic this cognitive process using linguistic context, they do not utilize valuable geospatial information in large, widely available geographical databases, e.g., OpenStreetMap. This paper introduces GeoLM, a geospatially grounded language model that enhances the understanding of geo-entities in natural language. GeoLM leverages geo-entity mentions as anchors to connect linguistic information in text corpora with geospatial information extracted from geographical databases. GeoLM connects the two types of context through contrastive learning and masked language modeling. It also incorporates a spatial coordinate embedding mechanism that encodes distance and direction relations to capture geospatial context. In experiments, we demonstrate that GeoLM exhibits promising capabilities in supporting toponym recognition, toponym linking, relation extraction, and geo-entity typing, tasks that bridge the gap between natural language processing and geospatial sciences. The code is publicly available at https://github.com/knowledge-computing/geolm.
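
The abstract names contrastive learning as one way the linguistic and geospatial contexts are connected. As a rough illustration of that general idea only, the sketch below computes an InfoNCE-style loss in plain Python: the text-side embedding of each geo-entity mention is pulled toward the geospatial-side embedding of the same entity and pushed away from the others in the batch. The function names, the temperature value, and the toy vectors are all assumptions made for this example; they are not taken from the GeoLM code, and the paper's actual objective may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(text_embs, geo_embs, temperature=0.1):
    """Mean InfoNCE loss where text_embs[i] is paired with geo_embs[i].

    Hypothetical helper for illustration; not part of the GeoLM repository.
    """
    losses = []
    for i, t in enumerate(text_embs):
        # Similarity of this mention to every geospatial candidate in the batch.
        logits = [cosine(t, g) / temperature for g in geo_embs]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability assigned to the matching candidate.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)

# Toy embeddings (made up): two mentions that refer to two different places.
text = [[1.0, 0.0], [0.0, 1.0]]
geo_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each entity near its own mention
geo_swapped = [[0.1, 0.9], [0.9, 0.1]]   # entities paired with the wrong mention

aligned_loss = info_nce(text, geo_aligned)
swapped_loss = info_nce(text, geo_swapped)
```

In this toy setup the loss is small when each mention sits closest to its own entity and large when the pairing is swapped, which is the behavior a contrastive objective is meant to reward.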
Anthology ID:
2023.emnlp-main.317
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5227–5240
URL:
https://aclanthology.org/2023.emnlp-main.317
DOI:
10.18653/v1/2023.emnlp-main.317
Cite (ACL):
Zekun Li, Wenxuan Zhou, Yao-Yi Chiang, and Muhao Chen. 2023. GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5227–5240, Singapore. Association for Computational Linguistics.
Cite (Informal):
GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding (Li et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.317.pdf
Video:
https://aclanthology.org/2023.emnlp-main.317.mp4