Identifying Linguistic Areas for Geolocation

Tommaso Fornaciari; Dirk Hovy

doi:10.18653/v1/D19-5530

Abstract

Geolocating social media posts relies on the assumption that language carries sufficient geographic information. However, locations are usually given as continuous latitude/longitude tuples, so we first need to define discrete geographic regions that can serve as labels. Most studies use some form of clustering to discretize the continuous coordinates (Han et al., 2016). However, the resulting regions do not always correspond to existing linguistic areas. Consequently, accuracy at 100 miles tends to be good, but degrades for finer-grained distinctions, when different linguistic regions get lumped together. We describe a new algorithm, Point-to-City (P2C), an iterative k-d tree-based method for clustering geographic coordinates and associating them with towns. We create three sets of labels at different levels of granularity, and compare performance of a state-of-the-art geolocation model trained and tested with P2C labels to one with regular k-d tree labels. Even though P2C results in substantially more labels than the baseline, model accuracy increases significantly over using traditional labels at the fine-grained level, while staying comparable at 100 miles. The results suggest that identifying meaningful linguistic areas is crucial for improving geolocation at a fine-grained level.
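For readers unfamiliar with the baseline, the following is a minimal sketch of how continuous coordinates can be discretized into labels with a plain k-d tree. It illustrates only the generic "regular k-d tree labels" idea, not the authors' P2C algorithm, whose details are not given in the abstract. The function name `kd_tree_labels`, the `max_bucket` parameter, and the median-index split rule are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def kd_tree_labels(points: List[Point], max_bucket: int) -> List[int]:
    """Assign each point the id of the k-d tree leaf it falls into.

    Illustrative sketch only: the region is split recursively at the
    median index, alternating between latitude and longitude, until a
    leaf holds at most `max_bucket` points. Each leaf becomes one label.
    """
    if max_bucket < 1:
        raise ValueError("max_bucket must be at least 1")

    labels = [-1] * len(points)
    next_label = 0

    def split(indices: List[int], depth: int) -> None:
        nonlocal next_label
        # Small enough: this group of points becomes one discrete label.
        if len(indices) <= max_bucket:
            for i in indices:
                labels[i] = next_label
            next_label += 1
            return
        axis = depth % 2  # 0 = latitude, 1 = longitude
        ordered = sorted(indices, key=lambda i: points[i][axis])
        mid = len(ordered) // 2  # both halves are non-empty here
        split(ordered[:mid], depth + 1)
        split(ordered[mid:], depth + 1)

    if points:
        split(list(range(len(points))), 0)
    return labels


# Hypothetical coordinates: two points near New York, two near Los Angeles.
example = [(40.7, -74.0), (40.8, -73.9), (34.0, -118.2), (34.1, -118.3)]
example_labels = kd_tree_labels(example, max_bucket=2)
```

Because the splits depend only on point density, a leaf boundary can cut through a city or merge neighbouring towns into one label, which is the mismatch with linguistic areas that the abstract describes.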

Anthology ID: D19-5530
Volume: Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
Month: November
Year: 2019
Address: Hong Kong, China
Editors: Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
Venue: WNUT
Publisher: Association for Computational Linguistics
Pages: 231–236
URL: https://aclanthology.org/D19-5530/
DOI: 10.18653/v1/D19-5530
Cite (ACL): Tommaso Fornaciari and Dirk Hovy. 2019. Identifying Linguistic Areas for Geolocation. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pages 231–236, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal): Identifying Linguistic Areas for Geolocation (Fornaciari & Hovy, WNUT 2019)
PDF: https://aclanthology.org/D19-5530.pdf
