Social Media Variety Geolocation with geoBERT

Yves Scherrer, Nikola Ljubešić


Abstract
This paper describes the Helsinki–Ljubljana contribution to the VarDial 2021 shared task on social media variety geolocation. Following our successful participation in VarDial 2020, we again propose constrained and unconstrained systems based on the BERT architecture. We report experiments with different tokenization settings and different pre-trained models, and we contrast our parameter-free regression approach with various classification schemes proposed by other participants at VarDial 2020. Both the code and the best-performing pre-trained models are made freely available.
Anthology ID:
2021.vardial-1.16
Volume:
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
April
Year:
2021
Address:
Kyiv, Ukraine
Editors:
Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Yves Scherrer, Tommi Jauhiainen
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
135–140
URL:
https://aclanthology.org/2021.vardial-1.16
Cite (ACL):
Yves Scherrer and Nikola Ljubešić. 2021. Social Media Variety Geolocation with geoBERT. In Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 135–140, Kyiv, Ukraine. Association for Computational Linguistics.
Cite (Informal):
Social Media Variety Geolocation with geoBERT (Scherrer & Ljubešić, VarDial 2021)
PDF:
https://aclanthology.org/2021.vardial-1.16.pdf