Modeling Global Syntactic Variation in English Using Dialect Classification

Jonathan Dunn


Abstract
This paper evaluates global-scale dialect identification for 14 national varieties of English on both web-crawled data and Twitter data. The paper makes three main contributions: (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task; (ii) producing a large and dynamic set of syntactic features using grammar induction rather than focusing on a few hand-selected features such as function words; and (iii) comparing models across both web corpora and social media corpora in order to measure the robustness of syntactic variation across registers.
Anthology ID:
W19-1405
Volume:
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
June
Year:
2019
Address:
Ann Arbor, Michigan
Venues:
NAACL | VarDial | WS
Publisher:
Association for Computational Linguistics
Pages:
42–53
URL:
https://aclanthology.org/W19-1405
DOI:
10.18653/v1/W19-1405
PDF:
https://aclanthology.org/W19-1405.pdf
Software:
W19-1405.Software.zip