Modeling Global Syntactic Variation in English Using Dialect Classification

Jonathan Dunn


Abstract
This paper evaluates global-scale dialect identification for 14 national varieties of English on both web-crawled data and Twitter data. The paper makes three main contributions: (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task; (ii) producing a large and dynamic set of syntactic features using grammar induction rather than focusing on a few hand-selected features such as function words; and (iii) comparing models across both web corpora and social media corpora in order to measure the robustness of syntactic variation across registers.
Anthology ID:
W19-1405
Volume:
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
June
Year:
2019
Address:
Ann Arbor, Michigan
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
42–53
URL:
https://aclanthology.org/W19-1405
DOI:
10.18653/v1/W19-1405
Cite (ACL):
Jonathan Dunn. 2019. Modeling Global Syntactic Variation in English Using Dialect Classification. In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 42–53, Ann Arbor, Michigan. Association for Computational Linguistics.
Cite (Informal):
Modeling Global Syntactic Variation in English Using Dialect Classification (Dunn, VarDial 2019)
PDF:
https://aclanthology.org/W19-1405.pdf
Software:
 W19-1405.Software.zip