dialectR: Doing Dialectometry in R

Ryan Soh-Eun Shim, John Nerbonne


Abstract
We present dialectR, an open-source R package for performing quantitative analyses of dialects based on categorical measures of difference and on variants of edit distance. dialectR is one of the first programmable toolkits for dialectometry that users can freely combine with, and extend by, further statistical procedures. We describe implementation details of the package and provide two examples of its use: one applies multidimensional scaling and hierarchical clustering to a dataset of Dutch dialects, and the other shows how the acoustic vowel space can be approximated by computing an acoustic distance based on Mel-frequency cepstral coefficients (MFCCs) over audio recordings of vowels.
Anthology ID:
2022.vardial-1.3
Volume:
Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Preslav Nakov, Jörg Tiedemann, Marcos Zampieri
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
20–27
URL:
https://aclanthology.org/2022.vardial-1.3
Cite (ACL):
Ryan Soh-Eun Shim and John Nerbonne. 2022. dialectR: Doing Dialectometry in R. In Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 20–27, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
dialectR: Doing Dialectometry in R (Shim & Nerbonne, VarDial 2022)
PDF:
https://aclanthology.org/2022.vardial-1.3.pdf