Conditional Language Models for Community-Level Linguistic Variation

Bill Noble, Jean-philippe Bernardy


Abstract
Community-level linguistic variation is a core concept in sociolinguistics. In this paper, we use conditioned neural language models to learn vector representations for 510 online communities. We use these representations to measure linguistic variation between communities and investigate the degree to which linguistic variation corresponds with social connections between communities. We find that our sociolinguistic embeddings are highly correlated with a social network-based representation that does not use any linguistic input.
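The core idea in the abstract, conditioning a language model's next-word prediction on a learned per-community vector and then comparing those vectors to measure variation between communities, can be sketched in miniature as follows. This is a minimal illustration, not the authors' implementation: the vocabulary size, embedding dimensions, use of a single linear projection, and cosine distance as the comparison metric are all assumptions made for the sketch (the paper learns representations for 510 communities with a neural LM).

```python
import math
import random

random.seed(0)
VOCAB, WORD_DIM, COMM_DIM, N_COMMS = 50, 16, 8, 3  # toy sizes (assumed)

def rand_vec(n):
    return [random.gauss(0, 1) for _ in range(n)]

# Randomly initialized parameters; in practice these would be trained.
word_emb = [rand_vec(WORD_DIM) for _ in range(VOCAB)]
comm_emb = [rand_vec(COMM_DIM) for _ in range(N_COMMS)]   # one vector per community
W = [rand_vec(VOCAB) for _ in range(WORD_DIM + COMM_DIM)]  # projection to vocab logits

def next_word_dist(prev_word, community):
    """Softmax distribution over the next word, conditioned on community identity."""
    x = word_emb[prev_word] + comm_emb[community]  # concatenate word and community vectors
    logits = [sum(xi * W[i][j] for i, xi in enumerate(x)) for j in range(VOCAB)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Same context, different communities -> different next-word distributions.
p_a = next_word_dist(7, community=0)
p_b = next_word_dist(7, community=1)

def cosine_dist(u, v):
    """1 - cosine similarity; one possible measure of inter-community variation."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1 - dot / norm

variation = cosine_dist(comm_emb[0], comm_emb[1])
```

Once such a model is trained on community-labeled text, distances between the learned community vectors can be compared against non-linguistic signals (e.g. the social network-based representation mentioned in the abstract).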
Anthology ID:
2022.nlpcss-1.9
Volume:
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)
Month:
November
Year:
2022
Address:
Abu Dhabi, UAE
Editors:
David Bamman, Dirk Hovy, David Jurgens, Katherine Keith, Brendan O'Connor, Svitlana Volkova
Venue:
NLP+CSS
Publisher:
Association for Computational Linguistics
Pages:
59–78
URL:
https://aclanthology.org/2022.nlpcss-1.9
DOI:
10.18653/v1/2022.nlpcss-1.9
Cite (ACL):
Bill Noble and Jean-philippe Bernardy. 2022. Conditional Language Models for Community-Level Linguistic Variation. In Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS), pages 59–78, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Conditional Language Models for Community-Level Linguistic Variation (Noble & Bernardy, NLP+CSS 2022)
PDF:
https://aclanthology.org/2022.nlpcss-1.9.pdf
Video:
https://aclanthology.org/2022.nlpcss-1.9.mp4