LMSOC: An Approach for Socially Sensitive Pretraining

Vivek Kulkarni, Shubhanshu Mishra, Aria Haghighi


Abstract
While large-scale pretrained language models have been shown to learn effective linguistic representations for many NLP tasks, there remain many real-world contextual aspects of language that current approaches do not capture. For instance, consider a cloze test “I enjoyed the _____ game this weekend”: the correct answer depends heavily on where the speaker is from, when the utterance occurred, and the speaker’s broader social milieu and preferences. Although language depends heavily on the geographical, temporal, and other social contexts of the speaker, these elements have not been incorporated into modern transformer-based language models. We propose a simple but effective approach to incorporate speaker social context into the learned representations of large-scale language models. Our method first learns dense representations of social contexts using graph representation learning algorithms and then primes language model pretraining with these social context representations. We evaluate our approach on geographically-sensitive language modeling tasks and show a substantial improvement (more than 100% relative lift on MRR) compared to baselines.
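To make the priming idea in the abstract concrete, here is a minimal sketch (not the released twitter-research/lmsoc code; the class name, dimensions, and architecture are illustrative assumptions): a social-context vector, pre-learned with a graph representation method such as node2vec over a user/location graph, is projected into the token-embedding space and prepended to each input sequence before masked-language-model pretraining.

import torch
import torch.nn as nn

class SociallyPrimedLM(nn.Module):
    """Toy masked LM primed with a pre-learned social-context embedding."""
    def __init__(self, vocab_size=30522, d_model=256, ctx_dim=64, n_layers=4, n_heads=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.ctx_proj = nn.Linear(ctx_dim, d_model)   # map social-context vector into token space
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.mlm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids, ctx_vec):
        # token_ids: (batch, seq_len); ctx_vec: (batch, ctx_dim) from a graph-embedding method
        tok = self.tok_emb(token_ids)                  # (batch, seq_len, d_model)
        ctx = self.ctx_proj(ctx_vec).unsqueeze(1)      # (batch, 1, d_model) "priming" position
        hidden = self.encoder(torch.cat([ctx, tok], dim=1))
        return self.mlm_head(hidden[:, 1:])            # predictions for the original token positions

# Example usage with random inputs; in practice ctx_vec comes from graph embeddings.
model = SociallyPrimedLM()
logits = model(torch.randint(0, 30522, (2, 16)), torch.randn(2, 64))

The key design point the paper highlights is that the social context (e.g., geography or time) is learned separately from graphs and held fixed while the language model learns to condition on it during pretraining.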
Anthology ID:
2021.findings-emnlp.254
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2967–2975
URL:
https://aclanthology.org/2021.findings-emnlp.254
DOI:
10.18653/v1/2021.findings-emnlp.254
Cite (ACL):
Vivek Kulkarni, Shubhanshu Mishra, and Aria Haghighi. 2021. LMSOC: An Approach for Socially Sensitive Pretraining. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2967–2975, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
LMSOC: An Approach for Socially Sensitive Pretraining (Kulkarni et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.254.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.254.mp4
Code:
twitter-research/lmsoc