Inducing Generalizable and Interpretable Lexica

Yilin Geng; Zetian Wu; Roshan Santhosh; Tejas Srivastava; Lyle Ungar; João Sedoc

doi:10.18653/v1/2022.findings-emnlp.325

Abstract

Lexica – words and associated scores – are widely used as simple, interpretable, generalizable language features to predict sentiment, emotions, mental health, and personality. They also provide insight into the psychological features behind those moods and traits. Such lexica, historically created by human experts, are valuable to linguists, psychologists, and social scientists, but they take years of refinement and have limited coverage. In this paper, we investigate how the lexica that provide psycholinguistic insights could be computationally induced and how they should be assessed. We identify generalizability and interpretability as two essential properties of such lexica. We induce lexica using both context-oblivious and context-aware approaches, compare their predictive performance both within the training corpus and across various corpora, and evaluate their quality using crowd-worker assessment. We find that lexica induced from context-oblivious models are more generalizable and interpretable than those from more accurate context-aware transformer models. In addition, lexicon scores can identify explanatory words more reliably than a high-performing transformer with feature-importance measures like SHAP.
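
To make "words and associated scores" concrete, the following is a minimal sketch of how a weighted lexicon might be applied to a text. It is not the induction method from the paper: the toy lexicon, its score values, the whitespace tokenizer, and the function names `lexicon_score` and `word_contributions` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> score); the values are invented for
# illustration and do not come from the paper.
TOY_LEXICON = {"happy": 1.0, "great": 0.8, "sad": -1.0, "awful": -0.9}


def lexicon_score(text, lexicon):
    """Return the sum of lexicon scores over all tokens, divided by text length.

    Tokens missing from the lexicon contribute 0. Tokenization is a plain
    lowercase whitespace split, which is an assumption of this sketch.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = sum(lexicon.get(word, 0.0) * n for word, n in counts.items())
    return total / len(tokens)


def word_contributions(text, lexicon):
    """Return each lexicon word's share of the text score.

    Because the score is a sum over words, these per-word terms add up to
    lexicon_score(text, lexicon), which is what makes the score easy to read.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)
    return {
        word: lexicon[word] * n / len(tokens)
        for word, n in counts.items()
        if word in lexicon
    }


if __name__ == "__main__":
    sample = "happy happy sad"
    print(lexicon_score(sample, TOY_LEXICON))
    print(word_contributions(sample, TOY_LEXICON))
```

Under these assumptions, the per-word terms returned by `word_contributions` show directly which words drive a prediction, which is the kind of interpretability the abstract attributes to lexica.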

Anthology ID: 2022.findings-emnlp.325
Volume: Findings of the Association for Computational Linguistics: EMNLP 2022
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4430–4448
URL: https://aclanthology.org/2022.findings-emnlp.325/
DOI: 10.18653/v1/2022.findings-emnlp.325
Cite (ACL): Yilin Geng, Zetian Wu, Roshan Santhosh, Tejas Srivastava, Lyle Ungar, and João Sedoc. 2022. Inducing Generalizable and Interpretable Lexica. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4430–4448, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Inducing Generalizable and Interpretable Lexica (Geng et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-emnlp.325.pdf