Psycholinguistic Diagnosis of Language Models’ Commonsense Reasoning
Proceedings of the First Workshop on Commonsense Representation and Reasoning (CSRR 2022)
Neural language models have attracted a lot of attention in the past few years. More and more researchers are getting intrigued by how language models encode commonsense, specifically what kind of commonsense they understand, and why they do. This paper analyzed neural language models’ understanding of commonsense pragmatics (i.e., implied meanings) through human behavioral and neurophysiological data. These psycholinguistic tests are designed to draw conclusions based on predictive responses in context, making them very well suited to test word-prediction models such as BERT in natural settings. They can provide the appropriate prompts and tasks to answer questions about linguistic mechanisms underlying predictive responses. This paper adopted psycholinguistic datasets to probe language models’ commonsense reasoning. Findings suggest that GPT-3’s performance was mostly at chance in the psycholinguistic tasks. We also showed that DistillBERT had some understanding of the (implied) intent that’s shared among most people. Such intent is implicitly reflected in the usage of conversational implicatures and presuppositions. Whether or not fine-tuning improved its performance to human-level depends on the type of commonsense reasoning.
Pre-trained Language Models’ Interpretation of Evaluativity Implicature: Evidence from Gradable Adjectives Usage in Context
Proceedings of the Second Workshop on Understanding Implicit and Underspecified Language
By saying Maria is tall, a human speaker typically implies that Maria is evaluatively tall from the speaker’s perspective. However, by using a different construction Maria is taller than Sophie, we cannot infer from Maria and Sophie’s relative heights that Maria is evaluatively tall because it is possible for Maria to be taller than Sophie in a context in which they both count as short. Can pre-trained language models (LMs) “understand” evaulativity (EVAL) inference? To what extent can they discern the EVAL salience of different constructions in a conversation? Will it help LMs’ implicitness performance if we give LMs a persona such as chill, social, and pragmatically skilled? Our study provides an approach to probing LMs’ interpretation of EVAL inference by incorporating insights from experimental pragmatics and sociolinguistics. We find that with the appropriate prompt, LMs can succeed in some pragmatic level language understanding tasks. Our study suggests that socio-pragmatics methodology can shed light on the challenging questions in NLP.
Pragmatic competence of pre-trained language models through the lens of discourse connectives
Proceedings of the 25th Conference on Computational Natural Language Learning
As pre-trained language models (LMs) continue to dominate NLP, it is increasingly important that we understand the depth of language capabilities in these models. In this paper, we target pre-trained LMs’ competence in pragmatics, with a focus on pragmatics relating to discourse connectives. We formulate cloze-style tests using a combination of naturally-occurring data and controlled inputs drawn from psycholinguistics. We focus on testing models’ ability to use pragmatic cues to predict discourse connectives, models’ ability to understand implicatures relating to connectives, and the extent to which models show humanlike preferences regarding temporal dynamics of connectives. We find that although models predict connectives reasonably well in the context of naturally-occurring data, when we control contexts to isolate high-level pragmatic cues, model sensitivity is much lower. Models also do not show substantial humanlike temporal preferences. Overall, the findings suggest that at present, dominant pre-training paradigms do not result in substantial pragmatic competence in our models.
The Invertible Construction in Chinese
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation