Matthew Mandelkern


2024

Conditional and Modal Reasoning in Large Language Models
Wesley H. Holliday | Matthew Mandelkern | Cedegao E. Zhang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

The reasoning abilities of large language models (LLMs) are the topic of a growing body of research in AI and cognitive science. In this paper, we probe the extent to which twenty-nine LLMs are able to distinguish logically correct inferences from logically fallacious ones. We focus on inference patterns involving conditionals (e.g., ‘*If* Ann has a queen, *then* Bob has a jack’) and epistemic modals (e.g., ‘Ann *might* have an ace’, ‘Bob *must* have a king’). These inferences have been of special interest to logicians, philosophers, and linguists, since they play a central role in the fundamental human ability to reason about distal possibilities. Assessing LLMs on these inferences is thus highly relevant to the question of how much the reasoning abilities of LLMs match those of humans. All the LLMs we tested make some basic mistakes with conditionals or modals, though zero-shot chain-of-thought prompting helps them make fewer mistakes. Even the best performing LLMs make basic errors in modal reasoning, display logically inconsistent judgments across inference patterns involving epistemic modals and conditionals, and give answers about complex conditional inferences that do not match reported human judgments. These results highlight gaps in basic logical reasoning in today’s LLMs.
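A minimal sketch of the kind of probe the abstract describes: presenting an LLM with a conditional inference pattern and asking, with a zero-shot chain-of-thought cue, whether the conclusion follows. This is not the authors' code; the model name, prompt wording, and example items are illustrative assumptions.

```python
# Sketch: probe an LLM on one valid and one fallacious conditional inference,
# with and without a zero-shot chain-of-thought cue. Assumes the openai
# package is installed and OPENAI_API_KEY is set; the model is a placeholder.
from openai import OpenAI

client = OpenAI()

# One logically valid pattern (modus ponens) and one fallacy (affirming the consequent).
ITEMS = [
    {
        "name": "modus ponens (valid)",
        "premises": "If Ann has a queen, then Bob has a jack. Ann has a queen.",
        "conclusion": "Bob has a jack.",
    },
    {
        "name": "affirming the consequent (fallacy)",
        "premises": "If Ann has a queen, then Bob has a jack. Bob has a jack.",
        "conclusion": "Ann has a queen.",
    },
]

def probe(item, cot=True):
    # Zero-shot chain-of-thought prompting just adds a "think step by step" cue.
    cue = ("Let's think step by step, then answer 'Yes' or 'No'."
           if cot else "Answer 'Yes' or 'No'.")
    prompt = (
        f"Premises: {item['premises']}\n"
        f"Does it follow that: {item['conclusion']}\n"
        f"{cue}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the paper evaluates twenty-nine LLMs
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    for item in ITEMS:
        print(item["name"], "->", probe(item, cot=True))
```

A model that treats both items alike (answering "Yes" to each) would exhibit exactly the kind of basic error the paper reports; comparing `cot=True` against `cot=False` shows the effect of the chain-of-thought cue.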

Do Language Models’ Words Refer?
Matthew Mandelkern | Tal Linzen
Computational Linguistics, Volume 50, Issue 3 - September 2024

What do language models (LMs) do with language? They can produce sequences of (mostly) coherent strings closely resembling English. But do those sentences mean something, or are LMs simply babbling in a convincing simulacrum of language use? We address one aspect of this broad question: whether LMs’ words can refer, that is, achieve “word-to-world” connections. There is prima facie reason to think they do not, since LMs do not interact with the world in the way that ordinary language users do. Drawing on the externalist tradition in philosophy of language, we argue that those appearances are misleading: Even if the inputs to LMs are simply strings of text, they are strings of text with natural histories, and that may suffice for LMs’ words to refer.