Forrest Davis

2025

Discourse Sensitivity in Attraction Effects: The Interplay Between Language Model Size and Training Data
Sanghee Kim | Forrest Davis
Proceedings of the Society for Computation in Linguistics 2025

2024

pdf bib abs

Training an NLP Scholar at a Small Liberal Arts College: A Backwards Designed Course Proposal
Grusha Prasad | Forrest Davis
Proceedings of the Sixth Workshop on Teaching NLP

The rapid growth in natural language processing (NLP) over the last couple yearshas generated student interest and excitement in learning more about the field. In this paper, we present two types of students that NLP courses might want to train. First, an “NLP engineer” who is able to flexibly design, build and apply new technologies in NLP for a wide range of tasks. Second, an “NLP scholar” who is able to pose, refine and answer questions in NLP and how it relates to the society, while also learning to effectively communicate these answers to a broader audience. While these two types of skills are not mutually exclusive — NLP engineers should be able to think critically, and NLP scholars should be able to build systems — we think that courses can differ in the balance of these skills. As educators at Small Liberal Arts Colleges, the strengths of our students and our institution favors an approach that is better suited to train NLP scholars. In this paper we articulate what kinds of skills an NLP scholar should have, and then adopt a backwards design to propose course components that can aid the acquisition of these skills.

2023

pdf bib abs

Can Language Models Be Tricked by Language Illusions? Easier with Syntax, Harder with Semantics
Yuhan Zhang | Edward Gibson | Forrest Davis
Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)

Language models (LMs) have been argued to overlap substantially with human beings in grammaticality judgment tasks. But when humans systematically make errors in language processing, should we expect LMs to behave like cognitive models of language and mimic human behavior? We answer this question by investigating LMs’ more subtle judgments associated with “language illusions” – sentences that are vague in meaning, implausible, or ungrammatical but receive unexpectedly high acceptability judgments by humans. We looked at three illusions: the comparative illusion (e.g. “More people have been to Russia than I have”), the depth-charge illusion (e.g. “No head injury is too trivial to be ignored”), and the negative polarity item (NPI) illusion (e.g. “The hunter who no villager believed to be trustworthy will ever shoot a bear”). We found that probabilities represented by LMs were more likely to align with human judgments of being “tricked” by the NPI illusion which examines a structural dependency, compared to the comparative and the depth-charge illusions which require sophisticated semantic understanding. No single LM or metric yielded results that are entirely consistent with human behavior. Ultimately, we show that LMs are limited both in their construal as cognitive models of human language processing and in their capacity to recognize nuanced but critical information in complicated language materials.

2022

pdf bib abs

Incremental Processing of Principle B: Mismatches Between Neural Models and Humans
Forrest Davis
Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)

Despite neural language models qualitatively capturing many human linguistic behaviors, recent work has demonstrated that they underestimate the true processing costs of ungrammatical structures. We extend these more fine-grained comparisons between humans and models by investigating the interaction between Principle B and coreference processing. While humans use Principle B to block certain structural positions from affecting their incremental processing, we find that GPT-based language models are influenced by ungrammatical positions. We conclude by relating the mismatch between neural models and humans to properties of training data and suggest that certain aspects of human processing behavior do not directly follow from linguistic data.

2021

pdf bib abs

Uncovering Constraint-Based Behavior in Neural Models via Targeted Fine-Tuning
Forrest Davis | Marten van Schijndel
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

A growing body of literature has focused on detailing the linguistic knowledge embedded in large, pretrained language models. Existing work has shown that non-linguistic biases in models can drive model behavior away from linguistic generalizations. We hypothesized that competing linguistic processes within a language, rather than just non-linguistic model biases, could obscure underlying linguistic knowledge. We tested this claim by exploring a single phenomenon in four languages: English, Chinese, Spanish, and Italian. While human behavior has been found to be similar across languages, we find cross-linguistic variation in model behavior. We show that competing processes in a language act as constraints on model behavior and demonstrate that targeted fine-tuning can re-weight the learned constraints, uncovering otherwise dormant linguistic knowledge in models. Our results suggest that models need to learn both the linguistic constraints in a language and their relative ranking, with mismatches in either producing non-human-like behavior.

2020

pdf bib abs

Recurrent Neural Network Language Models Always Learn English-Like Relative Clause Attachment
Forrest Davis | Marten van Schijndel
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

A standard approach to evaluating language models analyzes how models assign probabilities to valid versus invalid syntactic constructions (i.e. is a grammatical sentence more probable than an ungrammatical sentence). Our work uses ambiguous relative clause attachment to extend such evaluations to cases of multiple simultaneous valid interpretations, where stark grammaticality differences are absent. We compare model performance in English and Spanish to show that non-linguistic biases in RNN LMs advantageously overlap with syntactic structure in English but not Spanish. Thus, English models may appear to acquire human-like syntactic preferences, while models trained on Spanish fail to acquire comparable human-like preferences. We conclude by relating these results to broader concerns about the relationship between comprehension (i.e. typical language model use cases) and production (which generates the training data for language models), suggesting that necessary linguistic biases are not present in the training signal at all.

pdf bib abs

Discourse structure interacts with reference but not syntax in neural language models
Forrest Davis | Marten van Schijndel
Proceedings of the 24th Conference on Computational Natural Language Learning

Language models (LMs) trained on large quantities of text have been claimed to acquire abstract linguistic representations. Our work tests the robustness of these abstractions by focusing on the ability of LMs to learn interactions between different linguistic representations. In particular, we utilized stimuli from psycholinguistic studies showing that humans can condition reference (i.e. coreference resolution) and syntactic processing on the same discourse structure (implicit causality). We compared both transformer and long short-term memory LMs to find that, contrary to humans, implicit causality only influences LM behavior for reference, not syntax, despite model representations that encode the necessary discourse information. Our results further suggest that LM behavior can contradict not only learned representations of discourse but also syntactic agreement, pointing to shortcomings of standard language modeling.

Forrest Davis

2025

2024

2023

2022

2021

2020

2017

Co-authors

Venues