2024
Hierarchical syntactic structure in human-like language models
Michael Wolfman | Donald Dunagan | Jonathan Brennan | John Hale
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Language models (LMs) are a meeting point for cognitive modeling and computational linguistics. How should they be designed to serve as adequate cognitive models? To address this question, this study contrasts two Transformer-based LMs that share the same architecture. Only one of them analyzes sentences in terms of explicit hierarchical structure. When the two LMs are evaluated against fMRI time series via the surprisal complexity metric, the results implicate the superior temporal gyrus. These findings underline the need for hierarchical sentence structures in word-by-word models of human language comprehension.
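The surprisal complexity metric used here is the negative log conditional probability of each word under a language model. Below is a minimal sketch of how such a predictor might be computed and prepared for regression against fMRI data; the `word_probs` input, word/scan timing, and HRF are assumptions standing in for any incremental LM and any standard haemodynamic response function, not the authors' pipeline.

```python
import numpy as np

def surprisal(word_probs):
    """Per-word surprisal in bits: -log2 P(w_t | w_1 .. w_{t-1}).
    `word_probs` is a hypothetical sequence of conditional word
    probabilities from any incremental language model."""
    return -np.log2(np.asarray(word_probs, dtype=float))

def to_fmri_regressor(surprisals, word_times, scan_times, hrf):
    """Place word-level surprisal at the nearest scan time and convolve
    with a haemodynamic response function (HRF) so the predictor can be
    regressed against the fMRI time series."""
    signal = np.zeros(len(scan_times))
    idx = np.clip(np.searchsorted(scan_times, word_times), 0, len(signal) - 1)
    np.add.at(signal, idx, surprisals)
    return np.convolve(signal, hrf)[: len(signal)]
```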
2021
Modeling Incremental Language Comprehension in the Brain with Combinatory Categorial Grammar
Miloš Stanojević | Shohini Bhattasali | Donald Dunagan | Luca Campanelli | Mark Steedman | Jonathan Brennan | John Hale
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Hierarchical sentence structure plays a role in word-by-word human sentence comprehension, but it remains unclear how best to characterize this structure and unknown how exactly it would be recognized in a step-by-step process model. With a view towards sharpening this picture, we model the time course of hemodynamic activity within the brain during an extended episode of naturalistic language comprehension using Combinatory Categorial Grammar (CCG). CCG has well-defined incremental parsing algorithms, surface compositional semantics, and can explain long-range dependencies as well as complicated cases of coordination. We find that CCG-derived predictors improve a regression model of fMRI time course in six language-relevant brain regions, over and above predictors derived from context-free phrase structure. Adding a special Revealing operator to CCG parsing, one designed to handle right-adjunction, improves the fit in three of these regions. This evidence for CCG from neuroimaging bolsters the more general case for mildly context-sensitive grammars in the cognitive science of language.
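The central statistical move is a nested regression: CCG-derived predictors are added to a baseline that already contains context-free phrase-structure predictors, and the question is whether they explain additional variance in the fMRI signal. A minimal sketch of that comparison with ordinary least squares follows; the design matrices `X_base` and `X_full` are hypothetical placeholders, not the authors' actual model.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

def nested_f(X_base, X_full, y):
    """F statistic for the improvement of the full model (baseline plus
    CCG-derived predictors) over the baseline, for one region's signal y."""
    rss_b, rss_f = rss(X_base, y), rss(X_full, y)
    df_added = X_full.shape[1] - X_base.shape[1]
    df_resid = len(y) - X_full.shape[1]
    return ((rss_b - rss_f) / df_added) / (rss_f / df_resid)
```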
2020
The Alice Datasets: fMRI & EEG Observations of Natural Language Comprehension
Shohini Bhattasali | Jonathan Brennan | Wen-Ming Luh | Berta Franzluebbers | John Hale
Proceedings of the Twelfth Language Resources and Evaluation Conference
The Alice Datasets comprise magnetic resonance and electrophysiological data collected while participants heard a story in English. Along with the datasets and the text of the story, we provide a variety of linguistic and computational measures, ranging from prosodic predictors to predictors capturing hierarchical syntactic information. These ecologically valid datasets can be easily reused to replicate prior work and to test new hypotheses about natural language comprehension in the brain.
The Little Prince in 26 Languages: Towards a Multilingual Neuro-Cognitive Corpus
Sabrina Stehwien | Lena Henke | John Hale | Jonathan Brennan | Lars Meyer
Proceedings of the Second Workshop on Linguistic and Neurocognitive Resources
We present the Le Petit Prince Corpus (LPPC), a multilingual resource for research in (computational) psycho- and neurolinguistics. The corpus consists of the children’s story The Little Prince in 26 languages. The dataset is being built using state-of-the-art methods for speech and language processing and electroencephalography (EEG). The planned release of the LPPC dataset will include raw text annotated with dependency graphs in the Universal Dependencies standard, a near-natural-sounding synthetic spoken subset, and EEG recordings. We will use this corpus to conduct neurolinguistic studies that generalize across a wide range of languages, overcoming the typological constraints of traditional approaches. The planned release of the LPPC combines linguistic and EEG data for many languages using fully automatic methods, and thus constitutes a readily extendable resource that supports cross-linguistic and cross-disciplinary research.
2019
Text Genre and Training Data Size in Human-like Parsing
John Hale | Adhiguna Kuncoro | Keith Hall | Chris Dyer | Jonathan Brennan
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Domain-specific training typically makes NLP systems work better. We show that this extends to cognitive modeling as well by relating the states of a neural phrase-structure parser to electrophysiological measures from human participants. These measures were recorded as participants listened to a spoken recitation of the same literary text that was supplied as input to the neural parser. Given more training data, the system derives a better cognitive model — but only when the training examples come from the same textual genre. This finding is consistent with the idea that humans adapt syntactic expectations to particular genres during language comprehension (Kaan and Chun, 2018; Branigan and Pickering, 2017).
2018
Finding syntax in human encephalography with beam search
John Hale | Chris Dyer | Adhiguna Kuncoro | Jonathan Brennan
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recurrent neural network grammars (RNNGs) are generative models of (tree, string) pairs that rely on neural networks to evaluate derivational choices. Parsing with them using beam search yields a variety of incremental complexity metrics such as word surprisal and parser action count. When used as regressors against human electrophysiological responses to naturalistic text, they derive two amplitude effects: an early peak and a P600-like later peak. By contrast, a non-syntactic neural language model yields no reliable effects. Model comparisons attribute the early peak to syntactic composition within the RNNG. This pattern of results recommends the RNNG+beam search combination as a mechanistic model of the syntactic processing that occurs during normal human language comprehension.
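One way to read word surprisal off beam search over a generative parser is via prefix probabilities: sum the probabilities of the analyses retained in the beam before and after the word is consumed, and take the drop in log probability. The sketch below assumes a hypothetical beam represented as (derivation, log-probability) pairs; it is not the RNNG implementation itself.

```python
import math

def log_prefix_prob(beam):
    """Log of the summed probability of all analyses kept in the beam,
    computed stably with log-sum-exp."""
    m = max(lp for _, lp in beam)
    return m + math.log(sum(math.exp(lp - m) for _, lp in beam))

def word_surprisal(beam_before, beam_after):
    """Surprisal, in bits, of the word whose consumption took the parser
    from `beam_before` to `beam_after` (hypothetical beam interface)."""
    return (log_prefix_prob(beam_before) - log_prefix_prob(beam_after)) / math.log(2)
```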
Differentiating Phrase Structure Parsing and Memory Retrieval in the Brain
Shohini Bhattasali | John Hale | Christophe Pallier | Jonathan Brennan | Wen-Ming Luh | R. Nathan Spreng
Proceedings of the Society for Computation in Linguistics (SCiL) 2018
2016
Temporal Lobes as Combinatory Engines for both Form and Meaning
Jixing Li | Jonathan Brennan | Adam Mahar | John Hale
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)
The relative contributions of meaning and form to sentence processing remain an outstanding issue across the language sciences. We examine this issue by formalizing four incremental complexity metrics and comparing them against freely available ROI timecourses. Syntax-related metrics based on top-down parsing and structural dependency distance turn out to significantly improve a regression model, compared to a simpler model that formalizes only conceptual combination using a distributional vector-space model. This confirms the view of the anterior temporal lobes as combinatory engines that deal in both form (see e.g. Brennan et al., 2012; Mazoyer, 1993) and meaning (see e.g., Patterson et al., 2007). This same characterization applies to a posterior temporal region in roughly “Wernicke’s Area.”
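The conceptual-combination baseline can be formalized in several ways; one simple version scores each word by its distance from an additively composed context vector in a distributional vector space. The sketch below uses additive composition and cosine distance as illustrative assumptions, not necessarily the exact formalization in the paper.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def combination_cost(word_vectors):
    """Word-by-word conceptual-combination cost: 1 minus the cosine between
    each word's vector and the running sum of the preceding words' vectors."""
    context = np.zeros_like(word_vectors[0])
    costs = []
    for v in word_vectors:
        costs.append(1.0 - cosine(context, v) if np.any(context) else 0.0)
        context = context + v
    return costs
```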
2015
Modeling fMRI time courses with linguistic structure at various grain sizes
John Hale | David Lutz | Wen-Ming Luh | Jonathan Brennan
Proceedings of the 6th Workshop on Cognitive Modeling and Computational Linguistics