Marten van Schijndel

Also published as: Marten Van Schijndel, Martin van Schijndel, Marten Van Schijndel

2025

pdf bib abs
Disentangling language change: sparse autoencoders quantify the semantic evolution of indigeneity in French
Jacob A. Matthews | Laurent Dubreuil | Imane Terhmina | Yunci Sun | Matthew Wilkens | Marten Van Schijndel
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

This study presents a novel approach to analyzing historical language change, focusing on the evolving semantics of the French term “indigène(s)” (“indigenous”) between 1825 and 1950. While existing approaches to measuring semantic change with contextual word embeddings (CWE) rely primarily on similarity measures or clustering, these methods may not be suitable for highly imbalanced datasets, and pose challenges for interpretation. For this reason, we propose an interpretable, feature-level approach to analyzing language change, which we use to trace the semantic evolution of “indigène(s)” over a 125-year period. Following recent work on sequence embeddings (O’Neill et al., 2024), we use k-sparse autoencoders (k-SAE) (Makhzani and Frey, 2013) to interpret over 210,000 CWEs generated using sentences sourced from the French National Library. We demonstrate that k-SAEs can learn interpretable features from CWEs, as well as how differences in feature activations across time periods reveal highly specific aspects of language change. In addition, we show that diachronic change in feature activation frequency reflects the evolution of French colonial legal structures during the 19th and 20th centuries.

2024

pdf bib abs
Semantics or spelling? Probing contextual word embeddings with orthographic noise
Jacob A. Matthews | John R. Starr | Marten van Schijndel
Findings of the Association for Computational Linguistics: ACL 2024

Pretrained language model (PLM) hidden states are frequently employed as contextual word embeddings (CWE): high-dimensional representations that encode semantic information given linguistic context. Across many areas of computational linguistics research, similarity between CWEs is interpreted as semantic similarity. However, it remains unclear exactly what information is encoded in PLM hidden states. We investigate this practice by probing PLM representations using minimal orthographic noise. We expect that if CWEs primarily encode semantic information, a single character swap in the input word will not drastically affect the resulting representation, given sufficient linguistic context. Surprisingly, we find that CWEs generated by popular PLMs are highly sensitive to noise in input data, and that this sensitivity is related to subword tokenization: the fewer tokens used to represent a word at input, the more sensitive its corresponding CWE. This suggests that CWEs capture information unrelated to word-level meaning and can be manipulated through trivial modifications of input data. We conclude that these PLM-derived CWEs may not be reliable semantic proxies, and that caution is warranted when interpreting representational similarity.

2023

pdf bib abs
Linguistic Compression in Single-Sentence Human-Written Summaries
Fangcong Yin | Marten van Schijndel
Findings of the Association for Computational Linguistics: EMNLP 2023

Summarizing texts involves significant cognitive efforts to compress information. While advances in automatic summarization systems have drawn attention from the NLP and linguistics communities to this topic, there is a lack of computational studies of linguistic patterns in human-written summaries. This work presents a large-scale corpus study of human-written single-sentence summaries. We analyzed the linguistic compression patterns from source documents to summaries at different granularities, and we found that summaries are generally written with morphological expansion, increased lexical diversity, and similar positional arrangements of specific words compared to the source across different genres. We also studied how linguistic compressions of different factors affect reader judgments of quality through a human study, with the results showing that the use of morphological and syntactic changes by summary writers matches reader preferences while lexical diversity and word specificity preferences are not aligned between summary writers and readers.

2022

pdf bib abs
Dual Mechanism Priming Effects in Hindi Word Order
Sidharth Ranjan | Marten van Schijndel | Sumeet Agarwal | Rajakrishnan Rajkumar
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Word order choices during sentence production can be primed by preceding sentences. In this work, we test the DUAL MECHANISM hypothesis that priming is driven by multiple different sources. Using a Hindi corpus of text productions, we model lexical priming with an n-gram cache model, and we capture more abstract syntactic priming with an adaptive neural language model. We permute the preverbal constituents of corpus sentences and then use a logistic regression model to predict which sentences actually occurred in the corpus against artificially generated meaning-equivalent variants. Our results indicate that lexical priming and lexically-independent syntactic priming affect complementary sets of verb classes. By showing that different priming influences are separable from one another, our results support the hypothesis that multiple different cognitive mechanisms underlie priming.

pdf bib abs
Discourse Context Predictability Effects in Hindi Word Order
Sidharth Ranjan | Marten van Schijndel | Sumeet Agarwal | Rajakrishnan Rajkumar
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

We test the hypothesis that discourse predictability influences Hindi syntactic choice. While prior work has shown that a number of factors (e.g., information status, dependency length, and syntactic surprisal) influence Hindi word order preferences, the role of discourse predictability is underexplored in the literature. Inspired by prior work on syntactic priming, we investigate how the words and syntactic structures in a sentence influence the word order of the following sentences. Specifically, we extract sentences from the Hindi-Urdu Treebank corpus (HUTB), permute the preverbal constituents of those sentences, and build a classifier to predict which sentences actually occurred in the corpus against artificially generated distractors. The classifier uses a number of discourse-based features and cognitive features to make its predictions, including dependency length, surprisal, and information status. We find that information status and LSTM-based discourse predictability influence word order choices, especially for non-canonical object-fronted orders. We conclude by situating our results within the broader syntactic priming literature.

2021

pdf bib abs
Uncovering Constraint-Based Behavior in Neural Models via Targeted Fine-Tuning
Forrest Davis | Marten van Schijndel
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

A growing body of literature has focused on detailing the linguistic knowledge embedded in large, pretrained language models. Existing work has shown that non-linguistic biases in models can drive model behavior away from linguistic generalizations. We hypothesized that competing linguistic processes within a language, rather than just non-linguistic model biases, could obscure underlying linguistic knowledge. We tested this claim by exploring a single phenomenon in four languages: English, Chinese, Spanish, and Italian. While human behavior has been found to be similar across languages, we find cross-linguistic variation in model behavior. We show that competing processes in a language act as constraints on model behavior and demonstrate that targeted fine-tuning can re-weight the learned constraints, uncovering otherwise dormant linguistic knowledge in models. Our results suggest that models need to learn both the linguistic constraints in a language and their relative ranking, with mismatches in either producing non-human-like behavior.

pdf bib
Analytical, Symbolic and First-Order Reasoning within Neural Architectures
Samuel Ryb | Marten van Schijndel
Proceedings of the ESSLLI 2021 Workshop on Computing Semantics with Types, Frames and Related Structures

pdf bib abs
All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality
William Timkey | Marten van Schijndel
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Similarity measures are a vital tool for understanding how language models represent and process language. Standard representational similarity measures such as cosine similarity and Euclidean distance have been successfully used in static word embedding models to understand how words cluster in semantic space. Recently, these measures have been applied to embeddings from contextualized models such as BERT and GPT-2. In this work, we call into question the informativity of such measures for contextualized language models. We find that a small number of rogue dimensions, often just 1-3, dominate these measures. Moreover, we find a striking mismatch between the dimensions that dominate similarity measures and those which are important to the behavior of the model. We show that simple postprocessing techniques such as standardization are able to correct for rogue dimensions and reveal underlying representational quality. We argue that accounting for rogue dimensions is essential for any similarity-based analysis of contextual language models.

pdf bib
To Point or Not to Point: Understanding How Abstractive Summarizers Paraphrase Text
Matt Wilber | William Timkey | Marten van Schijndel
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf bib abs
Recurrent Neural Network Language Models Always Learn English-Like Relative Clause Attachment
Forrest Davis | Marten van Schijndel
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

A standard approach to evaluating language models analyzes how models assign probabilities to valid versus invalid syntactic constructions (i.e. is a grammatical sentence more probable than an ungrammatical sentence). Our work uses ambiguous relative clause attachment to extend such evaluations to cases of multiple simultaneous valid interpretations, where stark grammaticality differences are absent. We compare model performance in English and Spanish to show that non-linguistic biases in RNN LMs advantageously overlap with syntactic structure in English but not Spanish. Thus, English models may appear to acquire human-like syntactic preferences, while models trained on Spanish fail to acquire comparable human-like preferences. We conclude by relating these results to broader concerns about the relationship between comprehension (i.e. typical language model use cases) and production (which generates the training data for language models), suggesting that necessary linguistic biases are not present in the training signal at all.

pdf bib abs
Discourse structure interacts with reference but not syntax in neural language models
Forrest Davis | Marten van Schijndel
Proceedings of the 24th Conference on Computational Natural Language Learning

Language models (LMs) trained on large quantities of text have been claimed to acquire abstract linguistic representations. Our work tests the robustness of these abstractions by focusing on the ability of LMs to learn interactions between different linguistic representations. In particular, we utilized stimuli from psycholinguistic studies showing that humans can condition reference (i.e. coreference resolution) and syntactic processing on the same discourse structure (implicit causality). We compared both transformer and long short-term memory LMs to find that, contrary to humans, implicit causality only influences LM behavior for reference, not syntax, despite model representations that encode the necessary discourse information. Our results further suggest that LM behavior can contradict not only learned representations of discourse but also syntactic agreement, pointing to shortcomings of standard language modeling.

pdf bib abs
Filler-gaps that neural networks fail to generalize
Debasmita Bhattacharya | Marten van Schijndel
Proceedings of the 24th Conference on Computational Natural Language Learning

It can be difficult to separate abstract linguistic knowledge in recurrent neural networks (RNNs) from surface heuristics. In this work, we probe for highly abstract syntactic constraints that have been claimed to govern the behavior of filler-gap dependencies across different surface constructions. For models to generalize abstract patterns in expected ways to unseen data, they must share representational features in predictable ways. We use cumulative priming to test for representational overlap between disparate filler-gap constructions in English and find evidence that the models learn a general representation for the existence of filler-gap dependencies. However, we find no evidence that the models learn any of the shared underlying grammatical constraints we tested. Our work raises questions about the degree to which RNN language models learn abstract linguistic representations.

2019

pdf bib abs
Quantity doesn’t buy quality syntax with neural language models
Marten van Schijndel | Aaron Mueller | Tal Linzen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Recurrent neural networks can learn to predict upcoming words remarkably well on average; in syntactically complex contexts, however, they often assign unexpectedly high probabilities to ungrammatical words. We investigate to what extent these shortcomings can be mitigated by increasing the size of the network and the corpus on which it is trained. We find that gains from increasing network size are minimal beyond a certain point. Likewise, expanding the training corpus yields diminishing returns; we estimate that the training corpus would need to be unrealistically large for the models to match human performance. A comparison to GPT and BERT, Transformer-based models trained on billions of words, reveals that these models perform even more poorly than our LSTMs in some constructions. Our results make the case for more data efficient architectures.

pdf bib abs
Using Priming to Uncover the Organization of Syntactic Representations in Neural Language Models
Grusha Prasad | Marten van Schijndel | Tal Linzen
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Neural language models (LMs) perform well on tasks that require sensitivity to syntactic structure. Drawing on the syntactic priming paradigm from psycholinguistics, we propose a novel technique to analyze the representations that enable such success. By establishing a gradient similarity metric between structures, this technique allows us to reconstruct the organization of the LMs’ syntactic representational space. We use this technique to demonstrate that LSTM LMs’ representations of different types of sentences with relative clauses are organized hierarchically in a linguistically interpretable manner, suggesting that the LMs track abstract properties of the sentence.

pdf bib
Can Entropy Explain Successor Surprisal Effects in Reading?
Marten van Schijndel | Tal Linzen
Proceedings of the Society for Computation in Linguistics (SCiL) 2019

2018

pdf bib abs
A Neural Model of Adaptation in Reading
Marten van Schijndel | Tal Linzen
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

It has been argued that humans rapidly adapt their lexical and syntactic expectations to match the statistics of the current linguistic context. We provide further support to this claim by showing that the addition of a simple adaptation mechanism to a neural language model improves our predictions of human reading times compared to a non-adaptive model. We analyze the performance of the model on controlled materials from psycholinguistic experiments and show that it adapts not only to lexical items but also to abstract syntactic structures.

pdf bib
Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018)
Asad Sayeed | Cassandra Jacobs | Tal Linzen | Marten van Schijndel
Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018)

2017

pdf bib
Proceedings of the 7th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2017)
Ted Gibson | Tal Linzen | Asad Sayeed | Marten van Schijndel | William Schuler
Proceedings of the 7th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2017)

2016

pdf bib abs
Addressing surprisal deficiencies in reading time models
Marten van Schijndel | William Schuler
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

This study demonstrates a weakness in how n-gram and PCFG surprisal are used to predict reading times in eye-tracking data. In particular, the information conveyed by words skipped during saccades is not usually included in the surprisal measures. This study shows that correcting the surprisal calculation improves n-gram surprisal and that upcoming n-grams affect reading times, replicating previous findings of how lexical frequencies affect reading times. In contrast, the predictivity of PCFG surprisal does not benefit from the surprisal correction despite the fact that lexical sequences skipped by saccades are processed by readers, as demonstrated by the corrected n-gram measure. These results raise questions about the formulation of information-theoretic measures of syntactic processing such as PCFG surprisal and entropy reduction when applied to reading times.

pdf bib abs
Memory access during incremental sentence processing causes reading time latency
Cory Shain | Marten van Schijndel | Richard Futrell | Edward Gibson | William Schuler
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

Studies on the role of memory as a predictor of reading time latencies (1) differ in their predictions about when memory effects should occur in processing and (2) have had mixed results, with strong positive effects emerging from isolated constructed stimuli and weak or even negative effects emerging from naturally-occurring stimuli. Our study addresses these concerns by comparing several implementations of prominent sentence processing theories on an exploratory corpus and evaluating the most successful of these on a confirmatory corpus, using a new self-paced reading corpus of seemingly natural narratives constructed to contain an unusually high proportion of memory-intensive constructions. We show highly significant and complementary broad-coverage latency effects both for predictors based on the Dependency Locality Theory and for predictors based on a left-corner parsing model of sentence processing. Our results indicate that memory access during sentence processing does take time, but suggest that stimuli requiring many memory access events may be necessary in order to observe the effect.

Co-authors

Venues