Sarah Payne

2025

Lemmas Matter, But Not Like That: Predictors of Lemma-Based Generalization in Morphological Inflection
Sarah Payne | Jordan Kodner
Findings of the Association for Computational Linguistics: ACL 2025

Recent work has suggested that overlap –whether a given lemma or feature set is attested independently in train – drives model performance on morphological inflection tasks. The impact of lemma overlap, however, is debated, with recent work reporting accuracy drops from 0% to 30% between seen and unseen test lemmas. In this paper, we introduce a novel splitting algorithm designed to investigate predictors of accuracy on seen and unseen lemmas. We find only an 11% average drop from seen to unseen test lemmas, but show that the number of lemmas in train has a much stronger effect on accuracy on unseen than seen lemmas. We also show that the previously reported 30% drop is inflated due to the introduction of a near-30% drop in the number of training lemmas from the original splits to their novel splits.

2024

pdf bib

A Generalized Algorithm for Learning Positive and Negative Grammars with Unconventional String Models
Sarah Payne
Proceedings of the Society for Computation in Linguistics 2024

2023

pdf bib abs

A Cautious Generalization Goes a Long Way: Learning Morphophonological Rules
Salam Khalifa | Sarah Payne | Jordan Kodner | Ellen Broselow | Owen Rambow
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Explicit linguistic knowledge, encoded by resources such as rule-based morphological analyzers, continues to prove useful in downstream NLP tasks, especially for low-resource languages and dialects. Rules are an important asset in descriptive linguistic grammars. However, creating such resources is usually expensive and non-trivial, especially for spoken varieties with no written standard. In this work, we present a novel approach for automatically learning morphophonological rules of Arabic from a corpus. Motivated by classic cognitive models for rule learning, rules are generalized cautiously. Rules that are memorized for individual items are only allowed to generalize to unseen forms if they are sufficiently reliable in the training data. The learned rules are further examined to ensure that they capture true linguistic phenomena described by domain experts. We also investigate the learnability of rules in low-resource settings across different experimental setups and dialects.

pdf bib abs

Morphological Inflection: A Reality Check
Jordan Kodner | Sarah Payne | Salam Khalifa | Zoey Liu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Morphological inflection is a popular task in sub-word NLP with both practical and cognitive applications. For years now, state-of-the-art systems have reported high, but also highly variable, performance across data sets and languages. We investigate the causes of this high performance and high variability; we find several aspects of data set creation and evaluation which systematically inflate performance and obfuscate differences between languages. To improve generalizability and reliability of results, we propose new data sampling and evaluation strategies that better reflect likely use-cases. Using these new strategies, we make new observations on the generalization abilities of current inflection systems.

pdf bib abs

Exploring Linguistic Probes for Morphological Generalization
Jordan Kodner | Salam Khalifa | Sarah Payne
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Modern work on the cross-linguistic computational modeling of morphological inflection has typically employed language-independent data splitting algorithms. In this paper, we supplement that approach with language-specific probes designed to test aspects of morphological generalization. Testing these probes on three morphologically distinct languages, English, Spanish, and Swahili, we find evidence that three leading morphological inflection systems employ distinct generalization strategies over conjugational classes and feature sets on both orthographic and phonologically transcribed inputs.

Sarah Payne

2025

2024

2023

2021

Co-authors

Venues