Jordan Kodner


2022

Modeling the Relationship between Input Distributions and Learning Trajectories with the Tolerance Principle
Jordan Kodner
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Child language learners develop with remarkable uniformity, both in their learning trajectories and ultimate outcomes, despite major differences in their learning environments. In this paper, we explore the role that the frequencies and distributions of irregular lexical items in the input play in driving learning trajectories. We conclude that while the Tolerance Principle, a type-based model of productivity learning, accounts for inter-learner uniformity, it also interacts with input distributions to drive cross-linguistic variation in learning trajectories.

Language Acquisition, Neutral Change, and Diachronic Trends in Noun Classifiers
Aniket Kali | Jordan Kodner
Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change

Languages around the world employ classifier systems as a method of semantic organization and categorization. These systems are rife with variability, violability, and ambiguity, and are prone to constant change over time. We explicitly model change in classifier systems as the population-level outcome of child language acquisition over time in order to shed light on the factors that drive change to classifier systems. Our research consists of two parts: a contrastive corpus study of Cantonese and Mandarin child-directed speech to determine the role that ambiguity and homophony avoidance may play in classifier learning and change, followed by a series of population-level learning simulations of an abstract classifier system. We find that acquisition without reference to ambiguity avoidance is sufficient to drive broad trends in classifier change, and we suggest an additional role for adults and discourse factors in classifier death.

SIGMORPHON–UniMorph 2022 Shared Task 0: Modeling Inflection in Language Acquisition
Jordan Kodner | Salam Khalifa
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

This year’s iteration of the SIGMORPHON–UniMorph shared task on “human-like” morphological inflection generation focuses on generalization and errors in language acquisition. Systems are trained on data sets extracted from corpora of child-directed speech in order to simulate a natural learning setting, and their predictions are evaluated against what is known about children’s developmental trajectories for three well-studied patterns: English past tense, German noun plurals, and Arabic noun plurals. Three submitted neural systems were evaluated together with two baselines. Performance was generally good, and all systems were prone to human-like over-regularization. However, all systems were also prone, to varying degrees, to non-human-like over-irregularization and nonsense productions. We situate this behavior in a discussion of the Past Tense Debate.

SIGMORPHON–UniMorph 2022 Shared Task 0: Generalization and Typologically Diverse Morphological Inflection
Jordan Kodner | Salam Khalifa | Khuyagbaatar Batsuren | Hossep Dolatian | Ryan Cotterell | Faruk Akkus | Antonios Anastasopoulos | Taras Andrushko | Aryaman Arora | Nona Atanalov | Gábor Bella | Elena Budianskaya | Yustinus Ghanggo Ate | Omer Goldman | David Guriel | Simon Guriel | Silvia Guriel-Agiashvili | Witold Kieraś | Andrew Krizhanovsky | Natalia Krizhanovsky | Igor Marchenko | Magdalena Markowska | Polina Mashkovtseva | Maria Nepomniashchaya | Daria Rodionova | Karina Scheifer | Alexandra Sorova | Anastasia Yemelina | Jeremiah Young | Ekaterina Vylomova
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

The 2022 SIGMORPHON–UniMorph shared task on large-scale morphological inflection generation included a wide range of typologically diverse languages, 33 languages from 11 top-level language families: Arabic (Modern Standard), Assamese, Braj, Chukchi, Eastern Armenian, Evenki, Georgian, Gothic, Gujarati, Hebrew, Hungarian, Itelmen, Karelian, Kazakh, Ket, Khalkha Mongolian, Kholosi, Korean, Lamahalot, Low German, Ludic, Magahi, Middle Low German, Old English, Old High German, Old Norse, Polish, Pomak, Slovak, Turkish, Upper Sorbian, Veps, and Xibe. This year we emphasize generalization along different dimensions by evaluating test items with unseen lemmas and unseen features separately under small and large training conditions. Across the five submitted systems and two baselines, predicting inflections with unseen features proved challenging, and average performance decreased substantially from last year. This was true even for languages whose forms were in principle predictable, which suggests that further work is needed to design systems that capture the various types of generalization required for the world’s languages.

2021

Learning Morphological Productivity as Meaning-Form Mappings
Sarah Payne | Jordan Kodner | Charles Yang
Proceedings of the Society for Computation in Linguistics 2021

Apparent Communicative Efficiency in the Lexicon is Emergent
Spencer Caplan | Jordan Kodner | Charles Yang
Proceedings of the Society for Computation in Linguistics 2021

2020

Overestimation of Syntactic Representation in Neural Language Models
Jordan Kodner | Nitish Gupta
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

With the advent of powerful neural language models over the last few years, research attention has increasingly focused on which aspects of language they represent that make them so successful. Several testing methodologies have been developed to probe models’ syntactic representations. One popular method for determining a model’s ability to induce syntactic structure trains a model on strings generated according to a template, then tests the model’s ability to distinguish such strings from superficially similar ones with different syntax. We illustrate a fundamental problem with this approach by reproducing positive results from a recent paper with two non-syntactic baseline language models: an n-gram model and an LSTM model trained on scrambled inputs.

Modeling Morphological Typology for Unsupervised Learning of Language Morphology
Hongzhi Xu | Jordan Kodner | Mitchell Marcus | Charles Yang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper describes a language-independent model for fully unsupervised morphological analysis that exploits a universal framework leveraging morphological typology. By modeling morphological processes including suffixation, prefixation, infixation, and full and partial reduplication with constrained stem change rules, our system effectively constrains the search space and offers wide coverage in terms of morphological typology. The system is tested on nine typologically and genetically diverse languages and outperforms leading systems. We also investigate the effect of an oracle that provides only a handful of bits per language to signal morphological type.

Morphological Segmentation for Low Resource Languages
Justin Mott | Ann Bies | Stephanie Strassel | Jordan Kodner | Caitlin Richter | Hongzhi Xu | Mitchell Marcus
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper describes a new morphology resource created by the Linguistic Data Consortium and the University of Pennsylvania for the DARPA LORELEI Program. The data consists of approximately 2000 tokens annotated for morphological segmentation in each of 9 low-resource languages, along with root information for 7 of the languages. The annotated languages show a broad diversity of typological features. A minimal annotation scheme for segmentation was developed so that it could capture the patterns of a wide range of languages and also be performed reliably by non-linguist annotators. The basic annotation guidelines were designed to be language-independent, but included language-specific morphological paradigms and other specifications. The resulting annotated corpus is designed to support and stimulate the development of unsupervised morphological segmenters and analyzers by providing a gold standard for their evaluation on a more typologically diverse set of languages than has previously been available. By providing root annotation, this corpus is also a step toward supporting research in identifying richer morphological structures than simple morpheme boundaries.

2018

Syntactic Category Learning as Iterative Prototype-Driven Clustering
Jordan Kodner
Proceedings of the Society for Computation in Linguistics (SCiL) 2018

Bootstrapping Transliteration with Constrained Discovery for Low-Resource Languages
Shyam Upadhyay | Jordan Kodner | Dan Roth
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Generating the English transliteration of a name written in a foreign script is an important and challenging step in multilingual knowledge acquisition and information extraction. Existing approaches to transliteration generation require a large number (>5000) of training examples. This difficulty contrasts with transliteration discovery, a somewhat easier task that involves picking a plausible transliteration from a given list. In this work, we present a bootstrapping algorithm that uses constrained discovery to improve generation and can be used with as few as 500 training examples, which we show can be sourced from annotators in a matter of hours. This opens the task to languages for which large numbers of training examples are unavailable. We evaluate transliteration generation performance itself, as well as the improvement it brings to cross-lingual candidate generation for entity linking, a typical downstream task. We present a comprehensive evaluation of our approach on nine languages, each written in a unique script.

A Framework for Representing Language Acquisition in a Population Setting
Jordan Kodner | Christopher Cerezo Falco
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Language variation and change are driven both by individuals’ internal cognitive processes and by the social structures through which language propagates. A wide range of computational frameworks have been proposed to connect these drivers. We compare the strengths and weaknesses of existing approaches and propose a new analytic framework that combines previous network models’ ability to capture realistic social structure with practical and more elegant computational properties. The framework privileges the process of language acquisition and embeds learners in a social network, but it is modular so that population structure can be combined with different acquisition models. We demonstrate two applications of the framework: a test of practical concerns that arise when modeling acquisition in a population setting, and an application to recent work on phonological mergers in progress.

2017

Case Studies in the Automatic Characterization of Grammars from Small Wordlists
Jordan Kodner | Spencer Caplan | Hongzhi Xu | Mitchell P. Marcus | Charles Yang
Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages