Kyle Gorman - ACL Anthology

Kyle Gorman

2025

Don’t Touch My Diacritics
Kyle Gorman | Yuval Pinter
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

The common practice of preprocessing text before feeding it into NLP models introduces many decision points which have unintended consequences on model performance. In this opinion piece, we focus on the handling of diacritics in texts originating in many languages and scripts. We demonstrate, through several case studies, the adverse effects of inconsistent encoding of diacritized characters and of removing diacritics altogether. We call on the community to adopt simple but necessary steps across all models and toolkits in order to improve handling of diacritized text and, by extension, increase equity in multilingual NLP.

2024

Quantifying the Hyperparameter Sensitivity of Neural Networks for Character-level Sequence-to-Sequence Tasks
Adam Wiemerslage | Kyle Gorman | Katharina von der Wense
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Hyperparameter tuning, the process of searching for suitable hyperparameters, becomes more difficult as the computing resources required to train neural networks continue to grow. This topic continues to receive little attention and discussion—much of it hearsay—despite its obvious importance. We attempt to formalize hyperparameter sensitivity using two metrics: similarity-based sensitivity and performance-based sensitivity. We then use these metrics to quantify two such claims: (1) transformers are more sensitive to hyperparameter choices than LSTMs and (2) transformers are particularly sensitive to batch size. We conduct experiments on two different character-level sequence-to-sequence tasks and find that, indeed, the transformer is slightly more sensitive to hyperparameters according to both of our metrics. However, we do not find that it is more sensitive to batch size in particular.

A* shortest string decoding for non-idempotent semirings
Kyle Gorman | Cyril Allauzen
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Abstract: The single shortest path algorithm is undefined for weighted finite-state automata over non-idempotent semirings because such semirings do not guarantee the existence of a shortest path. However, in non-idempotent semirings admitting an order satisfying a monotonicity condition (such as the plus-times or log semirings), the shortest string is well-defined. We describe an algorithm which finds the shortest string for a weighted non-deterministic automaton over such semirings using the backwards shortest distance of an equivalent deterministic automaton (DFA) as a heuristic for A* search performed over a companion idempotent semiring, which is proven to return the shortest string. There may be exponentially more states in the DFA, but the proposed algorithm needs to visit only a small fraction of them if determinization is performed “on the fly”.

Abbreviation Across the World’s Languages and Scripts
Kyle Gorman | Brian Roark
Proceedings of the Second Workshop on Computation and Written Language (CAWL) @ LREC-COLING 2024

Detailed taxonomies for non-standard words, including abbreviations, have been developed for speech and language processing, though mostly with reference to English. In this paper, we examine abbreviation formation strategies in a diverse sample of more than 50 languages, dialects and scripts. The resulting taxonomy—and data about which strategies are attested in which languages—provides key information needed to create multilingual systems for abbreviation expansion, an essential component for speech processing and text understanding

Proceedings of the Second Workshop on Computation and Written Language (CAWL) @ LREC-COLING 2024
Kyle Gorman | Emily Prud'hommeaux | Brian Roark | Richard Sproat
Proceedings of the Second Workshop on Computation and Written Language (CAWL) @ LREC-COLING 2024

2023

Proceedings of the Workshop on Computation and Written Language (CAWL 2023)
Kyle Gorman | Richard Sproat | Brian Roark
Proceedings of the Workshop on Computation and Written Language (CAWL 2023)

Myths about Writing Systems in Speech & Language Technology
Kyle Gorman | Richard Sproat
Proceedings of the Workshop on Computation and Written Language (CAWL 2023)

Natural language processing is largely focused on written text processing. However, many computational linguists tacitly endorse myths about the nature of writing. We highlight two of these myths—the conflation of language and writing, and the notion that Chinese, Japanese, and Korean writing is ideographic—and suggest how the community can dispel them.

2022

The SIGMORPHON 2022 Shared Task on Morpheme Segmentation
Khuyagbaatar Batsuren | Gábor Bella | Aryaman Arora | Viktor Martinovic | Kyle Gorman | Zdeněk Žabokrtský | Amarsanaa Ganbold | Šárka Dohnalová | Magda Ševčíková | Kateřina Pelegrinová | Fausto Giunchiglia | Ryan Cotterell | Ekaterina Vylomova
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, covered 5 million words in 9 languages (Czech, English, Spanish, Hungarian, French, Italian, Russian, Latin, Mongolian) and received 13 system submissions from 7 teams and the best system averaged 97.29% F1 score across all languages, ranging English (93.84%) to Latin (99.38%). Subtask 2, sentence-level morpheme segmentation, covered 18,735 sentences in 3 languages (Czech, English, Mongolian), received 10 system submissions from 3 teams, and the best systems outperformed all three state-of-the-art subword tokenization methods (BPE, ULM, Morfessor2) by 30.71% absolute. To facilitate error analysis and support any type of future studies, we released all system predictions, the evaluation script, and all gold standard datasets.

UniMorph 4.0: Universal Morphology
Khuyagbaatar Batsuren | Omer Goldman | Salam Khalifa | Nizar Habash | Witold Kieraś | Gábor Bella | Brian Leonard | Garrett Nicolai | Kyle Gorman | Yustinus Ghanggo Ate | Maria Ryskina | Sabrina Mielke | Elena Budianskaya | Charbel El-Khaissi | Tiago Pimentel | Michael Gasser | William Abbott Lane | Mohit Raj | Matt Coler | Jaime Rafael Montoya Samame | Delio Siticonatzi Camaiteri | Esaú Zumaeta Rojas | Didier López Francis | Arturo Oncevay | Juan López Bautista | Gema Celeste Silva Villegas | Lucas Torroba Hennigen | Adam Ek | David Guriel | Peter Dirix | Jean-Philippe Bernardy | Andrey Scherbakov | Aziyana Bayyr-ool | Antonios Anastasopoulos | Roberto Zariquiey | Karina Sheifer | Sofya Ganieva | Hilaria Cruz | Ritván Karahóǧa | Stella Markantonatou | George Pavlidis | Matvey Plugaryov | Elena Klyachko | Ali Salehi | Candy Angulo | Jatayu Baxi | Andrew Krizhanovsky | Natalia Krizhanovskaya | Elizabeth Salesky | Clara Vania | Sardana Ivanova | Jennifer White | Rowan Hall Maudslay | Josef Valvoda | Ran Zmigrod | Paula Czarnowska | Irene Nikkarinen | Aelita Salchak | Brijesh Bhatt | Christopher Straughn | Zoey Liu | Jonathan North Washington | Yuval Pinter | Duygu Ataman | Marcin Wolinski | Totok Suhardijanto | Anna Yablonskaya | Niklas Stoehr | Hossep Dolatian | Zahroh Nuriah | Shyam Ratan | Francis M. Tyers | Edoardo M. Ponti | Grant Aiton | Aryaman Arora | Richard J. Hatcher | Ritesh Kumar | Jeremiah Young | Daria Rodionova | Anastasia Yemelina | Taras Andrushko | Igor Marchenko | Polina Mashkovtseva | Alexandra Serova | Emily Prud’hommeaux | Maria Nepomniashchaya | Fausto Giunchiglia | Eleanor Chodroff | Mans Hulden | Miikka Silfverberg | Arya D. McCarthy | David Yarowsky | Ryan Cotterell | Reut Tsarfaty | Ekaterina Vylomova
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation, and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements on several fronts that were made in the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 66 new languages, including 24 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, e.g., missing gender and macrons information. We have amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.

2021

Grapheme-to-phoneme conversion is an important component in many speech technologies, but until recently there were no multilingual benchmarks for this task. The second iteration of the SIGMORPHON shared task on multilingual grapheme-to-phoneme conversion features many improvements from the previous year’s task (Gorman et al. 2020), including additional languages, a stronger baseline, three subtasks varying the amount of available resources, extensive quality assurance procedures, and automated error analyses. Four teams submitted a total of thirteen systems, at best achieving relative reductions of word error rate of 11% in the high-resource subtask and 4% in the low-resource subtask.

Structured abbreviation expansion in context
Kyle Gorman | Christo Kirov | Brian Roark | Richard Sproat
Findings of the Association for Computational Linguistics: EMNLP 2021

Ad hoc abbreviations are commonly found in informal communication channels that favor shorter messages. We consider the task of reversing these abbreviations in context to recover normalized, expanded versions of abbreviated messages. The problem is related to, but distinct from, spelling correction, as ad hoc abbreviations are intentional and can involve more substantial differences from the original words. Ad hoc abbreviations are also productively generated on-the-fly, so they cannot be resolved solely by dictionary lookup. We generate a large, open-source data set of ad hoc abbreviations. This data is used to study abbreviation strategies and to develop two strong baselines for abbreviation expansion.

Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Garrett Nicolai | Kyle Gorman | Ryan Cotterell
Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

2020

Is the Best Better? Bayesian Statistical Model Comparison for Natural Language Processing
Piotr Szymański | Kyle Gorman
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Recent work raises concerns about the use of standard splits to compare natural language processing models. We propose a Bayesian statistical model comparison technique which uses k-fold cross-validation across multiple data sets to estimate the likelihood that one model will outperform the other, or that the two will produce practically equivalent results. We use this technique to rank six English part-of-speech taggers across two data sets and three evaluation metrics.

Massively Multilingual Pronunciation Modeling with WikiPron
Jackson L. Lee | Lucas F.E. Ashby | M. Elizabeth Garza | Yeonju Lee-Sikka | Sean Miller | Alan Wong | Arya D. McCarthy | Kyle Gorman
Proceedings of the Twelfth Language Resources and Evaluation Conference

We introduce WikiPron, an open-source command-line tool for extracting pronunciation data from Wiktionary, a collaborative multilingual online dictionary. We first describe the design and use of WikiPron. We then discuss the challenges faced scaling this tool to create an automatically-generated database of 1.7 million pronunciations from 165 languages. Finally, we validate the pronunciation database by using it to train and evaluating a collection of generic grapheme-to-phoneme models. The software, pronunciation data, and models are all made available under permissive open-source licenses.

The SIGMORPHON 2020 Shared Task on Multilingual Grapheme-to-Phoneme Conversion
Kyle Gorman | Lucas F.E. Ashby | Aaron Goyzueta | Arya McCarthy | Shijie Wu | Daniel You
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

We describe the design and findings of the SIGMORPHON 2020 shared task on multilingual grapheme-to-phoneme conversion. Participants were asked to submit systems which take in a sequence of graphemes in a given language as input, then output a sequence of phonemes representing the pronunciation of that grapheme sequence. Nine teams submitted a total of 23 systems, at best achieving a 18% relative reduction in word error rate (macro-averaged over languages), versus strong neural sequence-to-sequence baselines. To facilitate error analysis, we publicly release the complete outputs for all systems—a first for the SIGMORPHON workshop.

Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Garrett Nicolai | Kyle Gorman | Ryan Cotterell
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological paradigms for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. We have implemented several improvements to the extraction pipeline which creates most of our data, so that it is both more complete and more correct. We have added 66 new languages, as well as new parts of speech for 12 languages. We have also amended the schema in several ways. Finally, we present three new community tools: two to validate data for resource creators, and one to make morphological data available from the command line. UniMorph is based at the Center for Language and Speech Processing (CLSP) at Johns Hopkins University in Baltimore, Maryland. This paper details advances made to the schema, tooling, and dissemination of project resources since the UniMorph 2.0 release described at LREC 2018.

Detecting Objectifying Language in Online Professor Reviews
Angie Waller | Kyle Gorman
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

Student reviews often make reference to professors’ physical appearances. Until recently RateMyProfessors.com, the website of this study’s focus, used a design feature to encourage a “hot or not” rating of college professors. In the wake of recent #MeToo and #TimesUp movements, social awareness of the inappropriateness of these reviews has grown; however, objectifying comments remain and continue to be posted in this online context. We describe two supervised text classifiers for detecting objectifying commentary in professor reviews. We then ensemble these classifiers and use the resulting model to track objectifying commentary at scale. We measure correlations between objectifying commentary, changes to the review website interface, and teacher gender across a ten-year period.

2019

Neural Models of Text Normalization for Speech Applications
Hao Zhang | Richard Sproat | Axel H. Ng | Felix Stahlberg | Xiaochang Peng | Kyle Gorman | Brian Roark
Computational Linguistics, Volume 45, Issue 2 - June 2019

Machine learning, including neural network techniques, have been applied to virtually every domain in natural language processing. One problem that has been somewhat resistant to effective machine learning solutions is text normalization for speech applications such as text-to-speech synthesis (TTS). In this application, one must decide, for example, that 123 is verbalized as one hundred twenty three in 123 pages but as one twenty three in 123 King Ave. For this task, state-of-the-art industrial systems depend heavily on hand-written language-specific grammars. We propose neural network models that treat text normalization for TTS as a sequence-to-sequence problem, in which the input is a text token in context, and the output is the verbalization of that token. We find that the most effective model, in accuracy and efficiency, is one where the sentential context is computed once and the results of that computation are combined with the computation of each token in sequence to compute the verbalization. This model allows for a great deal of flexibility in terms of representing the context, and also allows us to integrate tagging and segmentation into the process. These models perform very well overall, but occasionally they will predict wildly inappropriate verbalizations, such as reading 3 cm as three kilometers. Although rare, such verbalizations are a major issue for TTS applications. We thus use finite-state covering grammars to guide the neural models, either during training and decoding, or just during decoding, away from such “unrecoverable” errors. Such grammars can largely be learned from data.

Weird Inflects but OK: Making Sense of Morphological Generation Errors
Kyle Gorman | Arya D. McCarthy | Ryan Cotterell | Ekaterina Vylomova | Miikka Silfverberg | Magdalena Markowska
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

We conduct a manual error analysis of the CoNLL-SIGMORPHON Shared Task on Morphological Reinflection. This task involves natural language generation: systems are given a word in citation form (e.g., hug) and asked to produce the corresponding inflected form (e.g., the simple past hugged). We propose an error taxonomy and use it to annotate errors made by the top two systems across twelve languages. Many of the observed errors are related to inflectional patterns sensitive to inherent linguistic properties such as animacy or affect; many others are failures to predict truly unpredictable inflectional behaviors. We also find nearly one quarter of the residual “errors” reflect errors in the gold data.

We Need to Talk about Standard Splits
Kyle Gorman | Steven Bedrick
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

It is standard practice in speech & language technology to rank systems according to their performance on a test set held out for evaluation. However, few researchers apply statistical tests to determine whether differences in performance are likely to arise by chance, and few examine the stability of system ranking across multiple training-testing splits. We conduct replication and reproduction experiments with nine part-of-speech taggers published between 2000 and 2018, each of which claimed state-of-the-art performance on a widely-used “standard split”. While we replicate results on the standard split, we fail to reliably reproduce some rankings when we repeat this analysis with randomly generated training-testing splits. We argue that randomly generated splits should be used in system evaluation.

What Kind of Language Is Hard to Language-Model?
Sabrina J. Mielke | Ryan Cotterell | Kyle Gorman | Brian Roark | Jason Eisner
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

How language-agnostic are current state-of-the-art NLP tools? Are there some types of language that are easier to model with current methods? In prior work (Cotterell et al., 2018) we attempted to address this question for language modeling, and observed that recurrent neural network language models do not perform equally well over all the high-resource European languages found in the Europarl corpus. We speculated that inflectional morphology may be the primary culprit for the discrepancy. In this paper, we extend these earlier experiments to cover 69 languages from 13 language families using a multilingual Bible corpus. Methodologically, we introduce a new paired-sample multiplicative mixed-effects model to obtain language difficulty coefficients from at-least-pairwise parallel corpora. In other words, the model is aware of inter-sentence variation and can handle missing data. Exploiting this model, we show that “translationese” is not any easier to model than natively written language in a fair comparison. Trying to answer the question of what features difficult languages have in common, we try and fail to reproduce our earlier (Cotterell et al., 2018) observation about morphological complexity and instead reveal far simpler statistics of the data that seem to drive complexity in a much larger sample.

2018

Improving homograph disambiguation with supervised machine learning
Kyle Gorman | Gleb Mazovetskiy | Vitaly Nikolaev
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Target word prediction and paraphasia classification in spoken discourse
Joel Adams | Steven Bedrick | Gerasimos Fergadiotis | Kyle Gorman | Jan van Santen
Proceedings of the 16th BioNLP Workshop

We present a system for automatically detecting and classifying phonologically anomalous productions in the speech of individuals with aphasia. Working from transcribed discourse samples, our system identifies neologisms, and uses a combination of string alignment and language models to produce a lattice of plausible words that the speaker may have intended to produce. We then score this lattice according to various features, and attempt to determine whether the anomalous production represented a phonemic error or a genuine neologism. This approach has the potential to be expanded to consider other types of paraphasic errors, and could be applied to a wide variety of screening and therapeutic applications.

2016

Pynini: A Python library for weighted finite-state grammar compilation
Kyle Gorman
Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata

Minimally Supervised Number Normalization
Kyle Gorman | Richard Sproat
Transactions of the Association for Computational Linguistics, Volume 4

We propose two models for verbalizing numbers, a key component in speech recognition and synthesis systems. The first model uses an end-to-end recurrent neural network. The second model, drawing inspiration from the linguistics literature, uses finite-state transducers constructed with a minimal amount of training data. While both models achieve near-perfect performance, the latter model can be trained using several orders of magnitude less data than the former, making it particularly useful for low-resource languages.

2015

Automated morphological analysis of clinical language samples
Kyle Gorman | Steven Bedrick | Géza Kiss | Eric Morley | Rosemary Ingham | Metrah Mohammed | Katina Papadakis | Jan van Santen
Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality

Co-authors

Ekaterina Vylomova 4

Lucas F.E. Ashby 3

Steven Bedrick 3

Sabrina J. Mielke 3

Miikka Silfverberg 3

Aryaman Arora 2

Khuyagbaatar Batsuren 2

Fausto Giunchiglia 2

Christo Kirov 2

Elena Klyachko 2

Andrew Krizhanovsky 2

Yeonju Lee-Sikka 2

Emily Prud’hommeaux 2

David Yarowsky 2

Jan van Santen 2

Cyril Allauzen 1

Antonios Anastasopoulos 1

Taras Andrushko 1

Timofey Arkhangelskiy 1

Yustinus Ghanggo Ate 1

Travis M. Bartley 1

Aziyana Bayyr-ool 1

Jean-Philippe Bernardy 1

Brijesh Bhatt 1

Elena Budianskaya 1

Delio Siticonatzi Camaiteri 1

Eleanor Chodroff 1

Simon Clematide 1

Paula Czarnowska 1

Luca Del Signore 1

Šárka Dohnalová 1

Hossep Dolatian 1

Charbel El-Khaissi 1

Valts Ernštreits 1

Gerasimos Fergadiotis 1

Amarsanaa Ganbold 1

Sofya Ganieva 1

M. Elizabeth Garza 1

Michael Gasser 1

Cameron Gibson 1

Aaron Goyzueta 1

Matteo Grella 1

Richard J. Hatcher 1

Rosemary Ingham 1

Sardana Ivanova 1

Cassandra L. Jacobs 1

Ritván Karahóǧa 1

Salam Khalifa 1

Witold Kieraś 1

Natalia Krizhanovskaya 1

Nataly Krizhanovsky 1

William Abbott Lane 1

Jackson L. Lee 1

Brian Leonard 1

Juan López Bautista 1

Didier López Francis 1

Peter Makarov 1

Aidan Malanoski 1

John Mansfield 1

Igor Marchenko 1

Stella Markantonatou 1

Magdalena Markowska 1

Viktor Martinovic 1

Polina Mashkovtseva 1

Rowan Hall Maudslay 1

Gleb Mazovetskiy 1

Metrah Mohammed 1

Maria Nepomniashchaya 1

Irene Nikkarinen 1

Vitaly Nikolaev 1

Zahroh Nuriah 1

Arturo Oncevay 1

Katina Papadakis 1

George Pavlidis 1

Kateřina Pelegrinová 1

Xiaochang Peng 1

Tiago Pimentel 1

Matvey Plugaryov 1

Edoardo M. Ponti 1

Daria Rodionova 1

Esaú Zumaeta Rojas 1

Maria Ryskina 1

Aelita Salchak 1

Elizabeth Salesky 1

Jaime Rafael Montoya Samame 1

Andrey Scherbakov 1

Arundhati Sengupta 1

Alexandra Serova 1

Karina Sheifer 1

Alexey Sorokin 1

Yulia Spektor 1

Felix Stahlberg 1

Niklas Stoehr 1

Christopher Straughn 1

Totok Suhardijanto 1

Piotr Szymański 1

Lucas Torroba Hennigen 1

Reut Tsarfaty 1

Francis Tyers 1

Josef Valvoda 1

Gema Celeste Silva Villegas 1

Jonathan Washington 1

Jennifer White 1

Adam Wiemerslage 1

Marcin Woliński 1

Anna Yablonskaya 1

Anastasia Yemelina 1

Jeremiah Young 1

Roberto Zariquiey 1

Katharina von der Wense 1

Magda Ševčíková 1

Zdeněk Žabokrtský 1

Venues