Quantifying the redundancy between prosody and text
Lukas Wolf | Tiago Pimentel | Evelina Fedorenko | Ryan Cotterell | Alex Warstadt | Ethan Wilcox | Tamar Regev
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Prosody (the suprasegmental component of speech, including pitch, loudness, and tempo) carries critical aspects of meaning. However, the relationship between the information conveyed by prosody vs. by the words themselves remains poorly understood. We use large language models (LLMs) to estimate how much information is redundant between prosody and the words themselves. Using a large spoken corpus of English audiobooks, we extract prosodic features aligned to individual words and test how well they can be predicted from LLM embeddings, compared to non-contextual word embeddings. We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features, including intensity, duration, pauses, and pitch contours. Furthermore, a word’s prosodic information is redundant with both the word itself and the context preceding as well as following it. Still, we observe that prosodic features cannot be fully predicted from text, suggesting that prosody carries information above and beyond the words. Along with this paper, we release a general-purpose data processing pipeline for quantifying the relationship between linguistic information and extra-linguistic features.

A fine-grained comparison of pragmatic language understanding in humans and language models
Jennifer Hu | Sammy Floyd | Olessia Jouravlev | Evelina Fedorenko | Edward Gibson
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Pragmatics and non-literal language understanding are essential to human communication, and present a long-standing challenge for artificial language models. We perform a fine-grained comparison of language models and humans on seven pragmatic phenomena, using zero-shot prompting on an expert-curated set of English materials. We ask whether models (1) select pragmatic interpretations of speaker utterances, (2) make similar error patterns as humans, and (3) use similar linguistic cues as humans to solve the tasks. We find that the largest models achieve high accuracy and match human error patterns: within incorrect responses, models favor literal interpretations over heuristic-based distractors. We also find preliminary evidence that models and humans are sensitive to similar linguistic cues. Our results suggest that pragmatic behaviors can emerge in models without explicitly constructed representations of mental states. However, models tend to struggle with phenomena relying on social expectation violations.


SentSpace: Large-Scale Benchmarking and Evaluation of Text using Cognitively Motivated Lexical, Syntactic, and Semantic Features
Greta Tuckute | Aalok Sathe | Mingye Wang | Harley Yoder | Cory Shain | Evelina Fedorenko
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations

SentSpace is a modular framework for streamlined evaluation of text. SentSpace characterizes textual input using diverse lexical, syntactic, and semantic features derived from corpora and psycholinguistic experiments. Core sentence features fall into three primary feature spaces: 1) Lexical, 2) Contextual, and 3) Embeddings. To aid in the analysis of computed features, SentSpace provides a web interface for interactive visualization and comparison with text from large corpora. The modular design of SentSpace allows researchers to easily integrate their own feature computation into the pipeline while benefiting from a common framework for evaluation and visualization. In this manuscript we will describe the design of SentSpace, its core feature spaces, and demonstrate an example use case by comparing human-written and machine-generated (GPT2-XL) sentences to each other. We find that while GPT2-XL-generated text appears fluent at the surface level, psycholinguistic norms and measures of syntactic processing reveal key differences between text produced by humans and machines. Thus, SentSpace provides a broad set of cognitively motivated linguistic features for evaluation of text within natural language processing, cognitive science, as well as the social sciences.


Syntactic dependencies correspond to word pairs with high mutual information
Richard Futrell | Peng Qian | Edward Gibson | Evelina Fedorenko | Idan Blank
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)


The Natural Stories Corpus
Richard Futrell | Edward Gibson | Harry J. Tily | Idan Blank | Anastasia Vishnevetsky | Steven Piantadosi | Evelina Fedorenko
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)