Speech Disfluencies occur at Higher Perplexities

Priyanka Sen


Abstract
Speech disfluencies have been hypothesized to occur before words that are less predictable and therefore more cognitively demanding. In this paper, we revisit this hypothesis by using OpenAI’s GPT-2 to calculate predictability of words as language model perplexity. Using the Switchboard corpus, we find that 51% of disfluencies occur at the highest, second highest, or within one token of the highest perplexity, and this distribution is not random. We also show that disfluencies precede words with significantly higher perplexity than fluent contexts. Based on our results, we offer new evidence that disfluencies are more likely to occur before less predictable words.
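The position analysis described in the abstract can be illustrated with a minimal sketch. It assumes per-token probabilities are already available (in practice they would come from a language model such as GPT-2; the numbers below are made up for illustration) and treats a single token's perplexity as the inverse of its probability. The function names and category labels are hypothetical, and the paper's exact procedure may differ.

```python
def token_perplexities(probs):
    """Per-token perplexity, taken here as 1 / p(token | context).

    Assumption: `probs` holds one model probability per token position.
    """
    return [1.0 / p for p in probs]


def classify_position(perplexities, disfluency_index):
    """Classify a disfluency position relative to the utterance's perplexity peaks.

    The categories mirror the abstract: highest, second highest,
    or within one token of the highest perplexity.
    """
    # Token indices sorted from highest to lowest perplexity.
    order = sorted(range(len(perplexities)),
                   key=lambda i: perplexities[i], reverse=True)
    highest = order[0]
    if disfluency_index == highest:
        return "highest"
    if len(order) > 1 and disfluency_index == order[1]:
        return "second_highest"
    if abs(disfluency_index - highest) == 1:
        return "within_one_of_highest"
    return "other"


# Hypothetical probabilities for a five-token utterance.
probs = [0.5, 0.01, 0.2, 0.1, 0.25]
ppl = token_perplexities(probs)
labels = [classify_position(ppl, i) for i in range(len(ppl))]
print(labels)
```

Counting how often real disfluency positions fall into the first three categories, and comparing that count against randomly chosen positions, would give the kind of comparison the abstract reports.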
Anthology ID:
2020.cogalex-1.11
Volume:
Proceedings of the Workshop on the Cognitive Aspects of the Lexicon
Month:
December
Year:
2020
Address:
Online
Editors:
Michael Zock, Emmanuele Chersoni, Alessandro Lenci, Enrico Santus
Venue:
CogALex
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
92–97
URL:
https://aclanthology.org/2020.cogalex-1.11
Cite (ACL):
Priyanka Sen. 2020. Speech Disfluencies occur at Higher Perplexities. In Proceedings of the Workshop on the Cognitive Aspects of the Lexicon, pages 92–97, Online. Association for Computational Linguistics.
Cite (Informal):
Speech Disfluencies occur at Higher Perplexities (Sen, CogALex 2020)
PDF:
https://aclanthology.org/2020.cogalex-1.11.pdf