Automating the Detection of Poetic Features: The Limerick as Model Organism

Almas Abdibayev, Yohei Igarashi, Allen Riddell, Daniel Rockmore


Abstract
In this paper we take up the problem of “limerick detection” and describe a system to identify five-line poems as limericks or not. This turns out to be a surprisingly difficult challenge with many subtleties. More precisely, we produce an algorithm which focuses on the structural aspects of the limerick – rhyme scheme and rhythm (i.e., stress patterns) – and when tested on a a culled data set of 98,454 publicly available limericks, our “limerick filter” accepts 67% as limericks. The primary failure of our filter is on the detection of “non-standard” rhymes, which we highlight as an outstanding challenge in computational poetics. Our accent detection algorithm proves to be very robust. Our main contributions are (1) a novel rhyme detection algorithm that works on English words including rare proper nouns and made-up words (and thus, words not in the widely used CMUDict database); (2) a novel rhythm-identifying heuristic that is robust to language noise at moderate levels and comparable in accuracy to state-of-the-art scansion algorithms. As a third significant contribution (3) we make publicly available a large corpus of limericks that includes tags of “limerick” or “not-limerick” as determined by our identification software, thereby providing a benchmark for the community. The poetic tasks that we have identified as challenges for machines suggest that the limerick is a useful “model organism” for the study of machine capabilities in poetry and more broadly literature and language. We include a list of open challenges as well. Generally, we anticipate that this work will provide useful material and benchmarks for future explorations in the field.
Anthology ID:
2021.latechclfl-1.9
Volume:
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic (online)
Venues:
CLFL | EMNLP | LaTeCH | LaTeCHCLfL
SIG:
SIGHUM
Publisher:
Association for Computational Linguistics
Note:
Pages:
80–90
Language:
URL:
https://aclanthology.org/2021.latechclfl-1.9
DOI:
Bibkey:
Copy Citation:
PDF:
https://aclanthology.org/2021.latechclfl-1.9.pdf