Toward Real Time Word Based Prosody Recognition

Alex Tilson, Frank Foerster


Abstract
Prosodic salience is a heuristic based on word-level prosody in child-directed speech that is thought to serve as a cue for attentional focus. It has been used in the context of robotic language acquisition to extract the contextually most relevant words from a human tutor’s speech in order to ground them in a robot’s sensorimotor data. However, the existing pipeline for word-based prosody recognition operated in a semi-automatic manner and required substantial manual effort. We describe our efforts to automate this pipeline by incorporating real-time prosody recognition together with a modern speech recognition and forced alignment model. The intention is to enable its use in real time for human-in-the-loop robotic language acquisition and other socially driven forms of online learning.
Anthology ID:
2024.clasp-1.9
Volume:
Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning
Month:
October
Year:
2024
Address:
Gothenburg, Sweden
Editors:
Amy Qiu, Bill Noble, David Pagmar, Vladislav Maraev, Nikolai Ilinykh
Venue:
CLASP
SIG:
SIGSEM
Publisher:
Association for Computational Linguistics
Pages:
62–67
URL:
https://aclanthology.org/2024.clasp-1.9
Cite (ACL):
Alex Tilson and Frank Foerster. 2024. Toward Real Time Word Based Prosody Recognition. In Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning, pages 62–67, Gothenburg, Sweden. Association for Computational Linguistics.
Cite (Informal):
Toward Real Time Word Based Prosody Recognition (Tilson & Foerster, CLASP 2024)
PDF:
https://aclanthology.org/2024.clasp-1.9.pdf