Alex Tilson
2024
Toward Real Time Word Based Prosody Recognition
Alex Tilson
|
Frank Foerster
Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning
Prosodic salience is a heuristic based on word-level prosody in child-directed speech that is thought to serve as a cue for attentional focus. It has been used in the context of robotic language acquisition to extract the contextually most relevant words from a human tutor’s speech to ground them in a robot’s sensorimotor data. However, the pipeline for performing word-based prosody-recognition operated in a semi-automatic manner and required substantial manual effort. We describe our efforts to automate the existing pipeline by including real time prosody recognition, and a modern speech recognition and forced alignment model. The intention is to enable its use in real time for human-in-the-loop robotic language acquisition and other socially driven forms of online learning.
Search