Modelling Filled Particles and Prolongation Using End-to-end Automatic Speech Recognition Systems: A Quantitative and Qualitative Analysis.

Vincenzo Norman Vitale, Loredana Schettino, Francesco Cutugno


Abstract
State-of-the-art automatic speech recognition systems based on End-to-End models (E2E-ASRs) achieve remarkable performance. However, phenomena that characterize spoken language, such as fillers (eeh, ehm) or segmental prolongations (theee), are still mostly treated as disruptive elements that should be excluded to obtain optimal transcriptions, despite their acknowledged regularity and communicative value. A recent study showed that two types of pre-trained systems with the same Conformer-based encoding architecture but different decoders – a Connectionist Temporal Classification (CTC) decoder and a Transducer decoder – tend to model some speech features that are functional for the identification of filled pauses and prolongations in speech. This work builds upon these findings by investigating which of the two systems performs better at detecting fillers and prolongations, and by conducting an error analysis to deepen our understanding of how these systems work.
Anthology ID:
2024.clicit-1.107
Volume:
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:
December
Year:
2024
Address:
Pisa, Italy
Editors:
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:
CLiC-it
Publisher:
CEUR Workshop Proceedings
Pages:
990–996
URL:
https://aclanthology.org/2024.clicit-1.107/
Cite (ACL):
Vincenzo Norman Vitale, Loredana Schettino, and Francesco Cutugno. 2024. Modelling Filled Particles and Prolongation Using End-to-end Automatic Speech Recognition Systems: A Quantitative and Qualitative Analysis. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 990–996, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):
Modelling Filled Particles and Prolongation Using End-to-end Automatic Speech Recognition Systems: A Quantitative and Qualitative Analysis. (Vitale et al., CLiC-it 2024)
PDF:
https://aclanthology.org/2024.clicit-1.107.pdf