Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features

Eliana Pastor, Alkis Koudounas, Giuseppe Attanasio, Dirk Hovy, Elena Baralis


Abstract
Predictive models make mistakes and have biases. To combat both, we need to understand their predictions. Explainable AI (XAI) provides insights into models for vision, language, and tabular data. However, only a few approaches exist for speech classification models. Previous works focus on a selection of spoken language understanding (SLU) tasks, and most users find their explanations challenging to interpret. We propose a novel approach to explain speech classification models. It provides two types of insights. (i) Word-level. We measure the impact of each audio segment aligned with a word on the outcome. (ii) Paralinguistic. We evaluate how non-linguistic features (e.g., prosody and background noise) affect the outcome if perturbed. We validate our approach by explaining two state-of-the-art SLU models on two tasks in English and Italian. We test their plausibility with human subject ratings. Our results show that the explanations correctly represent the model’s inner workings and are plausible to humans.
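The word-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a leave-one-out perturbation in which each word-aligned segment is replaced by silence, and it scores a word by the resulting drop in the target class probability. The names `word_level_importance`, `predict_proba`, `word_spans`, and `toy_model` are hypothetical, and the toy classifier stands in for a real SLU model.

```python
import numpy as np


def word_level_importance(audio, word_spans, predict_proba, target):
    """Score each word by the drop in target probability when its audio is silenced.

    Assumption: silencing (zeroing) the segment is the perturbation; the
    abstract does not specify which perturbation the paper uses.
    """
    base = predict_proba(audio)[target]
    scores = []
    for start, end in word_spans:
        masked = audio.copy()          # leave the original signal untouched
        masked[start:end] = 0.0        # silence the samples aligned with this word
        scores.append(base - predict_proba(masked)[target])
    return np.array(scores)


def toy_model(audio):
    """Hypothetical two-class classifier that only listens to samples 100..199."""
    energy = np.abs(audio[100:200]).mean()
    p = energy / (1.0 + energy)
    return np.array([1.0 - p, p])


# Three "words" of 100 samples each; only the middle one matters to toy_model.
audio = np.ones(300)
word_spans = [(0, 100), (100, 200), (200, 300)]
scores = word_level_importance(audio, word_spans, toy_model, target=1)
```

In this toy setup, only the middle word receives a non-zero score, because it is the only segment the stand-in classifier depends on. A paralinguistic analysis would follow the same pattern, perturbing a property of the whole signal (for example, adding noise) instead of silencing one segment.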
Anthology ID:
2024.eacl-long.136
Volume:
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Yvette Graham, Matthew Purver
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
2221–2238
URL:
https://aclanthology.org/2024.eacl-long.136
Cite (ACL):
Eliana Pastor, Alkis Koudounas, Giuseppe Attanasio, Dirk Hovy, and Elena Baralis. 2024. Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2221–2238, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features (Pastor et al., EACL 2024)
PDF:
https://aclanthology.org/2024.eacl-long.136.pdf
Video:
https://aclanthology.org/2024.eacl-long.136.mp4