Augmenting pre-trained language models with audio feature embedding for argumentation mining in political debates

Rafael Mestre, Stuart E. Middleton, Matt Ryan, Masood Gheasi, Timothy Norman, Jiatong Zhu


Abstract
The integration of multimodality in natural language processing (NLP) tasks seeks to exploit the complementary information contained in two or more modalities, such as text, audio and video. This paper investigates the integration of often under-researched audio features with text, using the task of argumentation mining (AM) as a case study. We take a previously reported dataset and present an audio-enhanced version (the Multimodal USElecDeb60To16 dataset). On this dataset of 28,850 utterances, we report the performance of two text models, based on BERT and GloVe embeddings, one audio model, based on a CNN and a Bi-LSTM, and multimodal combinations of these. The results show that multimodal models do not outperform text-based models when using the full dataset. However, we show that audio features add value in fully supervised scenarios with limited data. We find that when data is scarce (e.g., with 10% of the original dataset), multimodal models yield improved performance, whereas the performance of BERT-based text models drops considerably. Finally, we conduct a study with artificially generated voices and an ablation study to investigate the importance of different audio features in the audio models.
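The low-resource setting mentioned above trains on a fraction of the 28,850 utterances (e.g., 10%, about 2,885). The abstract does not say how that subset is drawn. The following is a minimal sketch of one plausible approach: a seeded random subsample stratified by label, so class proportions roughly match the full dataset. Stratification, the function name `subsample_stratified`, and the label names and counts in the usage lines are all hypothetical illustrations, not the authors' procedure or the dataset's real label distribution.

```python
import random
from collections import defaultdict


def subsample_stratified(labels, fraction, seed=0):
    """Return sorted indices of a label-stratified random subset.

    Hypothetical helper: each label keeps round(fraction * count)
    examples (at least one), so class proportions roughly match the
    full dataset. The seed makes the subset reproducible.
    """
    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    chosen = []
    # Iterate labels in sorted order so the result is deterministic.
    for label in sorted(by_label):
        idxs = by_label[label]
        k = max(1, round(fraction * len(idxs)))
        chosen.extend(rng.sample(idxs, k))  # sample without replacement
    return sorted(chosen)


# Hypothetical label counts (NOT the real dataset distribution), summing to 28,850.
labels = ["Claim"] * 10000 + ["Premise"] * 12000 + ["Other"] * 6850
subset = subsample_stratified(labels, fraction=0.10)
print(len(subset))  # 2885
```

Keeping the seed fixed means that, under this sketch, the text, audio and multimodal models would all be trained on the same reduced subset, so their scores remain comparable.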
Anthology ID:
2023.findings-eacl.21
Volume:
Findings of the Association for Computational Linguistics: EACL 2023
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
274–288
URL:
https://aclanthology.org/2023.findings-eacl.21
DOI:
10.18653/v1/2023.findings-eacl.21
Cite (ACL):
Rafael Mestre, Stuart E. Middleton, Matt Ryan, Masood Gheasi, Timothy Norman, and Jiatong Zhu. 2023. Augmenting pre-trained language models with audio feature embedding for argumentation mining in political debates. In Findings of the Association for Computational Linguistics: EACL 2023, pages 274–288, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Augmenting pre-trained language models with audio feature embedding for argumentation mining in political debates (Mestre et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-eacl.21.pdf
Video:
https://aclanthology.org/2023.findings-eacl.21.mp4