Comparing Pre-Trained Embeddings and Domain-Independent Features for Regression-Based Evaluation of Task-Oriented Dialogue Systems

Kallirroi Georgila

doi:10.18653/v1/2024.sigdial-1.52

Abstract

We use Gaussian Process Regression to predict different types of ratings provided by users after interacting with various task-oriented dialogue systems. We compare the performance of domain-independent dialogue features (e.g., duration, number of filled slots, number of confirmed slots, word error rate) with pre-trained dialogue embeddings. These pre-trained dialogue embeddings are computed by averaging over sentence embeddings in a dialogue. Sentence embeddings are created using various models based on sentence transformers (appearing on the Hugging Face Massive Text Embedding Benchmark leaderboard) or by averaging over BERT word embeddings (varying the BERT layers used). We also compare pre-trained embeddings extracted from human transcriptions with pre-trained embeddings extracted from speech recognition outputs, to determine the robustness of these models to errors. Our results show that overall, for most types of user satisfaction ratings and advanced/recent (or sometimes less advanced/recent) pre-trained embedding models, using only pre-trained embeddings outperforms using only domain-independent features. However, this pattern varies depending on the type of rating and the embedding model used. Also, pre-trained embeddings are found to be robust to speech recognition errors, more advanced/recent embedding models do not always perform better than less advanced/recent ones, and larger models do not necessarily outperform smaller ones. The best prediction performance is achieved by combining pre-trained embeddings with domain-independent features.
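The pipeline described in the abstract (average sentence embeddings into a dialogue embedding, optionally combine it with domain-independent features, then fit a Gaussian Process regressor on user ratings) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy 2-dimensional "sentence embeddings", the single scaled feature, the ratings, the RBF kernel, the `noise` and `length_scale` values, and the choice of concatenation as the way to combine embeddings with features. In practice the sentence vectors would come from a sentence-transformer or BERT model, and a library regressor would replace the hand-written one.

```python
import math

def dialogue_embedding(sentence_embeddings):
    """Mean-pool equal-length sentence vectors into one dialogue vector."""
    n = len(sentence_embeddings)
    dim = len(sentence_embeddings[0])
    return [sum(vec[i] for vec in sentence_embeddings) / n for i in range(dim)]

def rbf(x, y, length_scale=1.0):
    """RBF kernel (an assumed kernel choice, not necessarily the paper's)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * length_scale ** 2))

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve (L L^T) alpha = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    alpha = [0.0] * n
    for i in reversed(range(n)):
        alpha[i] = (z[i] - sum(L[k][i] * alpha[k] for k in range(i + 1, n))) / L[i][i]
    return alpha

def gp_fit(X, y, noise=1e-2, length_scale=1.0):
    """Fit GP regression: alpha = (K + noise * I)^-1 (y - mean(y))."""
    y_mean = sum(y) / len(y)
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alpha = solve_cholesky(cholesky(K), [v - y_mean for v in y])
    return X, alpha, y_mean, length_scale

def gp_predict(model, x):
    """Posterior mean of the fitted GP at a new input x."""
    X, alpha, y_mean, length_scale = model
    return y_mean + sum(a * rbf(x, xi, length_scale) for a, xi in zip(alpha, X))

# Toy data (hypothetical): three dialogues, each a list of sentence vectors.
dialogues = [
    [[0.0, 0.0], [0.2, 0.0]],
    [[2.0, 2.0], [2.0, 2.4]],
    [[4.0, 0.0], [4.0, 0.0]],
]
features = [[0.0], [1.0], [0.5]]   # e.g. a scaled count of filled slots
ratings = [2.0, 5.0, 3.0]          # user satisfaction ratings

# Combine by concatenation (an assumption about how the two views are merged).
X = [dialogue_embedding(d) + f for d, f in zip(dialogues, features)]
model = gp_fit(X, ratings)
print(gp_predict(model, X[0]))
```

Swapping `X` for the embeddings alone or the features alone gives the other two conditions the abstract compares; only the input vectors change, not the regressor.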

Anthology ID: 2024.sigdial-1.52
Volume: Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Month: September
Year: 2024
Address: Kyoto, Japan
Editors: Tatsuya Kawahara, Vera Demberg, Stefan Ultes, Koji Inoue, Shikib Mehri, David Howcroft, Kazunori Komatani
Venue: SIGDIAL
SIG: SIGDIAL
Publisher: Association for Computational Linguistics
Pages: 610–623
URL: https://aclanthology.org/2024.sigdial-1.52
DOI: 10.18653/v1/2024.sigdial-1.52
Cite (ACL): Kallirroi Georgila. 2024. Comparing Pre-Trained Embeddings and Domain-Independent Features for Regression-Based Evaluation of Task-Oriented Dialogue Systems. In Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 610–623, Kyoto, Japan. Association for Computational Linguistics.
Cite (Informal): Comparing Pre-Trained Embeddings and Domain-Independent Features for Regression-Based Evaluation of Task-Oriented Dialogue Systems (Georgila, SIGDIAL 2024)
PDF: https://aclanthology.org/2024.sigdial-1.52.pdf
