An Analysis of Dialogue Act Sequence Similarity Across Multiple Domains

Ayesha Enayet, Gita Sukthankar


Abstract
This paper presents an analysis of how dialogue act sequences vary across different datasets in order to anticipate the potential degradation in the performance of learned models during domain adaptation. We hypothesize the following: 1) dialogue sequences from related domains will exhibit similar n-gram frequency distributions; 2) this similarity can be expressed by measuring the average Hamming distance between subsequences drawn from different datasets. Our experiments confirm that when dialogue act sequences from two datasets are dissimilar, they lie farther apart in embedding space, making it possible to train a classifier to discriminate between them even when the datasets are corrupted with noise. We present results from eight different datasets: SwDA, AMI (DialSum), GitHub, Hate Speech, Teams, Diplomacy Betrayal, SAMSum, and Military (Army). Our datasets were collected from many types of human communication, including strategic planning, informal discussion, and social media exchanges. Our methodology provides intuition about the generalizability of dialogue models trained on different datasets. Based on our analysis, it is problematic to assume that machine learning models trained on one type of discourse will generalize well to other settings, due to contextual differences.
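The average Hamming distance measure mentioned in the abstract can be illustrated with a short sketch. The snippet below is not the authors' released code; it assumes both datasets have already been mapped to a shared dialogue act tag set, and the window length, sampling scheme, and toy tag labels are illustrative assumptions.

import random

def subsequences(acts, n):
    # All contiguous length-n windows of one conversation's dialogue act sequence.
    return [tuple(acts[i:i + n]) for i in range(len(acts) - n + 1)]

def hamming(a, b):
    # Number of positions at which two equal-length subsequences disagree.
    return sum(x != y for x, y in zip(a, b))

def avg_hamming_distance(dataset_a, dataset_b, n=3, samples=1000, seed=0):
    # Average Hamming distance between randomly paired length-n subsequences
    # drawn from the two datasets (each dataset is a list of tag sequences).
    rng = random.Random(seed)
    subs_a = [s for seq in dataset_a for s in subsequences(seq, n)]
    subs_b = [s for seq in dataset_b for s in subsequences(seq, n)]
    total = sum(hamming(rng.choice(subs_a), rng.choice(subs_b)) for _ in range(samples))
    return total / samples

# Toy example with made-up tags drawn from a shared tag set.
corpus_a = [["statement", "question", "answer", "statement", "backchannel"],
            ["question", "answer", "statement", "statement"]]
corpus_b = [["statement", "statement", "backchannel", "question", "answer"],
            ["backchannel", "statement", "question", "answer"]]
print(avg_hamming_distance(corpus_a, corpus_b, n=3))

Under this reading, a larger average distance indicates that the two corpora share fewer dialogue act patterns, which the paper links to poorer cross-domain generalization.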
Anthology ID:
2022.lrec-1.334
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
3122–3130
URL:
https://aclanthology.org/2022.lrec-1.334
Cite (ACL):
Ayesha Enayet and Gita Sukthankar. 2022. An Analysis of Dialogue Act Sequence Similarity Across Multiple Domains. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3122–3130, Marseille, France. European Language Resources Association.
Cite (Informal):
An Analysis of Dialogue Act Sequence Similarity Across Multiple Domains (Enayet & Sukthankar, LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.334.pdf
Data
SAMSum