Chat Disentanglement: Data for New Domains and Methods for More Accurate Annotation

Sai R. Gouravajhala, Andrew M. Vernier, Yiming Shi, Zihan Li, Mark S. Ackerman, Jonathan K. Kummerfeld


Abstract
Conversation disentanglement is the task of taking a log of intertwined conversations from a shared channel and breaking the log into individual conversations. The standard datasets for disentanglement are in a single domain and were annotated by linguistics experts with careful training for the task. In this paper, we introduce the first multi-domain dataset and a study of annotation by people without linguistics expertise or extensive training. We experiment with several variations in interfaces, conducting user studies with domain experts and crowd workers. We also test a hypothesis from prior work that link-based annotation is more accurate, finding that it actually has comparable accuracy to set-based annotation. Our new dataset will support the development of more useful systems for this task, and our experimental findings suggest that users are capable of improving the usefulness of these systems by accurately annotating their own data.
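To make the two annotation styles named in the abstract concrete, here is a minimal sketch, not the authors' tooling. It assumes that link-based annotation records, for each message, which earlier message it responds to, and that set-based annotation assigns each message to a conversation group. Under that assumption, links can be collapsed into sets by taking connected components. The function name `links_to_conversations`, the integer message indices, and the toy links are all hypothetical.

```python
from collections import defaultdict


def links_to_conversations(num_messages, reply_links):
    """Group message indices into conversations from reply-to links.

    reply_links is a list of (message, antecedent) index pairs. This is an
    assumed representation for illustration, not the dataset's file format.
    """
    # Union-find: every message starts as its own conversation.
    parent = list(range(num_messages))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for message, antecedent in reply_links:
        root_a, root_b = find(message), find(antecedent)
        if root_a != root_b:
            # Keep the earliest message as the representative of the group.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = defaultdict(list)
    for i in range(num_messages):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Toy log of five messages: 1 replies to 0, 3 replies to 1, 4 replies to 2.
toy_links = [(1, 0), (3, 1), (4, 2)]
conversations = links_to_conversations(5, toy_links)
print(conversations)
```

The conversion only runs one way: a set says which messages belong together but not which message replies to which, so links cannot be recovered from sets without further annotation.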
Anthology ID:
2023.alta-1.12
Volume:
Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association
Month:
November
Year:
2023
Address:
Melbourne, Australia
Editors:
Smaranda Muresan, Vivian Chen, Casey Kennington, David Vandyke, Nina Dethlefs, Koji Inoue, Erik Ekstedt, Stefan Ultes
Venue:
ALTA
Publisher:
Association for Computational Linguistics
Pages:
112–117
URL:
https://aclanthology.org/2023.alta-1.12
Cite (ACL):
Sai R. Gouravajhala, Andrew M. Vernier, Yiming Shi, Zihan Li, Mark S. Ackerman, and Jonathan K. Kummerfeld. 2023. Chat Disentanglement: Data for New Domains and Methods for More Accurate Annotation. In Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association, pages 112–117, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Chat Disentanglement: Data for New Domains and Methods for More Accurate Annotation (Gouravajhala et al., ALTA 2023)
PDF:
https://aclanthology.org/2023.alta-1.12.pdf