Andrew M. Vernier
2023
Chat Disentanglement: Data for New Domains and Methods for More Accurate Annotation
Sai R. Gouravajhala | Andrew M. Vernier | Yiming Shi | Zihan Li | Mark S. Ackerman | Jonathan K. Kummerfeld
Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association
Conversation disentanglement is the task of taking a log of intertwined conversations from a shared channel and breaking the log into individual conversations. The standard datasets for disentanglement are in a single domain and were annotated by linguistics experts with careful training for the task. In this paper, we introduce the first multi-domain dataset and a study of annotation by people without linguistics expertise or extensive training. We experiment with several variations in interfaces, conducting user studies with domain experts and crowd workers. We also test a hypothesis from prior work that link-based annotation is more accurate, finding that it actually has comparable accuracy to set-based annotation. Our new dataset will support the development of more useful systems for this task, and our experimental findings suggest that users are capable of improving the usefulness of these systems by accurately annotating their own data.
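As an illustration of how the two annotation styles mentioned in the abstract relate, the sketch below converts link-based annotations into set-based ones. It assumes that a link-based annotation is a list of (message, antecedent) index pairs and that a set-based annotation is a grouping of message indices into conversations. These representations, and the function name `links_to_conversations`, are hypothetical choices made for this example and are not taken from the paper or its dataset format.

```python
from collections import defaultdict


def links_to_conversations(num_messages, links):
    """Group messages into conversations from reply-to links.

    Hypothetical representation: `links` is a list of
    (message_index, antecedent_index) pairs. Messages joined by any
    chain of links end up in the same conversation (connected
    components, computed with a small union-find).
    """
    parent = list(range(num_messages))

    def find(i):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for message, antecedent in links:
        root_a, root_b = find(message), find(antecedent)
        if root_a != root_b:
            # Keep the earliest message index as the representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = defaultdict(list)
    for i in range(num_messages):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Five messages from one channel: 1 replies to 0, 3 replies to 1, 4 replies to 2.
links = [(1, 0), (3, 1), (4, 2)]
conversations = links_to_conversations(5, links)
print(conversations)  # [[0, 1, 3], [2, 4]]
```

Under these assumptions the conversion only goes one way: links determine the conversation sets, but the sets alone do not record which message replied to which.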