MTP: A Dataset for Multi-Modal Turning Points in Casual Conversations

Gia-Bao Ho, Chang Tan, Zahra Darban, Mahsa Salehi, Reza Haf, Wray Buntine


Abstract
Detecting critical moments, such as emotional outbursts or changes in decisions during conversations, is crucial for understanding shifts in human behavior and their consequences. Our work introduces a novel problem setting focusing on these moments as turning points (TPs), accompanied by a meticulously curated, high-consensus, human-annotated multi-modal dataset. We provide precise timestamps, descriptions, and visual-textual evidence highlighting changes in emotions, behaviors, perspectives, and decisions at these turning points. We also propose a framework, TPMaven, utilizing state-of-the-art vision-language models to construct a narrative from the videos and large language models to classify and detect turning points in our multi-modal dataset. Evaluation results show that TPMaven achieves an F1-score of 0.88 in classification and 0.61 in detection, with additional explanations aligning with human expectations.
Anthology ID:
2024.acl-short.30
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
314–326
URL:
https://aclanthology.org/2024.acl-short.30
Cite (ACL):
Gia-Bao Ho, Chang Tan, Zahra Darban, Mahsa Salehi, Reza Haf, and Wray Buntine. 2024. MTP: A Dataset for Multi-Modal Turning Points in Casual Conversations. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 314–326, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
MTP: A Dataset for Multi-Modal Turning Points in Casual Conversations (Ho et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-short.30.pdf