Text2Traj2Text: Learning-by-Synthesis Framework for Contextual Captioning of Human Movement Trajectories

Hikaru Asano, Ryo Yonetani, Taiki Sekii, Hiroki Ouchi


Abstract
This paper presents Text2Traj2Text, a novel learning-by-synthesis framework for captioning the possible contexts behind shoppers' trajectory data in retail stores. Such captioning can benefit retail applications that require a deeper understanding of customers, such as targeted advertising and inventory management. The key idea is to leverage large language models to synthesize a diverse and realistic collection of contextual captions together with the corresponding movement trajectories on a store map. Although trained entirely on synthesized data, the captioning model generalizes well to trajectories and captions created by real human subjects. Our systematic evaluation confirms the effectiveness of the proposed framework over competitive approaches in terms of ROUGE and BERTScore metrics.
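A minimal, runnable sketch of the learning-by-synthesis idea described in the abstract follows. All function names, the toy store map, and the template-based caption stub are hypothetical stand-ins for illustration only; the paper itself uses large language models to synthesize captions and trajectories and fine-tunes a captioning model on the resulting pairs.

```python
# Hypothetical sketch of the Text2Traj2Text data-synthesis pipeline.
# The LLM steps are stubbed with templates so the example runs offline;
# the actual framework prompts a real LLM at both synthesis stages.
import random

STORE_MAP = {  # shelf name -> (x, y) location on a toy store grid
    "entrance": (0, 0),
    "produce": (2, 1),
    "dairy": (4, 1),
    "snacks": (2, 3),
    "checkout": (5, 0),
}

def synthesize_caption(context: dict) -> str:
    """Stub for the LLM step that writes a contextual caption.

    In the actual framework, a large language model generates diverse,
    realistic captions; a template keeps this sketch self-contained.
    """
    return (f"A shopper {context['goal']} visits "
            f"{' and '.join(context['stops'])} before heading to checkout.")

def caption_to_trajectory(context: dict) -> list[tuple[int, int]]:
    """Stub for the text-to-trajectory step: walk the mentioned shelves."""
    waypoints = ["entrance", *context["stops"], "checkout"]
    traj = []
    for name in waypoints:
        x, y = STORE_MAP[name]
        # Jitter around each shelf to mimic dwelling behavior.
        traj.extend((x + random.choice((-1, 0, 1)), y) for _ in range(3))
    return traj

def synthesize_dataset(n: int) -> list[tuple[list[tuple[int, int]], str]]:
    """Build (trajectory, caption) pairs to train a captioning model on."""
    shelves = [s for s in STORE_MAP if s not in ("entrance", "checkout")]
    pairs = []
    for _ in range(n):
        context = {"goal": "shopping for dinner",
                   "stops": random.sample(shelves, k=2)}
        pairs.append((caption_to_trajectory(context),
                      synthesize_caption(context)))
    return pairs

if __name__ == "__main__":
    for traj, caption in synthesize_dataset(2):
        print(caption, traj)
```

The fully synthetic (trajectory, caption) pairs produced this way would then serve as supervision for a trajectory-to-text captioning model, which the paper reports generalizes to data from real human subjects.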
Anthology ID:
2024.inlg-main.24
Volume:
Proceedings of the 17th International Natural Language Generation Conference
Month:
September
Year:
2024
Address:
Tokyo, Japan
Editors:
Saad Mahamood, Nguyen Le Minh, Daphne Ippolito
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
289–302
URL:
https://aclanthology.org/2024.inlg-main.24
Cite (ACL):
Hikaru Asano, Ryo Yonetani, Taiki Sekii, and Hiroki Ouchi. 2024. Text2Traj2Text: Learning-by-Synthesis Framework for Contextual Captioning of Human Movement Trajectories. In Proceedings of the 17th International Natural Language Generation Conference, pages 289–302, Tokyo, Japan. Association for Computational Linguistics.
Cite (Informal):
Text2Traj2Text: Learning-by-Synthesis Framework for Contextual Captioning of Human Movement Trajectories (Asano et al., INLG 2024)
PDF:
https://aclanthology.org/2024.inlg-main.24.pdf