Text360Nav: 360-Degree Image Captioning Dataset for Urban Pedestrians Navigation

Chieko Nishimura; Shuhei Kurita; Yohei Seki

Abstract

Textual feedback about urban scenes is a crucial tool for helping pedestrians understand their surroundings, obstacles, and safe pathways. However, existing image captioning datasets often concentrate on describing the image as a whole and lack detailed scene descriptions, overlooking the features that matter to pedestrians walking on urban streets. We developed a new dataset, Text360Nav, to assist pedestrians in urban scenes using 360-degree camera images. With this dataset, we aim to provide textual feedback from machine visual perception, such as 360-degree cameras, to visually impaired individuals and distracted pedestrians navigating urban streets, including those engrossed in their smartphones while walking. In experiments, we combined our dataset with multimodal generative models and observed, in both quantitative and qualitative analyses, that models trained with our dataset can generate textual descriptions focusing on the street objects and obstacles that are meaningful in urban scenes, supporting the effectiveness of our dataset for urban pedestrian navigation.

Anthology ID: 2024.lrec-main.1371
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 15783–15788
URL: https://aclanthology.org/2024.lrec-main.1371/
Cite (ACL): Chieko Nishimura, Shuhei Kurita, and Yohei Seki. 2024. Text360Nav: 360-Degree Image Captioning Dataset for Urban Pedestrians Navigation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 15783–15788, Torino, Italia. ELRA and ICCL.
Cite (Informal): Text360Nav: 360-Degree Image Captioning Dataset for Urban Pedestrians Navigation (Nishimura et al., LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.1371.pdf
