EgoDrive: Egocentric Multimodal Driver Behavior Recognition Using Project Aria

Michael Rice, Lorenz Krause, Waqar Shahid Qureshi


Abstract
Egocentric sensing using wearable devices offers a unique first-person perspective for driver behavior analysis and monitoring, with the potential to accurately capture rich multimodal cues such as eye gaze, head motion, and hand activity directly from the driver’s viewpoint. In this paper, we introduce a multimodal driver behavior recognition framework utilizing Meta’s Project Aria smart glasses, along with a novel, synchronized egocentric driving dataset comprising high-resolution RGB video, gaze-tracking data, inertial IMU signals, hand pose landmarks, and YOLO-based semantic object detections. All sensor data streams are temporally aligned and segmented into fixed-length clips, each manually annotated with one of six distinct driver behavior classes: Driving, Left Mirror Check, Right Wing Mirror Check, Rear-view Mirror Check, Mobile Phone Usage, and Idle. We design a Transformer-based recognition framework in which each modality is processed by a specialized encoder and then fused via Temporal Transformer layers to capture cross-modal temporal dependencies. To investigate the trade-off between accuracy and efficiency for real-time deployment, we introduce two model variants: EgoDriveMax, optimized for maximum accuracy, and EgoDriveRT, designed for real-time performance. These models achieve classification accuracies of 98.6% and 97.4% respectively. Notably, EgoDriveRT delivers strong performance despite operating with only 104K parameters and requiring just 2.65 ms per inference without the use of a specialized GPU, highlighting its potential for efficient, real-time in-cabin driver monitoring.
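The abstract says the sensor streams are temporally aligned and segmented into fixed-length clips, but it does not say how. The following is a minimal sketch of one possible approach, not the authors' pipeline: it assumes nearest-timestamp matching against the video clock and non-overlapping clips. The function names, the toy timestamps, and the clip length of 3 frames are all hypothetical.

```python
from bisect import bisect_left


def align_to_reference(ref_ts, stream_ts, stream_vals):
    """For each reference timestamp, pick the stream sample nearest in time.

    Assumes stream_ts is sorted ascending and non-empty (illustrative only).
    """
    aligned = []
    for t in ref_ts:
        i = bisect_left(stream_ts, t)
        # Candidates: the sample just before t and the sample at or after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(stream_ts)]
        nearest = min(candidates, key=lambda j: abs(stream_ts[j] - t))
        aligned.append(stream_vals[nearest])
    return aligned


def segment_into_clips(frames, clip_len):
    """Split frames into non-overlapping fixed-length clips; drop the remainder."""
    n_clips = len(frames) // clip_len
    return [frames[k * clip_len:(k + 1) * clip_len] for k in range(n_clips)]


# Hypothetical toy data: video frames as the reference clock, gaze sampled irregularly.
video_ts = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
gaze_ts = [0.02, 0.14, 0.19, 0.33, 0.41, 0.58]
gaze_vals = ["g0", "g1", "g2", "g3", "g4", "g5"]

aligned_gaze = align_to_reference(video_ts, gaze_ts, gaze_vals)
frames = list(zip(video_ts, aligned_gaze))
clips = segment_into_clips(frames, clip_len=3)
```

In this toy run the seven video frames yield two complete three-frame clips, and the final frame is discarded. A real pipeline might instead interpolate continuous signals such as IMU readings or use overlapping windows; the abstract does not specify either choice.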
Anthology ID:
2025.gaze4nlp-1.3
Volume:
Proceedings of the First International Workshop on Gaze Data and Natural Language Processing
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Cengiz Acarturk, Jamal Nasir, Burcu Can, Cagrı Coltekin
Venues:
Gaze4NLP | WS
Publisher:
INCOMA Ltd., Shoumen, BULGARIA
Pages:
18–25
URL:
https://aclanthology.org/2025.gaze4nlp-1.3/
Cite (ACL):
Michael Rice, Lorenz Krause, and Waqar Shahid Qureshi. 2025. EgoDrive: Egocentric Multimodal Driver Behavior Recognition Using Project Aria. In Proceedings of the First International Workshop on Gaze Data and Natural Language Processing, pages 18–25, Varna, Bulgaria. INCOMA Ltd., Shoumen, BULGARIA.
Cite (Informal):
EgoDrive: Egocentric Multimodal Driver Behavior Recognition Using Project Aria (Rice et al., Gaze4NLP 2025)
PDF:
https://aclanthology.org/2025.gaze4nlp-1.3.pdf