EgoDrive: Egocentric Multimodal Driver Behavior Recognition Using Project Aria

Michael Rice, Lorenz Krause, Waqar Shahid Qureshi


Abstract
Egocentric sensing using wearable devices offers a unique first-person perspective for driver behavior analysis and monitoring, with the potential to accurately capture rich multimodal cues such as eye gaze, head motion, and hand activity directly from the driver’s viewpoint. In this paper, we introduce a multimodal driver behavior recognition framework utilizing Meta’s Project Aria smart glasses, along with a novel, synchronized egocentric driving dataset comprising high-resolution Red Green Blue (RGB) video, gaze-tracking data, Inertial Measurement Unit (IMU) signals, hand pose landmarks, and YOLO-based semantic object detections. All sensor data streams are temporally aligned and segmented into fixed-length clips, each manually annotated with one of six distinct driver behavior classes: Driving, Left Mirror Check, Right Wing Mirror Check, Rear-view Mirror Check, Mobile Phone Usage, and Idle. We design a Transformer-based recognition framework in which each modality is processed by a specialized encoder and then fused via Temporal Transformer layers to capture cross-modal temporal dependencies. To investigate the trade-off between accuracy and efficiency for real-time deployment, we introduce two model variants: EgoDriveMax, optimized for maximum accuracy, and EgoDriveRT, designed for real-time performance. These models achieve classification accuracies of 98.6% and 97.4%, respectively. Notably, EgoDriveRT delivers strong performance despite having only 104K parameters and requiring just 2.65 ms per inference without a specialized GPU, highlighting its potential for efficient, real-time in-cabin driver monitoring.
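The abstract states that all sensor streams are temporally aligned and segmented into fixed-length clips, but gives no procedure. The following is a minimal sketch of one way such a step could work, assuming nearest-timestamp resampling onto a shared clock; it is not the authors' pipeline. The function names, `rate_hz`, and `clip_len` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def nearest_sample(timestamps, values, t):
    """Return the value whose timestamp is closest to t (timestamps sorted)."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return values[0]
    if i == len(timestamps):
        return values[-1]
    before, after = timestamps[i - 1], timestamps[i]
    # Ties go to the earlier sample.
    return values[i] if after - t < t - before else values[i - 1]


def align_and_segment(streams, rate_hz, clip_len):
    """Resample every stream onto one shared clock, then cut fixed-length clips.

    streams:  dict mapping a stream name to (sorted timestamps in seconds, values)
    rate_hz:  shared sampling rate (hypothetical parameter)
    clip_len: number of samples per clip (hypothetical parameter)
    """
    # Keep only the time interval covered by every stream.
    start = max(ts[0] for ts, _ in streams.values())
    end = min(ts[-1] for ts, _ in streams.values())
    n = int((end - start) * rate_hz) + 1
    timeline = [start + k / rate_hz for k in range(n)]

    # Nearest-timestamp lookup for each stream at each shared tick.
    aligned = {
        name: [nearest_sample(ts, vals, t) for t in timeline]
        for name, (ts, vals) in streams.items()
    }

    # Non-overlapping clips; an incomplete trailing clip is dropped.
    clips = []
    for s in range(0, n - clip_len + 1, clip_len):
        clips.append({name: seq[s:s + clip_len] for name, seq in aligned.items()})
    return clips
```

Each returned clip is a dictionary with one equally long sequence per stream, which is the shape a set of per-modality encoders would need as input. Interpolation, overlapping windows, or hardware timestamps could replace the nearest-sample choice made here.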
Anthology ID:
2025.gaze4nlp-1.3
Volume:
Proceedings of the First International Workshop on Gaze Data and Natural Language Processing
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Cengiz Acarturk, Jamal Nasir, Burcu Can, Cagrı Coltekin
Venues:
Gaze4NLP | WS
Publisher:
INCOMA Ltd., Shoumen, BULGARIA
Pages:
18–25
URL:
https://aclanthology.org/2025.gaze4nlp-1.3/
Cite (ACL):
Michael Rice, Lorenz Krause, and Waqar Shahid Qureshi. 2025. EgoDrive: Egocentric Multimodal Driver Behavior Recognition Using Project Aria. In Proceedings of the First International Workshop on Gaze Data and Natural Language Processing, pages 18–25, Varna, Bulgaria. INCOMA Ltd., Shoumen, BULGARIA.
Cite (Informal):
EgoDrive: Egocentric Multimodal Driver Behavior Recognition Using Project Aria (Rice et al., Gaze4NLP 2025)
PDF:
https://aclanthology.org/2025.gaze4nlp-1.3.pdf