Lorenz Krause
2025
EgoDrive: Egocentric Multimodal Driver Behavior Recognition Using Project Aria
Michael Rice | Lorenz Krause | Waqar Shahid Qureshi
Proceedings of the First International Workshop on Gaze Data and Natural Language Processing
Egocentric sensing using wearable devices offers a unique first-person perspective for driver behaviour analysis and monitoring, with the potential to accurately capture rich multimodal cues such as eye gaze, head motion, and hand activity directly from the driver’s viewpoint. In this paper, we introduce a multimodal driver behaviour recognition framework utilizing Meta’s Project Aria smart glasses, along with a novel, synchronized egocentric driving dataset comprising high-resolution Red Green Blue (RGB) video, gaze-tracking data, Inertial Measurement Unit (IMU) signals, hand pose landmarks, and YOLO-based semantic object detections. All sensor data streams are temporally aligned and segmented into fixed-length clips, each manually annotated with one of six distinct driver behaviour classes: Driving, Left Wing Mirror Check, Right Wing Mirror Check, Rear-View Mirror Check, Mobile Phone Usage, and Idle. We design a Transformer-based recognition framework in which each modality is processed by a specialized encoder and then fused via Temporal Transformer layers to capture cross-modal temporal dependencies. To investigate the trade-off between accuracy and efficiency for real-time deployment, we introduce two model variants: EgoDriveMax, optimized for maximum accuracy, and EgoDriveRT, designed for real-time performance. These models achieve classification accuracies of 98.6% and 97.4%, respectively. Notably, EgoDriveRT delivers strong performance despite operating with only 104K parameters and requiring just 2.65 ms per inference without a specialized GPU, highlighting its potential for efficient, real-time in-cabin driver monitoring.
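The pipeline described in the abstract — modality-specific encoders whose per-frame embeddings are fused and passed through temporal attention before six-way classification — can be sketched as follows. This is a minimal NumPy illustration under assumed dimensions; the feature sizes, the linear encoders, the sum-fusion, and the single attention head are all placeholders and do not reflect the actual EgoDriveMax/EgoDriveRT architectures or trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the abstract does not specify exact sizes.
T = 30            # frames per fixed-length clip
DIMS = {"rgb": 64, "gaze": 2, "imu": 6, "hand": 42, "det": 10}  # per-modality feature widths
D = 32            # shared embedding width
N_CLASSES = 6     # Driving, three mirror checks, phone usage, idle


def encode(x, W):
    """Stand-in modality-specific encoder: a single linear projection."""
    return x @ W


def temporal_self_attention(X):
    """Single-head scaled dot-product attention over the fused frame sequence."""
    scores = X @ X.T / np.sqrt(X.shape[1])          # (T, T) frame-to-frame affinities
    scores -= scores.max(axis=1, keepdims=True)     # numerically stable softmax
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)
    return A @ X                                    # temporally contextualized frames


# Random weights stand in for trained parameters.
W_enc = {m: rng.standard_normal((d, D)) for m, d in DIMS.items()}
W_cls = rng.standard_normal((D, N_CLASSES))

# One synthetic clip: a (T, dim) stream per temporally aligned modality.
clip = {m: rng.standard_normal((T, d)) for m, d in DIMS.items()}

fused = sum(encode(clip[m], W_enc[m]) for m in DIMS)  # fuse per-frame embeddings
ctx = temporal_self_attention(fused)                  # cross-frame dependencies
logits = ctx.mean(axis=0) @ W_cls                     # pool over time, classify
pred = int(np.argmax(logits))                         # behaviour class index, 0..5
```

The sketch illustrates why the real-time variant can be so small: with all modalities projected into one shared width, the per-clip cost is dominated by a handful of matrix multiplies over a short frame sequence.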