Presentation + Paper
4 April 2022
Video-based driver emotion recognition using hybrid deep spatio-temporal feature learning
Abstract
Road traffic crashes have become the leading cause of death for young people. Approximately 1.3 million people die each year in road traffic crashes, and more than 30 million suffer non-fatal injuries. Various studies have shown that emotions influence driving performance. In this work, we focus on frame-level, video-based categorical emotion recognition in drivers. We propose a Convolutional Bidirectional Long Short-Term Memory Neural Network (CBiLSTM) architecture to capture the spatio-temporal features of the video data effectively. Facial videos of drivers are obtained from three sources: two publicly available datasets, namely Keimyung University Facial Expression of Drivers (KMU-FED) and a subset of the Driver Monitoring Dataset (DMD), and our own experimental dataset. First, we extract the face region from the video frames using the Facial Alignment Network (FAN). Second, these face regions are encoded using a lightweight SqueezeNet CNN model. Third, the output of the CNN model is fed into a two-layer BiLSTM network for spatio-temporal feature learning. Finally, a fully connected layer outputs the softmax probabilities of the emotion classes. Furthermore, we enable interpretable visualizations of the results using Axiom-based Grad-CAM (XGrad-CAM). For this study, we manually annotated the DMD subset and our experimental dataset using an interactive annotation tool. Our framework achieves an F1-score of 0.958 on the KMU-FED dataset. We evaluate our model using Leave-One-Out Cross-Validation (LOOCV) on the DMD subset and the experimental dataset and achieve average F1-scores of 0.745 and 0.414, respectively.
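The evaluation protocol named in the abstract (LOOCV with an average F1-score) can be illustrated with a minimal sketch. The abstract does not state what unit is left out in each fold or how the F1-score is averaged over emotion classes, so the sketch assumes one fold per driver and a macro-averaged F1. The names `macro_f1`, `leave_one_group_out`, `loocv_average_f1`, `fit`, and `predict` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list.

    Assumption: macro averaging; the abstract does not specify the variant.
    """
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is non-zero because
        # every class in `classes` occurs in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx), one fold per distinct group.

    Assumption: a group is one driver, so each fold tests on an unseen driver.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


def loocv_average_f1(samples, labels, groups, fit, predict):
    """Train on all groups but one, score the held-out group, average the folds.

    `fit(train_samples, train_labels)` returns a model and
    `predict(model, test_samples)` returns a list of labels; both are
    placeholders for whatever classifier is being evaluated.
    """
    fold_scores = []
    for _, train_idx, test_idx in leave_one_group_out(groups):
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        y_pred = predict(model, [samples[i] for i in test_idx])
        y_true = [labels[i] for i in test_idx]
        fold_scores.append(macro_f1(y_true, y_pred))
    return sum(fold_scores) / len(fold_scores)
```

Splitting by driver rather than by frame keeps frames of the same person out of both the training and the test set, which would otherwise inflate the score. Whether the reported 0.745 and 0.414 were obtained with exactly this split is not stated in the abstract.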
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Harshit Varma, Nagarajan Ganapathy, and Thomas M. Deserno "Video-based driver emotion recognition using hybrid deep spatio-temporal feature learning", Proc. SPIE 12037, Medical Imaging 2022: Imaging Informatics for Healthcare, Research, and Applications, 1203709 (4 April 2022); https://doi.org/10.1117/12.2613118
KEYWORDS
Video, Data modeling, Cameras, Facial recognition systems, Roads, Visual process modeling, Visualization