Exploration of VLMs for Driver Monitoring Systems Applications

Cañas, Paola Natalia, Nieto, Marcos, Otaegui, Oihana, Rodríguez, Igor

arXiv.org Artificial Intelligence 

Vision-language models (VLMs) have the potential to revolutionize driver and in-cabin monitoring by offering a more holistic understanding of the driving scene. Rather than focusing on individual variables, VLMs are trained to describe the entire scene, considering all crucial elements. This comprehensive approach allows them to construct a coherent narrative around the scene, leading to a more thorough assessment of the driver's situation. Despite these potential benefits, there is a notable lack of scientific research exploring the application of VLMs in this field. We aim to conduct an initial exploration of how these systems perform in tasks such as distraction detection, drowsiness detection, and gaze estimation. By evaluating their performance, we hope to determine whether they can match or even surpass state-of-the-art models, and to identify areas where they fall short. To this end, we use data from the Driver Monitoring Dataset (DMD), which contains extensive footage of drivers in various states of drowsiness and distraction. The footage shows drivers performing actions that imply distraction (such as texting, making a phone call, or drinking water) as well as driving safely, and it includes detailed gaze annotations. By integrating VLMs into Driver Monitoring Systems (DMS), we expect the model to achieve better scene comprehension, enabling it to provide detailed descriptions and respond to queries through Visual Question Answering (VQA) tasks.