Tickoo, Omesh
Uncertainty aware multimodal activity recognition with Bayesian inference
Subedar, Mahesh, Krishnan, Ranganath, Meyer, Paulo Lopez, Tickoo, Omesh, Huang, Jonathan
Deep neural networks (DNNs) provide state-of-the-art results for a multitude of applications, but the use of DNNs for multimodal audiovisual applications remains an open problem. Current approaches that combine audiovisual information neither account for the inherent uncertainty nor leverage the true classification confidence associated with each modality in the final decision. Our contribution in this work is to apply Bayesian variational inference to DNNs for audiovisual activity recognition and to quantify model uncertainty along with principled confidence. We propose a novel approach that combines deterministic and variational layers to estimate model uncertainty and principled confidence. Our experiments with in- and out-of-distribution samples selected from a subset of the Moments-in-Time (MiT) dataset show a more reliable confidence measure compared to the non-Bayesian baseline. We also demonstrate that the uncertainty estimates obtained from this framework can identify out-of-distribution data on the UCF101 and MiT datasets. In the multimodal setting, the proposed framework improves the precision-recall AUC by 14.4% on the MiT subset compared to the non-Bayesian baseline.
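The core idea of mixing deterministic and variational layers can be sketched in a few lines. The following is a toy illustration, not the paper's implementation: features from a deterministic backbone feed a final layer with a mean-field Gaussian posterior over its weights, and Monte Carlo sampling of those weights yields a predictive distribution whose entropy acts as the uncertainty measure. All dimensions and parameter values are hypothetical.

```python
import math
import random

random.seed(0)

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Toy mean-field Gaussian posterior over final-layer weights
# (3 backbone features -> 2 classes). In practice mu and rho
# are learned via variational training; here they are made up.
FEATS, CLASSES = 3, 2
mu = [[0.5, -0.2, 0.1], [-0.4, 0.3, 0.2]]
rho = [[-2.0] * FEATS for _ in range(CLASSES)]  # sigma = softplus(rho)

def sample_weights():
    # Reparameterization: w = mu + softplus(rho) * eps, eps ~ N(0, 1)
    return [[mu[c][f] + math.log1p(math.exp(rho[c][f])) * random.gauss(0, 1)
             for f in range(FEATS)]
            for c in range(CLASSES)]

def predict(features, n_samples=100):
    """Monte Carlo predictive distribution: average the softmax
    output over repeated samples from the weight posterior."""
    mean_probs = [0.0] * CLASSES
    for _ in range(n_samples):
        w = sample_weights()
        logits = [sum(w[c][f] * features[f] for f in range(FEATS))
                  for c in range(CLASSES)]
        for c, p in enumerate(softmax(logits)):
            mean_probs[c] += p / n_samples
    return mean_probs

def predictive_entropy(probs):
    # High entropy signals low confidence, e.g. on out-of-distribution inputs.
    return -sum(p * math.log(p) for p in probs if p > 0)

probs = predict([1.0, 0.5, -0.3])  # features from the deterministic backbone
print(probs, predictive_entropy(probs))
```

In this setup only the variational layer is sampled, which keeps the Monte Carlo cost low while still producing a distribution over predictions rather than a single point estimate.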
BAR: Bayesian Activity Recognition using variational inference
Krishnan, Ranganath, Subedar, Mahesh, Tickoo, Omesh
Uncertainty estimation in deep neural networks is essential for designing reliable and robust AI systems. Applications such as video surveillance for identifying suspicious activities are built with deep neural networks (DNNs), yet conventional DNNs do not provide uncertainty estimates. Capturing reliable uncertainty estimates in safety- and security-critical applications will help establish trust in the AI system. Our contribution is to apply a Bayesian deep learning framework to visual activity recognition and to quantify model uncertainty along with principled confidence. We use variational inference while training the Bayesian DNNs to infer an approximate posterior distribution over the model parameters, and perform Monte Carlo sampling from this posterior to obtain the predictive distribution. We show that Bayesian inference applied to DNNs provides more reliable confidence measures for the visual activity recognition task than conventional DNNs. We also show that our method improves the visual activity recognition precision-recall score by 6% compared to the non-Bayesian baseline. We evaluate our models on the Moments-in-Time (MiT) activity recognition dataset by selecting a subset of in- and out-of-distribution video samples.
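The variational training objective behind this kind of Bayesian DNN balances a data-fit term against a KL complexity term. As a small worked sketch (assuming a fully factorized Gaussian posterior and a standard normal prior, a common choice but not necessarily this paper's exact setup), the per-weight KL term has a closed form:

```python
import math

def kl_gaussian(mu, sigma, prior_sigma=1.0):
    """Closed-form KL( N(mu, sigma^2) || N(0, prior_sigma^2) ) for one weight.

    This is the complexity term of the evidence lower bound (ELBO)
    minimized during variational training; the likelihood term is
    estimated with Monte Carlo samples of the weights.
    """
    return (math.log(prior_sigma / sigma)
            + (sigma ** 2 + mu ** 2) / (2 * prior_sigma ** 2)
            - 0.5)

# KL is zero when the posterior equals the prior, and grows as the
# posterior concentrates away from it.
print(kl_gaussian(0.0, 1.0))        # -> 0.0
print(kl_gaussian(0.5, 0.7) > 0.0)  # -> True
```

Summing this term over all variational weights and adding the expected negative log-likelihood gives the loss minimized during training; at test time, Monte Carlo sampling from the learned posterior produces the predictive distribution described above.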