Bayesian Learning
Deep learning based black spot identification on Greek road networks
Karamanlis, Ioannis, Kokkalis, Alexandros, Profillidis, Vassilios, Botzoris, George, Kiourt, Chairi, Sevetlidis, Vasileios, Pavlidis, George
Road safety is a crucial issue that affects not only the individuals involved in road accidents, but also society as a whole. The cost of road accidents in terms of human lives lost, physical and emotional suffering, and financial losses is enormous. Thus, it is important to understand the factors that contribute to road accidents and to develop strategies to reduce the number and severity of these incidents. One of the most important steps in this process is the identification of "black spots," areas where the number of accidents is significantly higher compared to other parts of the road network. The identification of black spots is crucial for prioritizing road safety interventions and evaluating their effectiveness in reducing the number of accidents. These events can range from minor incidents, such as fender benders, to serious crashes, resulting in fatalities or severe injuries. Thus, identifying these areas provides insights into the underlying causes of these accidents. For example, black spot analysis can reveal the presence of road design or infrastructure issues that may contribute to accidents, such as poor lighting, confusing road signs, and a lack of pedestrian crossings.
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking
In many domains, autoregressive models can attain high likelihood on the task of predicting the next observation. However, this maximum-likelihood (MLE) objective does not necessarily match a downstream use-case of autoregressively generating high-quality sequences. The MLE objective weights sequences proportionally to their frequency under the data distribution, with no guidance for the model's behaviour out of distribution (OOD): leading to compounding error during autoregressive generation. In order to address this compounding error problem, we formulate sequence generation as an imitation learning (IL) problem. This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset, including divergences with weight on OOD generated sequences. The IL framework also allows us to incorporate backtracking by introducing a backspace action into the generation process. This further mitigates the compounding error problem by allowing the model to revert a sampled token if it takes the sequence OOD. Our resulting method, SequenceMatch, can be implemented without adversarial training or major architectural changes. We identify the SequenceMatch-$\chi^2$ divergence as a more suitable training objective for autoregressive models which are used for generation. We show that empirically, SequenceMatch training leads to improvements over MLE on text generation with language models.
Replay-based Recovery for Autonomous Robotic Vehicles from Sensor Deception Attacks
Dash, Pritam, Li, Guanpeng, Karimibiuki, Mehdi, Pattabiraman, Karthik
Sensors are crucial for autonomous operation in robotic vehicles (RV). Unfortunately, RV sensors can be compromised by physical attacks such as tampering or spoofing, leading to a crash. In this paper, we present DeLorean, a modelfree recovery framework for recovering autonomous RVs from sensor deception attacks (SDA). DeLorean is designed to recover RVs even from a strong SDA in which the adversary targets multiple heterogeneous sensors simultaneously (even all the sensors). Under SDAs, DeLorean inspects the attack induced errors, identifies the targeted sensors, and prevents the erroneous sensor inputs from being used to derive actuator signals. DeLorean then replays historic state information in the RV's feedback control loop for a temporary mitigation and recovers the RV from SDA. Our evaluation on four real and two simulated RVs shows that DeLorean can recover RVs from SDAs, and ensure mission success in 90.7% of the cases on average.
Undersampling is a Minimax Optimal Robustness Intervention in Nonparametric Classification
Chatterji, Niladri S., Haque, Saminul, Hashimoto, Tatsunori
While a broad range of techniques have been proposed to tackle distribution shift, the simple baseline of training on an $\textit{undersampled}$ balanced dataset often achieves close to state-of-the-art-accuracy across several popular benchmarks. This is rather surprising, since undersampling algorithms discard excess majority group data. To understand this phenomenon, we ask if learning is fundamentally constrained by a lack of minority group samples. We prove that this is indeed the case in the setting of nonparametric binary classification. Our results show that in the worst case, an algorithm cannot outperform undersampling unless there is a high degree of overlap between the train and test distributions (which is unlikely to be the case in real-world datasets), or if the algorithm leverages additional structure about the distribution shift. In particular, in the case of label shift we show that there is always an undersampling algorithm that is minimax optimal. In the case of group-covariate shift we show that there is an undersampling algorithm that is minimax optimal when the overlap between the group distributions is small. We also perform an experimental case study on a label shift dataset and find that in line with our theory, the test accuracy of robust neural network classifiers is constrained by the number of minority samples.
Fast Conditional Mixing of MCMC Algorithms for Non-log-concave Distributions
Cheng, Xiang, Wang, Bohan, Zhang, Jingzhao, Zhu, Yusong
MCMC algorithms offer empirically efficient tools for sampling from a target distribution $\pi(x) \propto \exp(-V(x))$. However, on the theory side, MCMC algorithms suffer from slow mixing rate when $\pi(x)$ is non-log-concave. Our work examines this gap and shows that when Poincar\'e-style inequality holds on a subset $\mathcal{X}$ of the state space, the conditional distribution of MCMC iterates over $\mathcal{X}$ mixes fast to the true conditional distribution. This fast mixing guarantee can hold in cases when global mixing is provably slow. We formalize the statement and quantify the conditional mixing rate. We further show that conditional mixing can have interesting implications for sampling from mixtures of Gaussians, parameter estimation for Gaussian mixture models and Gibbs-sampling with well-connected local minima.
Balanced Energy Regularization Loss for Out-of-distribution Detection
Choi, Hyunjun, Jeong, Hawook, Choi, Jin Young
In the field of out-of-distribution (OOD) detection, a previous method that use auxiliary data as OOD data has shown promising performance. However, the method provides an equal loss to all auxiliary data to differentiate them from inliers. However, based on our observation, in various tasks, there is a general imbalance in the distribution of the auxiliary OOD data across classes. We propose a balanced energy regularization loss that is simple but generally effective for a variety of tasks. Our balanced energy regularization loss utilizes class-wise different prior probabilities for auxiliary data to address the class imbalance in OOD data. The main concept is to regularize auxiliary samples from majority classes, more heavily than those from minority classes. Our approach performs better for OOD detection in semantic segmentation, long-tailed image classification, and image classification than the prior energy regularization loss. Furthermore, our approach achieves state-of-the-art performance in two tasks: OOD detection in semantic segmentation and long-tailed image classification. Code is available at https://github.com/hyunjunChhoi/Balanced_Energy.
Deep Huber quantile regression networks
Tyralis, Hristos, Papacharalampous, Georgia, Dogulu, Nilay, Chun, Kwok P.
Typical machine learning regression applications aim to report the mean or the median of the predictive probability distribution, via training with a squared or an absolute error scoring function. The importance of issuing predictions of more functionals of the predictive probability distribution (quantiles and expectiles) has been recognized as a means to quantify the uncertainty of the prediction. In deep learning (DL) applications, that is possible through quantile and expectile regression neural networks (QRNN and ERNN respectively). Here we introduce deep Huber quantile regression networks (DHQRN) that nest QRNNs and ERNNs as edge cases. DHQRN can predict Huber quantiles, which are more general functionals in the sense that they nest quantiles and expectiles as limiting cases. The main idea is to train a deep learning algorithm with the Huber quantile regression function, which is consistent for the Huber quantile functional. As a proof of concept, DHQRN are applied to predict house prices in Australia. In this context, predictive performances of three DL architectures are discussed along with evidential interpretation of results from an economic case study.
Prediction Sets Adaptive to Unknown Covariate Shift
Qiu, Hongxiang, Dobriban, Edgar, Tchetgen, Eric Tchetgen
Predicting sets of outcomes -- instead of unique outcomes -- is a promising solution to uncertainty quantification in statistical learning. Despite a rich literature on constructing prediction sets with statistical guarantees, adapting to unknown covariate shift -- a prevalent issue in practice -- poses a serious unsolved challenge. In this paper, we show that prediction sets with finite-sample coverage guarantee are uninformative and propose a novel flexible distribution-free method, PredSet-1Step, to efficiently construct prediction sets with an asymptotic coverage guarantee under unknown covariate shift. We formally show that our method is \textit{asymptotically probably approximately correct}, having well-calibrated coverage error with high confidence for large samples. We illustrate that it achieves nominal coverage in a number of experiments and a data set concerning HIV risk prediction in a South African cohort study. Our theory hinges on a new bound for the convergence rate of the coverage of Wald confidence intervals based on general asymptotically linear estimators.
Variational Sequential Optimal Experimental Design using Reinforcement Learning
Shen, Wanggang, Dong, Jiayuan, Huan, Xun
We introduce variational sequential Optimal Experimental Design (vsOED), a new method for optimally designing a finite sequence of experiments under a Bayesian framework and with information-gain utilities. Specifically, we adopt a lower bound estimator for the expected utility through variational approximation to the Bayesian posteriors. The optimal design policy is solved numerically by simultaneously maximizing the variational lower bound and performing policy gradient updates. We demonstrate this general methodology for a range of OED problems targeting parameter inference, model discrimination, and goal-oriented prediction. These cases encompass explicit and implicit likelihoods, nuisance parameters, and physics-based partial differential equation models. Our vsOED results indicate substantially improved sample efficiency and reduced number of forward model simulations compared to previous sequential design algorithms.
A Survey of Contextual Optimization Methods for Decision Making under Uncertainty
Sadana, Utsav, Chenreddy, Abhilash, Delage, Erick, Forel, Alexandre, Frejinger, Emma, Vidal, Thibaut
Recently there has been a surge of interest in operations research (OR) and the machine learning (ML) community in combining prediction algorithms and optimization techniques to solve decision-making problems in the face of uncertainty. This gave rise to the field of contextual optimization, under which data-driven procedures are developed to prescribe actions to the decision-maker that make the best use of the most recently updated information. A large variety of models and methods have been presented in both OR and ML literature under a variety of names, including data-driven optimization, prescriptive optimization, predictive stochastic programming, policy optimization, (smart) predict/estimate-then-optimize, decision-focused learning, (task-based) end-to-end learning/forecasting/optimization, etc. Focusing on single and two-stage stochastic programming problems, this review article identifies three main frameworks for learning policies from data and discusses their strengths and limitations. We present the existing models and methods under a uniform notation and terminology and classify them according to the three main frameworks identified. Our objective with this survey is to both strengthen the general understanding of this active field of research and stimulate further theoretical and algorithmic advancements in integrating ML and stochastic programming.