Directed Networks
Out-of-distribution Detection in Classifiers via Generation
Vernekar, Sachin, Gaurav, Ashish, Abdelzad, Vahdat, Denouden, Taylor, Salay, Rick, Czarnecki, Krzysztof
By design, discriminatively trained neural network classifiers produce reliable predictions only for in-distribution samples. For their real-world deployments, detecting out-of-distribution (OOD) samples is essential. Assuming OOD to be outside the closed boundary of in-distribution, typical neural classifiers do not contain the knowledge of this boundary for OOD detection during inference. There have been recent approaches to instill this knowledge in classifiers by explicitly training the classifier with OOD samples close to the in-distribution boundary. However, these generated samples fail to cover the entire in-distribution boundary effectively, thereby resulting in a sub-optimal OOD detector. In this paper, we analyze the feasibility of such approaches by investigating the complexity of producing such "effective" OOD samples. We also propose a novel algorithm to generate such samples using a manifold learning network (e.g., variational autoencoder) and then train an n+1 classifier for OOD detection, where the $n+1^{th}$ class represents the OOD samples. We compare our approach against several recent classifier-based OOD detectors on MNIST and Fashion-MNIST datasets. Overall the proposed approach consistently performs better than the others.
Kernels over Sets of Finite Sets using RKHS Embeddings, with Application to Bayesian (Combinatorial) Optimization
Buathong, Poompol, Ginsbourger, David, Krityakierne, Tipaluck
We focus on kernel methods for set-valued inputs and their application to Bayesian set optimization, notably combinatorial optimization. We introduce a class of (strictly) positive definite kernels that relies on Reproducing Kernel Hilbert Space embeddings, and successfully generalizes "double sum" set kernels recently considered in Bayesian set optimization, which turn out to be unsuitable for combinatorial optimization. The proposed class of kernels, for which we provide theoretical guarantees, essentially consists in applying an outer kernel on top of the canonical distance induced by a double sum kernel. Proofs of theoretical results about considered kernels are complemented by a few practicalities regarding hyperparameter fitting. We furthermore demonstrate the applicability of our approach in prediction and optimization tasks, relying both on toy examples and on two test cases from mechanical engineering and hydrogeology, respectively. Experimental results illustrate the added value of the approach and open new perspectives in prediction and sequential design with set inputs.
Optimal experimental design via Bayesian optimization: active causal structure learning for Gaussian process networks
von Kügelgen, Julius, Rubenstein, Paul K, Schölkopf, Bernhard, Weller, Adrian
We study the problem of causal discovery through targeted interventions. Starting from few observational measurements, we follow a Bayesian active learning approach to perform those experiments which, in expectation with respect to the current model, are maximally informative about the underlying causal structure. Unlike previous work, we consider the setting of continuous random variables with non-linear functional relationships, modelled with Gaussian process priors. To address the arising problem of choosing from an uncountable set of possible interventions, we propose to use Bayesian optimisation to efficiently maximise a Monte Carlo estimate of the expected information gain.
NGBoost: Natural Gradient Boosting for Probabilistic Prediction
Duan, Tony, Avati, Anand, Ding, Daisy Yi, Basu, Sanjay, Ng, Andrew Y., Schuler, Alejandro
We present Natural Gradient Boosting (NGBoost), an algorithm which brings probabilistic prediction capability to gradient boosting in a generic way. Predictive uncertainty estimation is crucial in many applications such as healthcare and weather forecasting. Probabilistic prediction, which is the approach where the model outputs a full probability distribution over the entire outcome space, is a natural way to quantify those uncertainties. Gradient Boosting Machines have been widely successful in prediction tasks on structured input data, but a simple boosting solution for probabilistic prediction of real valued outputs is yet to be made. NGBoost is a gradient boosting approach which uses the \emph{Natural Gradient} to address technical challenges that makes generic probabilistic prediction hard with existing gradient boosting methods. Our approach is modular with respect to the choice of base learner, probability distribution, and scoring rule. We show empirically on several regression datasets that NGBoost provides competitive predictive performance of both uncertainty estimates and traditional metrics.
How to Implement Bayesian Optimization from Scratch in Python
Many methods exist for function optimization, such as randomly sampling the variable search space, called random search, or systematically evaluating samples in a grid across the search space, called grid search. More principled methods are able to learn from sampling the space so that future samples are directed toward the parts of the search space that are most likely to contain the extrema. A directed approach to global optimization that uses probability is called Bayesian Optimization. Take my free 7-day email crash course now (with sample code). Click to sign-up and also get a free PDF Ebook version of the course.
DeepMind Researchers Develop Tools To Visualise ML Unfairness
Machine Learning engineers work around bias or the offsets in a model by drawing insights from the output, gauging the losses, going through tonnes of data and repeating till agreeable results have been obtained. This is a traditional process which takes time but works decently. An alternative to this approach is the Lagrangian approach, a mathematical method to find the local maxima and local minima of a function when provided with equality constraints. This too, comes with its own set of complexities. The unfairness of machine learning algorithms was exposed when they were deployed for manual tasks like hiring, surveillance and other such critical tasks, where the damages can be irreversible.
bayespy/bayespy
BayesPy provides tools for Bayesian inference with Python. The user constructs a model as a Bayesian network, observes data and runs posterior inference. The goal is to provide a tool which is efficient, flexible and extendable enough for expert use but also accessible for more casual users. Currently, only variational Bayesian inference for conjugate-exponential family (variational message passing) has been implemented. Future work includes variational approximations for other types of distributions and possibly other approximate inference methods such as expectation propagation, Laplace approximations, Markov chain Monte Carlo (MCMC) and other methods. BayesPy including the documentation is licensed under the MIT License.
Stochastic Triangular Mesh Mapping
Lombard, Clint D., van Daalen, Corné E.
For mobile robots to operate autonomously in general environments, perception is required in the form of a dense metric map. For this purpose, we present the stochastic triangular mesh (STM) mapping technique: a 2.5-D representation of the surface of the environment using a continuous mesh of triangular surface elements, where each surface element models the mean plane and roughness of the underlying surface. In contrast to existing mapping techniques, a STM map models the structure of the environment by ensuring a continuous model, while also being able to be incrementally updated with linear computational cost in the number of measurements. We reduce the effect of uncertainty in the robot pose (position and orientation) by using landmark-relative submaps. The uncertainty in the measurements and robot pose are accounted for by the use of Bayesian inference techniques during the map update. We demonstrate that a STM map can be used with sensors that generate point measurements, such as light detection and ranging (LiDAR) sensors and stereo cameras. We show that a STM map is a more accurate model than the only comparable online surface mapping technique$\unicode{x2014}$a standard elevation map$\unicode{x2014}$and we also provide qualitative results on practical datasets.
Receding Horizon Curiosity
Schultheis, Matthias, Belousov, Boris, Abdulsamad, Hany, Peters, Jan
Sample-efficient exploration is crucial not only for discovering rewarding experiences but also for adapting to environment changes in a task-agnostic fashion. A principled treatment of the problem of optimal input synthesis for system identification is provided within the framework of sequential Bayesian experimental design. In this paper, we present an effective trajectory-optimization-based approximate solution of this otherwise intractable problem that models optimal exploration in an unknown Markov decision process (MDP). By interleaving episodic exploration with Bayesian nonlinear system identification, our algorithm takes advantage of the inductive bias to explore in a directed manner, without assuming prior knowledge of the MDP. Empirical evaluations indicate a clear advantage of the proposed algorithm in terms of the rate of convergence and the final model fidelity when compared to intrinsic-motivation-based algorithms employing exploration bonuses such as prediction error and information gain. Moreover, our method maintains a computational advantage over a recent model-based active exploration (MAX) algorithm, by focusing on the information gain along trajectories instead of seeking a global exploration policy. A reference implementation of our algorithm and the conducted experiments is publicly available.
How to Develop a Naive Bayes Classifier from Scratch in Python
Classification is a predictive modeling problem that involves assigning a label to a given input data sample. The problem of classification predictive modeling can be framed as calculating the conditional probability of a class label given a data sample. Bayes Theorem provides a principled way for calculating this conditional probability, although in practice requires an enormous number of samples (very large-sized dataset) and is computationally expensive. Instead, the calculation of Bayes Theorem can be simplified by making some assumptions, such as each input variable is independent of all other input variables. Although a dramatic and unrealistic assumption, this has the effect of making the calculations of the conditional probability tractable and results in an effective classification model referred to as Naive Bayes.