Goto

Collaborating Authors

 Bayesian Learning


BSL: An R Package for Efficient Parameter Estimation for Simulation-Based Models via Bayesian Synthetic Likelihood

arXiv.org Machine Learning

Bayesian synthetic likelihood (BSL) is a popular method for estimating the parameter posterior distribution for complex statistical models and stochastic processes that possess a computationally intractable likelihood function. Instead of evaluating the likelihood, BSL approximates the likelihood of a judiciously chosen summary statistic of the data via model simulation and density estimation. Compared to alternative methods such as approximate Bayesian computation (ABC), BSL requires little tuning and requires less model simulations than ABC when the chosen summary statistic is high-dimensional. The original synthetic likelihood relies on a multivariate normal approximation of the intractable likelihood, where the mean and covariance are estimated by simulation. An extension of BSL considers replacing the sample covariance with a penalised covariance estimator to reduce the number of required model simulations. Further, a semi-parametric approach has been developed to relax the normality assumption. In this paper, we present an R package called BSL that amalgamates the aforementioned methods and more into a single, easy-to-use and coherent piece of software. The R package also includes several examples to illustrate how to use the package and demonstrate the utility of the methods.


Towards Scalable Gaussian Process Modeling

arXiv.org Machine Learning

Numerous engineering problems of interest to the industry are often characterized by expensive black-box objective experiments or computer simulations. Obtaining insight into the problem or performing subsequent optimizations requires hundreds of thousands of evaluations of the objective function which is most often a practically unachievable task. Gaussian Process (GP) surrogate modeling replaces the expensive function with a cheap-to-evaluate data-driven probabilistic model. While the GP does not assume a functional form of the problem, it is defined by a set of parameters, called hyperparameters. The hyperparameters define the characteristics of the objective function, such as smoothness, magnitude, periodicity, etc. Accurately estimating these hyperparameters is a key ingredient in developing a reliable and generalizable surrogate model. Markov chain Monte Carlo (MCMC) is a ubiquitously used Bayesian method to estimate these hyperparameters. At the GE Global Research Center, a customized industry-strength Bayesian hybrid modeling framework utilizing the GP, called GEBHM, has been employed and validated over many years. GEBHM is very effective on problems of small and medium size, typically less than 1000 training points. However, the GP does not scale well in time with a growing dataset and problem dimensionality which can be a major impediment in such problems. In this work, we extend and implement in GEBHM an Adaptive Sequential Monte Carlo (ASMC) methodology for training the GP enabling the modeling of large-scale industry problems. This implementation saves computational time (especially for large-scale problems) while not sacrificing predictability over the current MCMC implementation. We demonstrate the effectiveness and accuracy of GEBHM with ASMC on four mathematical problems and on two challenging industry applications of varying complexity.


Prediction of Highway Lane Changes Based on Prototype Trajectories

arXiv.org Machine Learning

The vision of automated driving is to increase both road safety and efficiency, while offering passengers a convenient travel experience. This requires that autonomous systems correctly estimate the current traffic scene and its likely evolution. In highway scenarios early recognition of cutin maneuvers is essential for risk-aware maneuver planning. In this paper, a statistical approach is proposed, which advantageously utilizes a set of prototypical lane change trajectories to realize both early maneuver detection and uncertainty-aware trajectory prediction for traffic participants. Generation of prototype trajectories from real traffic data is accomplished by Agglomerative Hierarchical Clustering. During clustering, the alignment of the cluster prototypes to each other is optimized and the cohesion of the resulting prototype is limited when two clusters merge. In the prediction stage, the similarity of observed vehicle motion and typical lane change patterns in the data base is evaluated to construct a set of significant features for maneuver classification via Boosted Decision T rees. The future trajectory is predicted combining typical lane change realizations in a mixture model. B-splines based trajectory adaptations guarantee continuity during transition from actually observed to predicted vehicle states. Quantitative evaluation results demonstrate the proposed concept's improved performance for both maneuver and trajectory prediction compared to a previously implemented reference approach. The development of automated driving functions is a central activity of industry and science.


Filter Bank Regularization of Convolutional Neural Networks

arXiv.org Machine Learning

Regularization techniques are widely used to improve the generality, robustness, and efficiency of deep convolu-tional neural networks (DCNNs). In this paper, we propose a novel approach of regulating DCNN convolutional kernels by a structured filter bank. Comparing with the existing regularization methods, such as null 1 or null 2 minimization of DCNN kernel weights and the kernel orthogonality, which ignore sample correlations within a kernel, the use of filter bank in regularization of DCNNs can mold the DCNN kernels to common spatial structures and features (e.g., edges or textures of various orientations and frequencies) of natural images. On the other hand, unlike directly making DCNN kernels fixed filters, the filter bank regularization still allows the freedom of optimizing DCNN weights via deep learning. This new DCNN design strategy aims to combine the best of two worlds: the inclusion of structural image priors of traditional filter banks to improve the robustness and generality of DCNN solutions and the capability of modern deep learning to model complex nonlinear functions hidden in training data. Experimental results on object recognition tasks show that the proposed regularization approach guides DCNNs to faster convergence and better generalization than existing regularization methods of weight decay and kernel orthogonality. 1. Introduction 1.1. Regularization Deep convolutional neural networks (DCNNs) have rapidly matured as an effective tool for almost all computer vision tasks [6, 7, 8, 22, 24, 27], including object recognition, classification, segmentation, superresolution, etc.


Invariance reduces Variance: Understanding Data Augmentation in Deep Learning and Beyond

arXiv.org Machine Learning

Many complex deep learning models have found success by exploiting symmetries in data. Convolutional neural networks (CNNs), for example, are ubiquitous in image classification due to their use of translation symmetry, as image identity is roughly invariant to translations. In addition, many other forms of symmetry such as rotation, scale, and color shift are commonly used via data augmentation: the transformed images are added to the training set. However, a clear framework for understanding data augmentation is not available. One may even say that it is somewhat mysterious: how can we increase performance by simply adding transforms of our data to the model? Can that be information theoretically possible? In this paper, we develop a theoretical framework to start to shed light on some of these problems. We explain data augmentation as averaging over the orbits of the group that keeps the data distribution invariant, and show that it leads to variance reduction. We study finite-sample and asymptotic empirical risk minimization (using results from stochastic convex optimization, Rademacher complexity, and asymptotic statistical theory). We work out as examples the variance reduction in exponential families, linear regression, and certain two-layer neural networks under shift invariance (using discrete Fourier analysis). We also discuss how data augmentation could be used in problems with symmetry where other approaches are prevalent, such as in cryo-electron microscopy (cryo-EM).


The Virtual Patch Clamp: Imputing C. elegans Membrane Potentials from Calcium Imaging

arXiv.org Artificial Intelligence

We develop a stochastic whole-brain and body simulator of the nematode roundworm Caenorhabditis elegans (C. elegans) and show that it is sufficiently regularizing to allow imputation of latent membrane potentials from partial calcium fluorescence imaging observations. This is the first attempt we know of to "complete the circle," where an anatomically grounded whole-connectome simulator is used to impute a time-varying "brain" state at single-cell fidelity from covariates that are measurable in practice. The sequential Monte Carlo (SMC) method we employ not only enables imputation of said latent states but also presents a strategy for learning simulator parameters via variational optimization of the noisy model evidence approximation provided by SMC. Our imputation and parameter estimation experiments were conducted on distributed systems using novel implementations of the aforementioned techniques applied to synthetic data of dimension and type representative of that which are measured in laboratories currently.


Music Recommendations in Hyperbolic Space: An Application of Empirical Bayes and Hierarchical Poincar\'e Embeddings

arXiv.org Machine Learning

Matrix Factorization (MF) is a common method for generating recommendations, where the proximity of entities like users or items in the embedded space indicates their similarity to one another. Though almost all applications implicitly use a Euclidean embedding space to represent two entity types, recent work has suggested that a hyperbolic Poincar\'e ball may be more well suited to representing multiple entity types, and in particular, hierarchies. We describe a novel method to embed a hierarchy of related music entities in hyperbolic space. We also describe how a parametric empirical Bayes approach can be used to estimate link reliability between entities in the hierarchy. Applying these methods together to build personalized playlists for users in a digital music service yielded a large and statistically significant increase in performance during an A/B test, as compared to the Euclidean model.


Classified Regression for Bayesian Optimization: Robot Learning with Unknown Penalties

arXiv.org Machine Learning

Learning robot controllers by minimizing a black-box objective cost using Bayesian optimization (BO) can be time-consuming and challenging. It is very often the case that some roll-outs result in failure behaviors, causing premature experiment detention. In such cases, the designer is forced to decide on heuristic cost penalties because the acquired data is often scarce, or not comparable with that of the stable policies. To overcome this, we propose a Bayesian model that captures exactly what we know about the cost of unstable controllers prior to data collection: Nothing, except that it should be a somewhat large number. The resulting Bayesian model, approximated with a Gaussian process, predicts high cost values in regions where failures are likely to occur. In this way, the model guides the BO exploration toward regions of stability. We demonstrate the benefits of the proposed model in several illustrative and statistical synthetic benchmarks, and also in experiments on a real robotic platform. In addition, we propose and experimentally validate a new BO method to account for unknown constraints. Such method is an extension of Max-Value Entropy Search, a recent information-theoretic method, to solve unconstrained global optimization problems.


A Deep Learning System for Predicting Size and Fit in Fashion E-Commerce

arXiv.org Machine Learning

Personalized size and fit recommendations bear crucial significance for any fashion e-commerce platform. Predicting the correct fit drives customer satisfaction and benefits the business by reducing costs incurred due to size-related returns. Traditional collaborative filtering algorithms seek to model customer preferences based on their previous orders. A typical challenge for such methods stems from extreme sparsity of customer-article orders. To alleviate this problem, we propose a deep learning based content-collaborative methodology for personalized size and fit recommendation. Our proposed method can ingest arbitrary customer and article data and can model multiple individuals or intents behind a single account. The method optimizes a global set of parameters to learn population-level abstractions of size and fit relevant information from observed customer-article interactions. It further employs customer and article specific embedding variables to learn their properties. Together with learned entity embeddings, the method maps additional customer and article attributes into a latent space to derive personalized recommendations. Application of our method to two publicly available datasets demonstrate an improvement over the state-of-the-art published results. On two proprietary datasets, one containing fit feedback from fashion experts and the other involving customer purchases, we further outperform comparable methodologies, including a recent Bayesian approach for size recommendation.


Top Machine Learning Algorithms – Data Scientist Basic Tool Kit Vinod Sharma's Blog

#artificialintelligence

Machine Learning Algorithms – DataScientist may be the sexiest job of today but the understanding, implementation, applied ML experience is missing. Having the top algorithms on your fingertips in real business is missing big time. The real job for any data scientist is the ability to clarify, demonstrate, extract real values out of data and reap rewards. "Machine Learning" as a basic skill sounds like teleportation tool to many businesses especially for the companies which are actually data factories i.e social media platforms. Describing and picturising the top few machine learning algorithms is the main idea of this post.