Bayesian Learning
Mixture model modal clustering
The two most widely used density-based approaches to clustering are mixture model clustering and modal clustering. In the mixture model approach, the density is represented as a mixture and clusters are associated with the different mixture components. In modal clustering, clusters are understood as regions of high density separated from each other by zones of lower density, so that they are closely related to certain regions around the density modes. If the true density is indeed in the assumed class of mixture densities, then mixture model clustering allows one to scrutinize more subtle situations than modal clustering. However, when mixture modeling is used in a nonparametric way, taking advantage of the denseness of the sieve of mixture densities to approximate any density, the correspondence between clusters and mixture components may become questionable. In this paper we introduce two methods to adopt a modal clustering point of view after a mixture model fit. Numerous examples are provided to illustrate that mixture modeling can also be used for clustering in a nonparametric sense, as long as clusters are understood as the domains of attraction of the density modes.
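The idea of assigning points to domains of attraction of the modes of a fitted mixture can be sketched as follows. This is only an illustration, not one of the two methods introduced in the paper: it assumes a one-dimensional Gaussian mixture whose parameters are already known, and it uses a mean-shift-type fixed-point iteration to climb to a mode. The function names `gmm_mode_ascent` and `modal_clusters` and the tolerance `merge_tol` are hypothetical.

```python
import numpy as np

def gmm_mode_ascent(x0, weights, means, sds, n_iter=500, tol=1e-10):
    """Climb from x0 to a stationary point of a 1-D Gaussian mixture density.

    Setting the density gradient to zero gives the fixed-point update
    x = sum_k(r_k * mu_k / s_k^2) / sum_k(r_k / s_k^2),
    where r_k is the weighted density of component k at the current x.
    """
    x = float(x0)
    for _ in range(n_iter):
        # Weighted component densities (the common 1/sqrt(2*pi) factor cancels).
        r = weights / sds * np.exp(-0.5 * ((x - means) / sds) ** 2)
        prec = r / sds ** 2
        x_new = float(np.sum(prec * means) / np.sum(prec))
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def modal_clusters(points, weights, means, sds, merge_tol=1e-3):
    """Label each point by the mode its ascent path converges to."""
    modes, labels = [], []
    for p in points:
        m = gmm_mode_ascent(p, weights, means, sds)
        for j, known in enumerate(modes):
            if abs(m - known) < merge_tol:
                labels.append(j)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return np.array(labels), np.array(modes)

# Three components, but the first two overlap so much that they share one mode.
weights = np.array([0.3, 0.3, 0.4])
means = np.array([0.0, 0.5, 6.0])
sds = np.array([1.0, 1.0, 1.0])
points = np.array([-1.0, 0.2, 1.0, 5.0, 6.5])

labels, modes = modal_clusters(points, weights, means, sds)
print(labels, modes)
```

In this toy setting the mixture has three components but only two modes, so component-based clustering and modal clustering would disagree on the number of clusters.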
Sparse Tensor Graphical Model: Non-convex Optimization and Statistical Inference
Sun, Will Wei, Wang, Zhaoran, Lyu, Xiang, Liu, Han, Cheng, Guang
We consider the estimation and inference of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data. To facilitate the estimation of the precision matrix corresponding to each way of the tensor, we assume the data follow a tensor normal distribution whose covariance has a Kronecker product structure. A critical challenge in the estimation and inference of this model is the fact that its penalized maximum likelihood estimation involves minimizing a non-convex objective function. To address it, this paper makes two contributions: (i) In spite of the non-convexity of this estimation problem, we prove that an alternating minimization algorithm, which iteratively estimates each sparse precision matrix while fixing the others, attains an estimator with the optimal statistical rate of convergence. Notably, such an estimator achieves estimation consistency with only one tensor sample, which was not observed in previous work. (ii) We propose a de-biased statistical inference procedure for testing hypotheses on the true support of the sparse precision matrices, and employ it for testing a growing number of hypotheses with false discovery rate (FDR) control. The asymptotic normality of our test statistic and the consistency of the FDR control procedure are established. Our theoretical results are further backed up by thorough numerical studies. We implement the methods in a publicly available R package, Tlasso.
Multilevel Monte Carlo for Scalable Bayesian Computations
Giles, Mike, Nagapetyan, Tigran, Szpruch, Lukasz, Vollmer, Sebastian, Zygalakis, Konstantinos
Markov chain Monte Carlo (MCMC) algorithms are ubiquitous in Bayesian computations. However, they need to access the full data set in order to evaluate the posterior density at every step of the algorithm. This results in a great computational burden in big data applications. In contrast to MCMC methods, Stochastic Gradient MCMC (SGMCMC) algorithms such as Stochastic Gradient Langevin Dynamics (SGLD) only require access to a batch of the data set at every step. This drastically improves the computational performance and scales well to large data sets. However, the difficulty with SGMCMC algorithms comes from their sensitivity to their parameters, which are notoriously difficult to tune. Moreover, the Root Mean Square Error (RMSE) scales as $\mathcal{O}(c^{-\frac{1}{3}})$, as opposed to $\mathcal{O}(c^{-\frac{1}{2}})$ for standard MCMC, where $c$ is the computational cost. We introduce a new class of Multilevel Stochastic Gradient Markov chain Monte Carlo algorithms that are able to mitigate the problem of tuning the step size and, more importantly, to recover the $\mathcal{O}(c^{-\frac{1}{2}})$ convergence of standard Markov chain Monte Carlo methods without the need to introduce Metropolis-Hastings steps. A further advantage of this new class of algorithms is that it can easily be parallelised over a heterogeneous computer architecture. We illustrate our methodology using Bayesian logistic regression and provide numerical evidence that for a prescribed relative RMSE the computational cost is sublinear in the number of data items.
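For readers unfamiliar with SGLD, the basic (single-level) update that the abstract refers to can be sketched as below. This is plain SGLD, not the multilevel algorithm proposed in the paper, and it uses a toy Gaussian-mean model rather than logistic regression so that the exact posterior is available for comparison. The data, prior variance, batch size and step size are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed): N observations from N(2, 1); the unknown is the mean theta.
N = 1000
data = rng.normal(loc=2.0, scale=1.0, size=N)

prior_var = 100.0   # prior: theta ~ N(0, prior_var)
batch_size = 50     # only this many data items are touched per step
step = 1e-4         # fixed step size h
n_steps = 5000
burn_in = 1000

def grad_log_prior(theta):
    return -theta / prior_var

def grad_log_lik(theta, x):
    # Sum over the batch of d/dtheta log N(x_i | theta, 1).
    return np.sum(x - theta)

theta = 0.0
samples = np.empty(n_steps)
for t in range(n_steps):
    batch = data[rng.integers(0, N, size=batch_size)]
    # Unbiased estimate of the full-data gradient of the log posterior.
    grad = grad_log_prior(theta) + (N / batch_size) * grad_log_lik(theta, batch)
    # Langevin step: half-step along the gradient plus N(0, h) noise.
    theta = theta + 0.5 * step * grad + np.sqrt(step) * rng.normal()
    samples[t] = theta

# Exact conjugate posterior mean, for comparison with the SGLD estimate.
post_var = 1.0 / (1.0 / prior_var + N)
post_mean = post_var * data.sum()
sgld_mean = samples[burn_in:].mean()
print(sgld_mean, post_mean)
```

The fixed step size is the tuning parameter the abstract describes as difficult to choose: too large and the chain is biased or unstable, too small and it mixes slowly.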
Machine learning PREDICTIVE ANALYTICS REPORT Predictive Analytics
The Predictive Analytics Scores below, ordered by Forecasted Future Needs and Demand from high to low, show Machine learning's Predictive Analysis. The link takes you to a corresponding product in The Art of Service's store to get started. The Art of Service's predictive model results enable businesses to discover and apply the most profitable technologies and applications, attracting the most profitable customers, and therefore helping maximize value from their investments. The Predictive Analytics algorithm evaluates and scores technologies and applications. The platform monitors over ten thousand technologies and applications for months, looking for interest swings in a topic, concept, technology or application, not just a count of mentions.
Bayesian Reinforcement Learning: A Survey
Ghavamzadeh, Mohammad, Mannor, Shie, Pineau, Joelle, Tamar, Aviv
Bayesian methods for machine learning have been widely investigated, yielding principled methods for incorporating prior information into inference algorithms. In this survey, we provide an in-depth review of the role of Bayesian methods for the reinforcement learning (RL) paradigm. The major incentives for incorporating Bayesian reasoning in RL are: 1) it provides an elegant approach to action-selection (exploration/exploitation) as a function of the uncertainty in learning; and 2) it provides the machinery to incorporate prior knowledge into the algorithms. We first discuss models and methods for Bayesian inference in the simple single-step Bandit model. We then review the extensive recent literature on Bayesian methods for model-based RL, where prior information can be expressed on the parameters of the Markov model. We also present Bayesian methods for model-free RL, where priors are expressed over the value function or policy class. The objective of the paper is to provide a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.
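As an illustration of Bayesian action selection in the single-step bandit setting mentioned above, here is a minimal sketch of Thompson sampling for Bernoulli arms with Beta(1, 1) priors. It is a standard textbook technique chosen for illustration, not code from the survey; the arm probabilities and the function name `thompson_bernoulli` are assumptions.

```python
import numpy as np

def thompson_bernoulli(true_probs, n_rounds, rng):
    """Thompson sampling for a Bernoulli bandit with independent Beta(1, 1) priors."""
    k = len(true_probs)
    alpha = np.ones(k)   # 1 + number of observed successes per arm
    beta = np.ones(k)    # 1 + number of observed failures per arm
    pulls = np.zeros(k, dtype=int)
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior,
        # then act greedily with respect to that draw.
        draws = rng.beta(alpha, beta)
        arm = int(np.argmax(draws))
        reward = int(rng.random() < true_probs[arm])
        # Conjugate Beta-Bernoulli posterior update.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, alpha, beta

rng = np.random.default_rng(0)
true_probs = [0.2, 0.5, 0.8]   # hypothetical arm success rates
pulls, alpha, beta = thompson_bernoulli(true_probs, n_rounds=2000, rng=rng)
print(pulls)
```

Exploration here comes entirely from posterior uncertainty: arms with few pulls have wide posteriors and are occasionally sampled as best, while well-explored poor arms are selected less and less often.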
Distributed Estimation of the Operating State of a Single-Bus DC MicroGrid without an External Communication Interface
Angjelichinoski, Marko, Scaglione, Anna, Popovski, Petar, Stefanovic, Cedomir
We propose a decentralized Maximum Likelihood solution for estimating the stochastic renewable power generation and demand in single bus Direct Current (DC) MicroGrids (MGs), with high penetration of droop controlled power electronic converters. The solution relies on the fact that the primary control parameters are set in accordance with the local power generation status of the generators. Therefore, the steady state voltage is inherently dependent on the generation capacities and the load, through a non-linear parametric model, which can be estimated. To have a well conditioned estimation problem, our solution avoids the use of an external communication interface and utilizes controlled voltage disturbances to perform distributed training. Using this tool, we develop an efficient, decentralized Maximum Likelihood Estimator (MLE) and formulate the sufficient condition for the existence of the globally optimal solution. The numerical results illustrate the promising performance of our MLE algorithm.
Deep learning in R PACKT Books
As the title suggests, in this article we will be taking a look at some of the deep learning models in R. Some of the pioneering advancements in neural networks research in the last decade have opened up a new frontier in machine learning that is generally called deep learning. Deep learning is generally defined as a class of machine learning techniques in which many layers of information processing stages in hierarchical supervised architectures are exploited for unsupervised feature learning and for pattern analysis/classification. The essence of deep learning is to compute hierarchical features or representations of the observational data, where the higher-level features or factors are defined from lower-level ones. Although there are many similar definitions and architectures for deep learning, two elements are common to all of them: multiple layers of nonlinear information processing, and supervised or unsupervised learning of feature representations at each layer from the features learned at the previous layer. The initial works on deep learning were based on multilayer neural network models.
Finite-sample and asymptotic analysis of generalization ability with an application to penalized regression
Xu, Ning, Hong, Jian, Fisher, Timothy C. G.
In this paper, we study the performance of extremum estimators from the perspective of generalization ability (GA): the ability of a model to predict outcomes in new samples from the same population. By adapting the classical concentration inequalities, we derive upper bounds on the empirical out-of-sample prediction errors as a function of the in-sample errors, in-sample data size, heaviness in the tails of the error distribution, and model complexity. We show that the error bounds may be used for tuning key estimation hyper-parameters, such as the number of folds K in cross-validation. We also show how K affects the bias-variance tradeoff for cross-validation. Simulations are used to demonstrate key results.
We would also like to acknowledge participants at the 12th International Symposium on Econometric Theory and Applications and the 26th New Zealand Econometric Study Group as well as seminar participants at Utah, UNSW, and University of Melbourne for useful questions and comments. Fisher would like to acknowledge the financial support of the Australian Research Council, grant DP0663477.
1 Introduction
Traditionally in econometrics, an estimation method is implemented on sample data in order to infer patterns in a population. Put another way, inference centers on generalizing to the population the pattern learned from the sample and evaluating how well the sample pattern fits the population. An alternative perspective is to consider how well a sample pattern fits another sample. In this paper, we study the ability of a model estimated from a given sample to fit new samples from the same population, referred to as the generalization ability (GA) of the model. As a way of evaluating the external validity of sample estimates, the concept of GA has been implemented in recent empirical research.
For example, in the policy evaluation literature [Belloni et al., 2013, Gechter, 2015, Dolton, 2006, Blundell et al., 2004], the central question is whether any treatment effect estimated from a pilot program can be generalized to out-of-sample individuals.
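To make the role of the number of folds K concrete, the following minimal sketch estimates out-of-sample squared error by K-fold cross-validation for a ridge (penalized) regression. It does not implement the paper's error bounds; the simulated data, the penalty value `lam`, and the helper names are assumptions for illustration only.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Split a random permutation of range(n) into k nearly equal folds."""
    return np.array_split(rng.permutation(n), k)

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam * I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_error(X, y, lam, k, rng):
    """Average held-out mean squared error over k folds."""
    folds = kfold_indices(len(y), k, rng)
    errs = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        beta = ridge_fit(X[train_idx], y[train_idx], lam)
        errs.append(np.mean((y[test_idx] - X[test_idx] @ beta) ** 2))
    return float(np.mean(errs))

# Simulated linear model (assumed): n = 200, p = 5, noise standard deviation 0.5.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
y = X @ beta_true + 0.5 * rng.normal(size=n)

errors = {k: cv_error(X, y, lam=1.0, k=k, rng=rng) for k in (2, 5, 10)}
print(errors)
```

Smaller K leaves less data for training in each fold, while larger K makes each held-out fold smaller; this is the tradeoff the abstract refers to when it discusses how K affects bias and variance.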
Noisy Inductive Matrix Completion Under Sparse Factor Models
Soni, Akshay, Chevalier, Troy, Jain, Swayambhoo
Inductive Matrix Completion (IMC) is an important class of matrix completion problems that allows direct inclusion of available features to enhance estimation capabilities. These models have found applications in personalized recommendation systems, multilabel learning, dictionary learning, etc. This paper examines a general class of noisy matrix completion tasks where the underlying matrix follows an IMC model, i.e., it is formed by a mixing matrix (a priori unknown) sandwiched between two known feature matrices. The mixing matrix here is assumed to be well approximated by the product of two sparse matrices, referred to here as "sparse factor models." We leverage the main theorem of Soni:2016:NMC and extend it to provide theoretical error bounds for the sparsity-regularized maximum likelihood estimators for the class of problems discussed in this paper. The main result is general in the sense that it can be used to derive error bounds for various noise models. In this paper, we instantiate our main result for the case of Gaussian noise and provide corresponding error bounds in terms of squared loss.
On the Relationship between Online Gaussian Process Regression and Kernel Least Mean Squares Algorithms
Van Vaerenbergh, Steven, Fernandez-Bes, Jesus, Elvira, Víctor
ABSTRACT
We study the relationship between online Gaussian process (GP) regression and kernel least mean squares (KLMS) algorithms. While the latter have no capacity to store the entire posterior distribution during online learning, we discover that their operation corresponds to the assumption of a fixed posterior covariance that follows a simple parametric model. Interestingly, several well-known KLMS algorithms correspond to specific cases of this model. The probabilistic perspective allows us to understand how each of them handles uncertainty, which could explain some of their performance differences.
Index Terms: online learning, regression, Gaussian processes, kernel least-mean squares
1. INTRODUCTION
Gaussian Process (GP) regression is a state-of-the-art Bayesian technique for nonlinear regression [1].
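For context, the basic KLMS recursion that the abstract compares with online GP regression can be sketched as follows. This is the plain, growing-dictionary form of KLMS with a Gaussian kernel, given as a generic illustration rather than any specific variant analysed in the paper; the class name `KLMS`, the step size, the kernel width and the toy data are assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, width):
    """Gaussian kernel between scalars or arrays of scalar inputs."""
    return np.exp(-((a - b) ** 2) / (2.0 * width ** 2))

class KLMS:
    """Basic kernel least mean squares: one new kernel center per sample."""

    def __init__(self, step=0.5, width=1.0):
        self.step = step
        self.width = width
        self.centers = []   # stored inputs
        self.coeffs = []    # expansion coefficients

    def predict(self, x):
        if not self.centers:
            return 0.0
        c = np.asarray(self.centers)
        a = np.asarray(self.coeffs)
        return float(a @ gaussian_kernel(c, x, self.width))

    def update(self, x, y):
        # A priori error, then add x as a new center with weight step * error.
        err = y - self.predict(x)
        self.centers.append(x)
        self.coeffs.append(self.step * err)
        return err

# Toy online regression task (assumed): noisy samples of sin(x) on [-3, 3].
rng = np.random.default_rng(0)
xs = rng.uniform(-3.0, 3.0, size=300)
ys = np.sin(xs) + 0.1 * rng.normal(size=xs.size)

model = KLMS(step=0.5, width=1.0)
for x, y in zip(xs, ys):
    model.update(float(x), float(y))

grid = np.linspace(-2.5, 2.5, 51)
preds = np.array([model.predict(float(g)) for g in grid])
mse = float(np.mean((preds - np.sin(grid)) ** 2))
print(mse)
```

Note that the recursion only keeps a point estimate of the function (centers and coefficients) and never updates a covariance, which is the contrast with online GP regression drawn in the abstract.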