Vector-valued Reproducing Kernel Banach Spaces with Applications to Multi-task Learning
Motivated by multi-task machine learning with Banach spaces, we propose the notion of vector-valued reproducing kernel Banach spaces (RKBS). Basic properties of the spaces and the associated reproducing kernels are investigated. We also present feature map constructions and several concrete examples of vector-valued RKBS. The theory is then applied to multi-task machine learning. In particular, we establish the representer theorem and characterization equations for the minimizers of regularized learning schemes in vector-valued RKBS.
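For orientation, here is the classical scalar-valued, Hilbert-space form of the representer theorem that the paper generalizes; the notation (H_K, L, lambda) is illustrative and not taken from the paper, and the RKBS version replaces the Hilbert-space norm with a Banach-space counterpart:

```latex
% Classical representer theorem (scalar-valued RKHS), shown for flavor:
\min_{f \in \mathcal{H}_K} \; \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr)
  + \lambda \, \|f\|_{\mathcal{H}_K}^{2}
\quad\Longrightarrow\quad
f^{\star}(\cdot) = \sum_{i=1}^{n} c_i \, K(\cdot, x_i), \qquad c_i \in \mathbb{R}.
```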
BAMBI: blind accelerated multimodal Bayesian inference
Graff, Philip, Feroz, Farhan, Hobson, Michael P., Lasenby, Anthony
In this paper we present an algorithm for rapid Bayesian analysis that combines the benefits of nested sampling and artificial neural networks. The blind accelerated multimodal Bayesian inference (BAMBI) algorithm implements the MultiNest package for nested sampling as well as the training of an artificial neural network (NN) to learn the likelihood function. In the case of computationally expensive likelihoods, this allows a much more rapid approximation to be substituted, significantly increasing the speed of the analysis. We begin by demonstrating, with a few toy examples, the ability of a NN to learn complicated likelihood surfaces. BAMBI's ability to decrease running time for Bayesian inference is then demonstrated in the context of estimating cosmological parameters from Wilkinson Microwave Anisotropy Probe and other observations. We show that valuable speed increases are achieved in addition to obtaining NNs trained on the likelihood functions for the different model and data combinations. These NNs can then be used for an even faster follow-up analysis using the same likelihood and different priors. This is a fully general algorithm that can be applied, without any pre-processing, to other problems with computationally expensive likelihood functions.
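A minimal sketch of the surrogate idea described above: train a small neural network on (parameter, log-likelihood) pairs and query it in place of the expensive likelihood. This is illustrative only; the toy likelihood, network size, and use of scikit-learn are assumptions, not BAMBI's implementation (which couples the network training to MultiNest):

```python
# Illustrative sketch (not the BAMBI implementation): train a small neural
# network on (parameter, log-likelihood) pairs collected during sampling,
# then query the network instead of the expensive likelihood.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def expensive_loglike(theta):
    # Stand-in for a costly likelihood (e.g., a cosmological pipeline).
    return -0.5 * np.sum(theta**2, axis=-1)

# Pairs accumulated while the sampler explores parameter space.
thetas = rng.normal(size=(2000, 4))
logls = expensive_loglike(thetas)

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
surrogate.fit(thetas, logls)

# Once the surrogate is accurate enough, substitute it for the true
# likelihood to accelerate the remaining nested-sampling iterations.
theta_new = rng.normal(size=(1, 4))
print(surrogate.predict(theta_new), expensive_loglike(theta_new))
```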
Extended Mixture of MLP Experts by Hybrid of Conjugate Gradient Method and Modified Cuckoo Search
Salimi, Hamid, Giveki, Davar, Soltanshahi, Mohammad Ali, Hatami, Javad
This paper investigates a new method for improving the learning algorithm of the Mixture of Experts (ME) model using a hybrid of Modified Cuckoo Search (MCS) and Conjugate Gradient (CG) as a second-order optimization technique. The CG technique is combined with the Back-Propagation (BP) algorithm to yield a much more efficient learning algorithm for the ME structure. In addition, the experts and gating networks in the enhanced model are replaced by CG-based Multi-Layer Perceptrons (MLPs) to provide faster and more accurate learning. Since CG depends considerably on the initial connection weights of the Artificial Neural Network (ANN), a metaheuristic algorithm, the so-called Modified Cuckoo Search, is applied to select optimal initial weights. The performance of the proposed method is compared with Gradient Descent Based ME (GDME) and Conjugate Gradient Based ME (CGME) on classification and regression problems. The experimental results show that the hybrid MCS and CG based ME (MCS-CGME) achieves faster convergence and better performance on the benchmark data sets used.
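To make the architecture concrete, below is a minimal forward pass of a mixture-of-experts model with a softmax gate; single linear layers stand in for the paper's CG-trained MLP experts, and all dimensions are arbitrary assumptions:

```python
# Minimal sketch of a Mixture-of-Experts forward pass (illustrative; the
# paper's experts/gate are CG-trained MLPs with MCS-initialized weights).
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_experts, d_in, d_out = 3, 5, 1
X = rng.normal(size=(8, d_in))

# One linear layer per expert and one for the gate (stand-ins for MLPs).
W_experts = rng.normal(size=(n_experts, d_in, d_out))
W_gate = rng.normal(size=(d_in, n_experts))

gate = softmax(X @ W_gate)                           # (8, n_experts)
expert_out = np.einsum('nd,edo->neo', X, W_experts)  # (8, n_experts, d_out)
y = np.einsum('ne,neo->no', gate, expert_out)        # gated combination
print(y.shape)  # (8, 1)
```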
The Future of Search and Discovery in Big Data Analytics: Ultrametric Information Spaces
Murtagh, Fionn, Contreras, Pedro
Consider observation data comprising n observation vectors with values on a set of attributes. This gives us n points in attribute space. Having data structured as a tree, implied by having our observations embedded in an ultrametric topology, offers a great advantage for proximity searching. If we have preprocessed data through such an embedding, then an observation's nearest neighbor is found in constant computational time, i.e. O(1) time. A further powerful approach is discussed in this work: the inducing of a hierarchy, and hence a tree, in linear computational time, i.e. O(n) time for n observations. It is with such a basis for proximity search and best match that we can address the burgeoning problems of processing very large, and possibly also very high dimensional, data sets.
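A small sketch of the embedding idea, using off-the-shelf hierarchical clustering rather than the paper's linear-time construction: once observations sit in a tree, distances become cophenetic (ultrametric) and a point's nearest neighbor can be read off the merge structure:

```python
# Sketch: embed observations in an ultrametric via hierarchical clustering;
# in the resulting cophenetic (tree) distance, a point's nearest neighbor is
# whatever it merged with first, which can be read off the tree directly.
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))

Z = linkage(X, method='single')     # builds the hierarchy (the tree)
d_ultra = squareform(cophenet(Z))   # ultrametric distances, 10 x 10

i = 0
np.fill_diagonal(d_ultra, np.inf)
print("nearest neighbor of point 0 in the ultrametric:", d_ultra[i].argmin())
```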
Learning mixed graphical models from data with p larger than n
Structure learning of Gaussian graphical models is an extensively studied problem in the classical multivariate setting where the sample size n is larger than the number of random variables p, as well as in the more challenging setting when p>>n. However, analogous approaches for learning the structure of graphical models with mixed discrete and continuous variables when p>>n remain largely unexplored. Here we describe a statistical learning procedure for this problem based on limited-order correlations and assess its performance with synthetic and real data.
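A hedged sketch of the general limited-order-correlation recipe (a PC-style screen, not the paper's exact procedure): retain an edge only if the marginal correlation survives conditioning on each single other variable; the threshold and the toy data are arbitrary assumptions:

```python
# Hedged sketch of structure learning via limited-order correlations
# (generic PC-style screening): keep an edge (i, j) only if no single
# conditioning variable k explains the correlation away.
import numpy as np

def partial_corr(x, y, z):
    # First-order partial correlation of x and y given z, via residuals.
    bx = np.polyfit(z, x, 1); by = np.polyfit(z, y, 1)
    rx = x - np.polyval(bx, z); ry = y - np.polyval(by, z)
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
n, p = 50, 6                      # deliberately small n relative to p
X = rng.normal(size=(n, p))
thresh = 0.3
edges = set()
for i in range(p):
    for j in range(i + 1, p):
        if abs(np.corrcoef(X[:, i], X[:, j])[0, 1]) < thresh:
            continue
        if all(abs(partial_corr(X[:, i], X[:, j], X[:, k])) >= thresh
               for k in range(p) if k not in (i, j)):
            edges.add((i, j))
print(edges)
```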
PAC-Bayesian Policy Evaluation for Reinforcement Learning
Fard, Mahdi MIlani, Pineau, Joelle, Szepesvari, Csaba
Bayesian priors offer a compact yet general means of incorporating domain knowledge into many learning tasks. The correctness of the Bayesian analysis and inference, however, largely depends on the accuracy and correctness of these priors. PAC-Bayesian methods overcome this problem by providing bounds that hold regardless of the correctness of the prior distribution. This paper introduces the first PAC-Bayesian bound for the batch reinforcement learning problem with function approximation. We show how this bound can be used to perform model-selection in a transfer learning scenario. Our empirical results confirm that PAC-Bayesian policy evaluation is able to leverage prior distributions when they are informative and, unlike standard Bayesian RL approaches, ignore them when they are misleading.
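For flavor, a standard (McAllester-style) PAC-Bayesian bound for supervised learning is shown below; the paper derives an analogous bound for batch RL policy evaluation with function approximation, so this form is illustrative rather than the paper's result:

```latex
% With probability at least 1 - \delta over an i.i.d. sample of size n,
% simultaneously for all posteriors \rho over hypotheses (prior \pi fixed):
\mathbb{E}_{h \sim \rho}[R(h)] \;\le\; \mathbb{E}_{h \sim \rho}[\hat{R}_n(h)]
\;+\; \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln(n/\delta)}{2(n-1)}}.
```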
Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions
Poczos, Barnabas, Xiong, Liang, Schneider, Jeff
Low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection are among the most important problems in machine learning. The existing methods usually consider the case when each instance has a fixed, finite-dimensional feature representation. Here we consider a different setting. We assume that each instance corresponds to a continuous probability distribution. These distributions are unknown, but we are given some i.i.d. samples from each distribution. Our goal is to estimate the distances between these distributions and use these distances to perform low-dimensional embedding, clustering/classification, or anomaly detection for the distributions. We present estimation algorithms, describe how to apply them for machine learning tasks on distributions, and show empirical results on synthetic data, real-world images, and astronomical data sets.
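As an illustration of estimating divergences directly from samples, here is a k-nearest-neighbor KL divergence estimator in the style of Wang, Kulkarni, and Verdu; it is a plausible stand-in, not necessarily the estimator the paper uses, and the Gaussian test data are an assumption:

```python
# Sketch of a k-NN based KL divergence estimator between two sampled
# distributions (Wang-Kulkarni-Verdu style; illustrative only).
import numpy as np
from scipy.spatial import cKDTree

def kl_knn(X, Y, k=3):
    n, d = X.shape
    m = Y.shape[0]
    # Distance to the k-th neighbor within X (k+1 because the point itself
    # is returned at distance 0) and to the k-th neighbor in Y.
    rho = cKDTree(X).query(X, k + 1)[0][:, k]
    nu = cKDTree(Y).query(X, k)[0][:, k - 1]
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(1000, 2))
Y = rng.normal(0.5, 1.0, size=(1000, 2))
print(kl_knn(X, Y))   # positive, roughly the true Gaussian KL of 0.25
```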
Efficient Probabilistic Inference with Partial Ranking Queries
Huang, Jonathan, Kapoor, Ashish, Guestrin, Carlos E.
Distributions over rankings are used to model data in various settings such as preference analysis and political elections. The factorial size of the space of rankings, however, typically forces one to make structural assumptions, such as smoothness, sparsity, or probabilistic independence about these underlying distributions. We approach the modeling problem from the computational principle that one should make structural assumptions which allow for efficient calculation of typical probabilistic queries. For ranking models, "typical" queries predominantly take the form of partial ranking queries (e.g., given a user's top-k favorite movies, what are his preferences over remaining movies?). In this paper, we argue that riffled independence factorizations proposed in recent literature [7, 8] are a natural structural assumption for ranking distributions, allowing for particularly efficient processing of partial ranking queries.
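For concreteness, the toy below answers a partial ranking query by brute force over all n! rankings; riffled independence is precisely the structure that lets such queries be answered without this enumeration. The item names and the random distribution are assumptions:

```python
# Brute-force illustration of a partial ranking query: condition a
# distribution over full rankings on an observed top-1 choice.
from itertools import permutations
import numpy as np

items = ('A', 'B', 'C', 'D')
rankings = list(permutations(items))
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(len(rankings)))   # a random toy distribution

# Query: given that 'A' is ranked first, what is P(second item)?
mass = {it: 0.0 for it in items}
for r, p in zip(rankings, probs):
    if r[0] == 'A':
        mass[r[1]] += p
total = sum(mass.values())
posterior = {it: m / total for it, m in mass.items() if m > 0}
print(posterior)
```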
Robust learning Bayesian networks for prior belief
Recent reports have described that learning Bayesian networks is highly sensitive to the chosen equivalent sample size (ESS) in the Bayesian Dirichlet equivalence uniform (BDeu). This sensitivity often engenders unstable or undesirable results. This paper describes some asymptotic analyses of BDeu to explain the reasons for the sensitivity and its effects. Furthermore, it proposes a learning score that is robust to the ESS, obtained by eliminating the sensitive factors from the approximation of log-BDeu.
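The sensitivity is easy to see numerically: below, the standard BDeu local score of a single child variable (toy contingency counts, assumed for illustration) is computed for several ESS values and varies substantially:

```python
# Sketch: BDeu local score of one variable as a function of the equivalent
# sample size (ESS), illustrating the sensitivity the paper analyzes.
import numpy as np
from scipy.special import gammaln

def bdeu_local(counts, ess):
    # counts: array of shape (q, r) = parent configurations x child states.
    q, r = counts.shape
    a_j, a_jk = ess / q, ess / (q * r)
    nj = counts.sum(axis=1)
    return np.sum(gammaln(a_j) - gammaln(a_j + nj)) \
         + np.sum(gammaln(a_jk + counts) - gammaln(a_jk))

counts = np.array([[10, 1], [1, 10]])   # toy contingency counts
for ess in (0.1, 1.0, 10.0, 100.0):
    print(ess, round(bdeu_local(counts, ess), 3))
```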
Smoothing Multivariate Performance Measures
Zhang, Xinhua, Saha, Ankan, Vishwanathan, S. V. N.
A Support Vector Method for multivariate performance measures was recently introduced by Joachims (2005). The underlying optimization problem is currently solved using cutting plane methods such as SVM-Perf and BMRM. One can show that these algorithms converge to an epsilon-accurate solution in O(1/(lambda*epsilon)) iterations, where lambda is the trade-off parameter between the regularizer and the loss function. We present a smoothing strategy for multivariate performance scores, in particular precision/recall break-even point and ROCArea. When combined with Nesterov's accelerated gradient algorithm, our smoothing strategy yields an optimization algorithm which converges to an epsilon-accurate solution in O(min{1/epsilon, 1/sqrt(lambda*epsilon)}) iterations. Furthermore, the cost per iteration of our scheme is the same as that of SVM-Perf and BMRM. Empirical evaluation on a number of publicly available datasets shows that our method converges significantly faster than cutting plane methods without sacrificing generalization ability.
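The overall recipe (smooth the nonsmooth score, then accelerate) can be sketched on a simpler stand-in objective; here a smoothed hinge replaces the smoothed multivariate score, and the data, step constants, and iteration count are assumptions:

```python
# Sketch of the recipe: smooth a nonsmooth loss, then run Nesterov's
# accelerated gradient method. A smoothed hinge stands in for the
# smoothed multivariate performance score.
import numpy as np

def smoothed_hinge_grad(w, X, y, mu):
    # Nesterov smoothing of max(0, 1 - y * Xw): clip the dual variable.
    margin = 1.0 - y * (X @ w)
    alpha = np.clip(margin / mu, 0.0, 1.0)   # smoothed subgradient weight
    return -(X.T @ (alpha * y)) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = np.sign(X @ w_true)

lam, mu = 0.1, 0.05
L = np.linalg.norm(X, 2) ** 2 / (mu * len(y)) + lam   # Lipschitz estimate
w = z = np.zeros(10)
t = 1.0
for _ in range(200):                                  # accelerated gradient loop
    g = smoothed_hinge_grad(z, X, y, mu) + lam * z
    w_next = z - g / L
    t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
    z = w_next + ((t - 1) / t_next) * (w_next - w)
    w, t = w_next, t_next
print(np.mean(np.sign(X @ w) == y))                   # training accuracy
```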