Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization

arXiv.org Machine Learning

Relative to the large literature on upper bounds on complexity of convex optimization, lesser attention has been paid to the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, gaining an understanding of these complexity-theoretic issues is important. In this paper, we study the complexity of stochastic convex optimization in an oracle model of computation. We improve upon known results and obtain tight minimax complexity estimates for various function classes.
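The oracle model behind such lower bounds can be made concrete with a toy example. An algorithm may touch the objective only through noisy gradient queries, and its cost is the number of queries it makes. The sketch below is an illustration under our own assumptions, not the paper's construction. The quadratic objective, the Gaussian noise, the hypothetical `NoisyGradientOracle` class, and averaged SGD with step size 1/sqrt(t) were all chosen only to show what is being counted.

```python
import math
import random


class NoisyGradientOracle:
    """Hypothetical stochastic first-order oracle for f(x) = 0.5 * (x - x_star)**2.

    Each query returns an unbiased estimate f'(x) + noise and increments a counter,
    so an algorithm's cost is measured as the number of oracle calls it makes.
    """

    def __init__(self, x_star: float, noise_std: float, seed: int = 0) -> None:
        self.x_star = x_star
        self.noise_std = noise_std
        self.rng = random.Random(seed)
        self.queries = 0

    def gradient(self, x: float) -> float:
        self.queries += 1
        return (x - self.x_star) + self.rng.gauss(0.0, self.noise_std)


def averaged_sgd(oracle: NoisyGradientOracle, x0: float, num_queries: int) -> float:
    """Run SGD with step size 1/sqrt(t) and return the average of the iterates."""
    x = x0
    running_sum = 0.0
    for t in range(1, num_queries + 1):
        x -= oracle.gradient(x) / math.sqrt(t)
        running_sum += x
    return running_sum / num_queries


oracle = NoisyGradientOracle(x_star=3.0, noise_std=1.0, seed=42)
x_hat = averaged_sgd(oracle, x0=0.0, num_queries=10_000)
print(oracle.queries, abs(x_hat - 3.0))
```

An oracle lower bound is a statement about every algorithm of this form: for some function in the class, no method can reach a given accuracy in fewer queries than the bound allows.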


Joint Modeling of Multiple Related Time Series via the Beta Process

arXiv.org Machine Learning

We propose a Bayesian nonparametric approach to the problem of jointly modeling multiple related time series. Our approach is based on the discovery of a set of latent, shared dynamical behaviors. Using a beta process prior, the size of the set and the sharing pattern are both inferred from data. We develop efficient Markov chain Monte Carlo methods based on the Indian buffet process representation of the predictive distribution of the beta process, without relying on a truncated model. In particular, our approach uses the sum-product algorithm to efficiently compute Metropolis-Hastings acceptance probabilities, and explores new dynamical behaviors via birth and death proposals. We examine the benefits of our proposed feature-based model on several synthetic datasets, and also demonstrate promising results on unsupervised segmentation of visual motion capture data.
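As background on the Indian buffet process representation mentioned above, here is a minimal sketch of a forward draw from the IBP prior. This is not the paper's MCMC sampler. Each row of the binary matrix marks which shared latent behaviours a time series uses, and the number of behaviours is unbounded a priori. The function names and the Poisson sampler are our own illustrative choices.

```python
import math
import random


def sample_poisson(rate: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_objects: int, alpha: float, seed: int = 0) -> list[list[int]]:
    """Draw a binary feature matrix from the Indian buffet process prior.

    Row i lists which latent features (e.g. shared dynamical behaviours) object i uses.
    """
    rng = random.Random(seed)
    feature_counts: list[int] = []  # m_k: number of earlier objects using feature k
    rows: list[list[int]] = []
    for i in range(1, num_objects + 1):
        # Reuse existing feature k with probability m_k / i.
        row = [1 if rng.random() < m / i else 0 for m in feature_counts]
        # Then add Poisson(alpha / i) brand-new features owned by this object.
        new = sample_poisson(alpha / i, rng)
        row.extend([1] * new)
        feature_counts = [m + z for m, z in zip(feature_counts, row)] + [1] * new
        rows.append(row)
    # Pad earlier rows with zeros for features introduced later.
    num_features = len(feature_counts)
    return [r + [0] * (num_features - len(r)) for r in rows]


Z = sample_ibp(num_objects=10, alpha=2.0, seed=1)
for row in Z:
    print(row)
```

Popular features are reused more often, and new ones appear at a decreasing rate. This "rich get richer" sharing pattern is what lets the number of behaviours and their assignment be inferred from data.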


Krylov Subspace Descent for Deep Learning

arXiv.org Machine Learning

Daniel Povey, Microsoft Research, One Microsoft Way, Redmond, WA 98052

In this paper, we propose a second-order optimization method to learn models where both the dimensionality of the parameter space and the number of training samples are high. In our method, we construct on each iteration a Krylov subspace formed by the gradient and an approximation to the Hessian matrix, and then use a subset of the training data samples to optimize over this subspace. As with the Hessian Free (HF) method of [7], the Hessian matrix is never explicitly constructed, and is computed using a subset of data. In practice, as in HF, we typically use a positive definite substitute for the Hessian matrix such as the Gauss-Newton matrix. We investigate the effectiveness of our proposed method on deep neural networks, and compare its performance to widely used methods such as stochastic gradient descent, conjugate gradient descent and L-BFGS, and also to HF. Our method leads to faster convergence than either L-BFGS or HF, and generally performs better than either of them in cross-validation accuracy. It is also simpler and more general than HF, as it requires neither a positive semi-definite approximation of the Hessian matrix to work well nor the setting of a damping parameter. The chief drawback versus HF is the need for memory to store a basis for the Krylov subspace.
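The core step described above can be sketched as follows. We build an orthonormal basis of span{g, Hg, ..., H^(K-1)g} using only Hessian-vector products, then optimize within that subspace. As a simplifying assumption, this sketch minimizes the exact local quadratic model over the subspace, a Newton step restricted to it. The paper instead optimizes over the subspace on a data subset, with details not given in the abstract. `krylov_subspace_step`, `grad_fn`, and `hess_vec_fn` are hypothetical names.

```python
import numpy as np


def krylov_subspace_step(x, grad_fn, hess_vec_fn, subspace_dim):
    """One simplified Krylov-subspace step on the local quadratic model.

    Builds an orthonormal basis of span{g, Hg, ..., H^(K-1) g} from Hessian-vector
    products only (H is never formed), then minimises the model over that subspace.
    """
    g = grad_fn(x)
    basis = []
    v = g
    for _ in range(subspace_dim):
        # Modified Gram-Schmidt against the vectors already in the basis.
        for b in basis:
            v = v - (b @ v) * b
        norm = np.linalg.norm(v)
        if norm < 1e-12:
            break  # the Krylov subspace has stopped growing
        v = v / norm
        basis.append(v)
        v = hess_vec_fn(x, v)
    if not basis:
        return x  # zero gradient: nothing to do
    V = np.stack(basis, axis=1)  # n x K basis
    HV = np.stack([hess_vec_fn(x, V[:, j]) for j in range(V.shape[1])], axis=1)
    reduced_hessian = V.T @ HV  # K x K
    # Minimise g^T V c + 0.5 c^T (V^T H V) c over the subspace coefficients c.
    coeffs = np.linalg.solve(reduced_hessian, -(V.T @ g))
    return x + V @ coeffs


# Usage on a toy quadratic f(x) = 0.5 x^T A x - b^T x.
A = np.diag([1.0, 2.0, 3.0, 4.0])
b = np.ones(4)
grad = lambda x: A @ x - b
hvp = lambda x, v: A @ v
x_full = krylov_subspace_step(np.zeros(4), grad, hvp, subspace_dim=4)
x_small = krylov_subspace_step(np.zeros(4), grad, hvp, subspace_dim=2)
```

Only K basis vectors of length n are stored, which is the memory cost the abstract names as the method's chief drawback relative to HF.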


Analog Sparse Approximation with Applications to Compressed Sensing

arXiv.org Machine Learning

Recent research has shown that performance in signal processing tasks can often be significantly improved by using signal models based on sparse representations, where a signal is approximated using a small number of elements from a fixed dictionary. Unfortunately, inference in this model involves solving non-smooth optimization problems that are computationally expensive. While significant efforts have focused on developing digital algorithms specifically for this problem, these algorithms are inappropriate for many applications because of the time and power requirements necessary to solve large optimization problems. Based on recent work in computational neuroscience, we explore the potential advantages of continuous-time dynamical systems for solving sparse approximation problems if they were implemented in analog VLSI. Specifically, in the simulated task of recovering synthetic and MRI data acquired via compressive sensing techniques, we show that these systems can potentially perform recovery at time scales of 10-20 μs, supporting data rates of 50-100 kHz (orders of magnitude faster than digital algorithms). Furthermore, we show analytically that a wide range of sparse approximation problems can be solved in the same basic architecture, including approximate $\ell^p$ norms, modified $\ell^1$ norms, re-weighted $\ell^1$ and $\ell^2$, the block $\ell^1$ norm and classic Tikhonov regularization.
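For a sense of what such a continuous-time system looks like, here is a minimal sketch. We assume the neuroscience-inspired system meant here is of the locally competitive algorithm (LCA) type, whose fixed points solve the $\ell^1$-regularized least-squares problem; the abstract does not name it. The ODE is simulated with forward Euler, whereas an analog circuit would integrate it directly. Other penalties in the list above would change the thresholding function. Only the $\ell^1$ case is shown, and the function names and step sizes are illustrative.

```python
import numpy as np


def soft_threshold(u, lam):
    """Thresholding nonlinearity for the l1 penalty: shrink |u| by lam, zero below it."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)


def lca_sparse_code(Phi, y, lam, tau=1.0, dt=0.1, num_steps=500):
    """Forward-Euler simulation of LCA-style dynamics (assumed form):

        tau * du/dt = Phi^T y - u - (Phi^T Phi - I) a,   a = soft_threshold(u, lam)
    """
    b = Phi.T @ y  # driving input: correlation of the signal with each atom
    G = Phi.T @ Phi - np.eye(Phi.shape[1])  # lateral inhibition between atoms
    u = np.zeros(Phi.shape[1])  # internal "membrane" state of each node
    for _ in range(num_steps):
        a = soft_threshold(u, lam)
        u = u + (dt / tau) * (b - u - G @ a)
    return soft_threshold(u, lam)


# Sanity check with an orthonormal dictionary, where the answer is soft_threshold(y, lam).
a = lca_sparse_code(np.eye(3), np.array([3.0, -0.5, 1.2]), lam=1.0)
print(a)
```

With an orthonormal dictionary the inhibition term vanishes, so each node settles independently. With an overcomplete dictionary, the nodes compete through `G`, which is what produces sparse codes.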


Neural Inverse-Model Control of a Servo System Using Levenberg-Marquardt and Bayesian Learning Algorithms

arXiv.org Artificial Intelligence

In this paper we present experimental results of neural network control of a servo system's speed. The control strategy is implemented by using an inverse-model control based on Artificial Neural Networks (ANNs). The network training was performed using two learning algorithms: Levenberg-Marquardt and Bayesian regularization. We evaluate the generalization capability of each method according to both the correct operation of the controller in following the reference signal and the control effort produced by the ANN-based controller.
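The inverse-model idea can be illustrated with a toy example. We learn a map from (desired next speed, current speed) to the control input, then feed the reference through that map. In this sketch the plant is a hypothetical first-order linear system with invented constants `A_TRUE` and `B_TRUE`. Plain linear least squares stands in for the ANN trained with Levenberg-Marquardt or Bayesian regularization, so it shows only the control structure, not the paper's training methods.

```python
import numpy as np

# Hypothetical first-order plant standing in for the servo: speed[k+1] = A*speed[k] + B*u[k].
A_TRUE, B_TRUE = 0.9, 0.5


def plant(speed, u):
    return A_TRUE * speed + B_TRUE * u


def fit_inverse_model(speeds, inputs):
    """Least-squares fit of u[k] ~ w0*speed[k+1] + w1*speed[k] (stand-in for the ANN)."""
    X = np.column_stack([speeds[1:], speeds[:-1]])
    w, *_ = np.linalg.lstsq(X, inputs, rcond=None)
    return w


# Identification: excite the plant with random inputs and record the speed.
rng = np.random.default_rng(0)
inputs = rng.uniform(-1.0, 1.0, size=200)
speeds = np.zeros(len(inputs) + 1)
for k, u in enumerate(inputs):
    speeds[k + 1] = plant(speeds[k], u)

w = fit_inverse_model(speeds, inputs)

# Closed loop: the inverse model gives the input expected to reach the reference next step.
reference = np.concatenate([np.full(20, 1.0), np.full(20, -0.5)])
speed, trajectory = 0.0, []
for r in reference:
    u = w[0] * r + w[1] * speed
    speed = plant(speed, u)
    trajectory.append(speed)
```

Because this toy plant is linear and noise-free, the learned inverse is exact and the speed tracks the reference in one step. For a real servo, the quality of the learned inverse (its generalization) determines both the tracking error and the control effort, which are the two criteria evaluated above.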


Bayesian multitask inverse reinforcement learning

arXiv.org Artificial Intelligence

We generalise the problem of inverse reinforcement learning to multiple tasks, from multiple demonstrations. Each demonstration may come from one expert trying to solve a different task, or from different experts trying to solve the same task. Our main contribution is to formalise the problem as statistical preference elicitation, via a number of structured priors whose form captures our biases about the relatedness of different tasks or expert policies. In doing so, we introduce a prior on policy optimality, which is more natural to specify. We show that our framework allows us not only to learn efficiently from multiple experts but also to effectively differentiate between the goals of each. Possible applications include analysing the intrinsic motivations of subjects in behavioural experiments and learning from multiple teachers.


Fast Learning Rate of Non-Sparse Multiple Kernel Learning and Optimal Regularization Strategies

arXiv.org Machine Learning

In this paper, we give a new generalization error bound of Multiple Kernel Learning (MKL) for a general class of regularizations, and discuss what kind of regularization gives favorable predictive accuracy. Our main target in this paper is dense type regularizations including $\ell_p$-MKL. Recent numerical experiments show that sparse regularization does not necessarily perform better than dense type regularizations. Motivated by this fact, this paper gives a general theoretical tool to derive fast learning rates of MKL that is applicable to arbitrary mixed-norm-type regularizations in a unifying manner. This enables us to compare the generalization performances of various types of regularizations. As a consequence, we observe that the homogeneity of the complexities of candidate reproducing kernel Hilbert spaces (RKHSs) affects which regularization strategy ($\ell_1$ or dense) is preferred. In fact, in homogeneous complexity settings where all RKHSs have the same complexity, $\ell_1$-regularization is optimal among all isotropic norms. On the other hand, in inhomogeneous complexity settings, dense type regularizations can achieve better learning rates than sparse $\ell_1$-regularization. We also show that our learning rate achieves the minimax lower bound in homogeneous complexity settings.
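To make "dense type regularizations including $\ell_p$-MKL" concrete, the sketch below uses a standard $\ell_p$-MKL formulation, which we assume here because the abstract does not spell one out. In it, the kernel weights have the closed form $\theta_m = \|w_m\|^{2/(p+1)} / (\sum_{m'} \|w_{m'}\|^{2p/(p+1)})^{1/p}$ given the current per-kernel block norms, so that $\|\theta\|_p = 1$. The function names are illustrative. The point is only that larger $p$ spreads weight more evenly across kernels.

```python
import numpy as np


def lp_mkl_kernel_weights(block_norms, p):
    """Closed-form kernel weights for l_p-norm MKL given block norms ||w_m|| (assumed form).

    theta_m = ||w_m||^(2/(p+1)) / (sum_m' ||w_m'||^(2p/(p+1)))^(1/p), so ||theta||_p = 1.
    """
    norms = np.asarray(block_norms, dtype=float)
    numerator = norms ** (2.0 / (p + 1.0))
    denominator = np.sum(norms ** (2.0 * p / (p + 1.0))) ** (1.0 / p)
    return numerator / denominator


def combined_kernel(kernels, theta):
    """K = sum_m theta_m K_m for a list of precomputed Gram matrices."""
    return sum(t * K for t, K in zip(theta, kernels))


# Same block norms, two regularizers: p = 1 concentrates weight, p = 4 spreads it out.
block_norms = [3.0, 1.0, 0.1]
theta_sparse = lp_mkl_kernel_weights(block_norms, p=1.0)
theta_dense = lp_mkl_kernel_weights(block_norms, p=4.0)
print(theta_sparse, theta_dense)
```

When candidate RKHSs differ in complexity, this choice of how evenly to spread weight is exactly what the abstract's learning-rate comparison addresses.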


A Bayesian Model for Plan Recognition in RTS Games applied to StarCraft

arXiv.org Artificial Intelligence

The task of keyhole (unobtrusive) plan recognition is central to adaptive game AI. "Tech trees" or "build trees" are the core of real-time strategy (RTS) game strategic (long-term) planning. This paper presents a generic and simple Bayesian model for RTS build tree prediction from noisy observations, whose parameters are learned from replays (game logs). This unsupervised machine learning approach involves minimal work for game developers, as it leverages players' data (common in RTS games). We applied it to StarCraft and showed that it yields high-quality and robust predictions that can feed an adaptive AI.
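The abstract does not give the model's structure, so the following is only a heavily simplified sketch of the underlying idea, not the paper's model. A prior over build trees is estimated from replay counts and then combined with a noisy observation model to yield a posterior over the opponent's current build tree. The noise model, the `noise` parameter, and the toy replay data are hypothetical.

```python
from collections import Counter


def build_tree_posterior(replays, observed, noise=0.1):
    """Posterior over build trees given noisy observations of enemy buildings.

    Prior: empirical frequency of each build tree in the replays.
    Likelihood (hypothetical noise model): each observed building is explained by the
    tree with probability 1 - noise, or is a spurious observation with probability noise.
    """
    counts = Counter(frozenset(tree) for tree in replays)
    total = sum(counts.values())
    unnormalised = {}
    for tree, count in counts.items():
        likelihood = 1.0
        for building in observed:
            likelihood *= (1.0 - noise) if building in tree else noise
        unnormalised[tree] = (count / total) * likelihood
    z = sum(unnormalised.values())
    return {tree: weight / z for tree, weight in unnormalised.items()}


# Toy replay data (invented): each entry is the set of buildings one player had.
replays = (
    [{"Pylon", "Gateway"}] * 3
    + [{"Pylon", "Gateway", "CyberneticsCore"}] * 2
    + [{"Pylon", "Forge"}]
)
post = build_tree_posterior(replays, {"Gateway"})
post2 = build_tree_posterior(replays, {"CyberneticsCore"})
print(max(post, key=post.get), max(post2, key=post2.get))
```

Learning the prior by counting replays is what keeps the developer's work small: no hand-authored strategy rules are needed.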


New Candidates Welcome! Possible Winners with respect to the Addition of New Candidates

arXiv.org Artificial Intelligence

In voting contexts, some new candidates may show up in the course of the process. In this case, we may want to determine which of the initial candidates are possible winners, given that a fixed number $k$ of new candidates will be added. We give a computational study of this problem, focusing on scoring rules, and we provide a formal comparison with related problems such as control via adding candidates or cloning.
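To make the question concrete, here is a brute-force sketch for tiny instances. Each voter's ranking of the initial candidates is fixed, but the $k$ new candidates may be inserted anywhere in it. An initial candidate is a possible winner if some such completion makes it win. We assume Borda as the scoring rule and co-winner semantics (ties count as winning); both are illustrative choices. The enumeration is exponential, whereas the paper's computational study concerns the complexity of this problem.

```python
from itertools import permutations, product


def borda_scores(profile):
    """Borda: in a ranking of n candidates, position j (0 = top) earns n - 1 - j points."""
    scores = {}
    for ranking in profile:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + n - 1 - position
    return scores


def extensions(vote, new_candidates):
    """All rankings of vote + new_candidates that keep the original order of `vote`."""
    original = tuple(vote)
    everyone = original + tuple(new_candidates)
    return [p for p in permutations(everyone)
            if tuple(c for c in p if c in original) == original]


def possible_winners(profile, initial, new_candidates, score=borda_scores):
    """Initial candidates that co-win for at least one way of adding the new candidates."""
    winners = set()
    per_voter = [extensions(vote, new_candidates) for vote in profile]
    for extended_profile in product(*per_voter):
        scores = score(extended_profile)
        best = max(scores.values())
        winners.update(c for c in initial if scores[c] == best)
    return winners


profile = [("a", "b", "c"), ("a", "b", "c"), ("b", "a", "c")]
result = possible_winners(profile, ["a", "b", "c"], ["x"])
print(sorted(result))
```

Here `b` becomes a possible winner because the new candidate can be slotted between `b` and `a` in the third vote. `c` can never win, since every voter ranks `b` above it.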


Learning to Make Predictions In Partially Observable Environments Without a Generative Model

Journal of Artificial Intelligence Research

When faced with the problem of learning a model of a high-dimensional environment, a common approach is to limit the model to make only a restricted set of predictions, thereby simplifying the learning problem. These partial models may be directly useful for making decisions or may be combined together to form a more complete, structured model. However, in partially observable (non-Markov) environments, standard model-learning methods learn generative models, i.e. models that provide a probability distribution over all possible futures (such as POMDPs). It is not straightforward to restrict such models to make only certain predictions, and doing so does not always simplify the learning problem. In this paper we present prediction profile models: non-generative partial models for partially observable systems that make only a given set of predictions, and are therefore far simpler than generative models in some cases. We formalize the problem of learning a prediction profile model as a transformation of the original model-learning problem, and show empirically that one can learn prediction profile models that make a small set of important predictions even in systems that are too complex for standard generative models.