Goto

Collaborating Authors

 Education


Learning Manifolds with K-Means and K-Flats

arXiv.org Machine Learning

Our study is broadly motivated by questions in high-dimensional learning. As is well known, learning in high dimensions is feasible only if the data distribution satisfies suitable prior assumptions. One such assumption is that the data distribution lies on, or is close to, a low-dimensional set embedded in a high dimensional space, for instance a low dimensional manifold. This latter assumption has proved to be useful in practice, as well as amenable to theoretical analysis, and it has led to a significant amount of recent work. Starting from [29, 40, 7], this set of ideas, broadly referred to as manifold learning, has been applied to a variety of problems from supervised [42] and semi-supervised learning [8], to clustering [45] and dimensionality reduction [7], to name a few. Interestingly, the problem of learning the manifold itself has received less attention: given samples from a d-manifold M embedded in some ambient space X, the problem is to learn a set that approximates M in a suitable sense. This problem has been considered in computational geometry, but in a setting in which typically the manifold is a hyper-surface in a low-dimensional space (e.g.


No More Pesky Learning Rates

arXiv.org Machine Learning

The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of SGD or other adaptive approaches with their best settings obtained through systematic search, and effectively removes the need for learning rate tuning.


Layer-wise learning of deep generative models

arXiv.org Machine Learning

When using deep, multi-layered architectures to build generative models of data, it is difficult to train all layers at once. We propose a layer-wise training procedure admitting a performance guarantee compared to the global optimum. It is based on an optimistic proxy of future performance, the best latent marginal. We interpret auto-encoders in this setting as generative models, by showing that they train a lower bound of this criterion. We test the new learning procedure against a state of the art method (stacked RBMs), and find it to improve performance. Both theory and experiments highlight the importance, when training deep architectures, of using an inference model (from data to hidden variables) richer than the generative model (from hidden variables to data).


Competing With Strategies

arXiv.org Machine Learning

We study the problem of online learning with a notion of regret defined with respect to a set of strategies. We develop tools for analyzing the minimax rates and for deriving regret-minimization algorithms in this scenario. While the standard methods for minimizing the usual notion of regret fail, through our analysis we demonstrate existence of regret-minimization methods that compete with such sets of strategies as: autoregressive algorithms, strategies based on statistical models, regularized least squares, and follow the regularized leader strategies. In several cases we also derive efficient learning algorithms.


Complexity distribution of agent policies

arXiv.org Artificial Intelligence

We analyse the complexity of environments according to the policies that need to be used to achieve high performance. The performance results for a population of policies leads to a distribution that is examined in terms of policy complexity and analysed through several diagrams and indicators. The notion of environment response curve is also introduced, by inverting the performance results into an ability scale. We apply all these concepts, diagrams and indicators to a minimalistic environment class, agent-populated elementary cellular automata, showing how the difficulty, discriminating power and ranges (previous to normalisation) may vary for several environments.


Update Rules for Parameter Estimation in Bayesian Networks

arXiv.org Machine Learning

This paper re-examines the problem of parameter estimation in Bayesian networks with missing values and hidden variables from the perspective of recent work in on-line learning [Kivinen & Warmuth, 1994]. We provide a unified framework for parameter estimation that encompasses both on-line learning, where the model is continuously adapted to new data cases as they arrive, and the more traditional batch learning, where a pre-accumulated set of samples is used in a one-time model selection process. In the batch case, our framework encompasses both the gradient projection algorithm and the EM algorithm for Bayesian networks. The framework also leads to new on-line and batch parameter update schemes, including a parameterized version of EM. We provide both empirical and theoretical results indicating that parameterized EM allows faster convergence to the maximum likelihood parameters than does standard EM.


Sequential Update of Bayesian Network Structure

arXiv.org Artificial Intelligence

There is an obvious need for improving the performance and accuracy of a Bayesian network as new data is observed. Because of errors in model construction and changes in the dynamics of the domains, we cannot afford to ignore the information in new data. While sequential update of parameters for a fixed structure can be accomplished using standard techniques, sequential update of network structure is still an open problem. In this paper, we investigate sequential update of Bayesian networks were both parameters and structure are expected to change. We introduce a new approach that allows for the flexible manipulation of the tradeoff between the quality of the learned networks and the amount of information that is maintained about past observations. We formally describe our approach including the necessary modifications to the scoring functions for learning Bayesian networks, evaluate its effectiveness through and empirical study, and extend it to the case of missing data.


Link prediction for partially observed networks

arXiv.org Machine Learning

Link prediction is one of the fundamental problems in network analysis. In many applications, notably in genetics, a partially observed network may not contain any negative examples of absent edges, which creates a difficulty for many existing supervised learning approaches. We develop a new method which treats the observed network as a sample of the true network with different sampling rates for positive and negative examples. We obtain a relative ranking of potential links by their probabilities, utilizing information on node covariates as well as on network topology. Empirically, the method performs well under many settings, including when the observed network is sparse. We apply the method to a protein-protein interaction network and a school friendship network.


Multi-class Generalized Binary Search for Active Inverse Reinforcement Learning

arXiv.org Artificial Intelligence

This paper addresses the problem of learning a task from demonstration. We adopt the framework of inverse reinforcement learning, where tasks are represented in the form of a reward function. Our contribution is a novel active learning algorithm that enables the learning agent to query the expert for more informative demonstrations, thus leading to more sample-efficient learning. For this novel algorithm (Generalized Binary Search for Inverse Reinforcement Learning, or GBS-IRL), we provide a theoretical bound on sample complexity and illustrate its applicability on several different tasks. To our knowledge, GBS-IRL is the first active IRL algorithm with provable sample complexity bounds. We also discuss our method in light of other existing methods in the literature and its general applicability in multi-class classification problems. Finally, motivated by recent work on learning from demonstration in robots, we also discuss how different forms of human feedback can be integrated in a transparent manner in our learning framework.


Bayes Nets in Educational Assessment: Where Do the Numbers Come From?

arXiv.org Artificial Intelligence

As observations and student models become complex, educational assessments that exploit advances in technology and cognitive psychology can outstrip familiar testing models and analytic methods. Within the Portal conceptual framework for assessment design, Bayesian inference networks (BINs) record beliefs about students' knowledge and skills, in light of what they say and do. Joining evidence model BIN fragments- which contain observable variables and pointers to student model variables - to the student model allows one to update belief about knowledge and skills as observations arrive. Markov Chain Monte Carlo (MCMC) techniques can estimate the required conditional probabilities from empirical data, supplemented by expert judgment or substantive theory. Details for the special cases of item response theory (IRT) and multivariate latent class modeling are given, with a numerical example of the latter.