Education
Fast biped walking with a reflexive controller and real-time policy searching
Geng, Tao, Porr, Bernd, Wörgötter, Florentin
The goal of this study is to combine neuronal mechanisms with biomechanics to obtain very fast speed and the online learning of circuit parameters. Our controller isbuilt with biologically inspired sensor-and motor-neuron models, including local reflexes and not employing any kind of position or trajectory-tracking control algorithm. Instead, this reflexive controller allows RunBot to exploit its own natural dynamics during critical stages of its walking gait cycle. To our knowledge, this is the first time that dynamic bipedwalking is achieved using only a pure reflexive controller. In addition, this structure allows using a policy gradient reinforcement learning algorithm to tune the parameters of the reflexive controller in real-time during walking. This way RunBot can reach a relative speed of 3.5 leg-lengths per second after a few minutes of online learning, which is faster than that of any other biped robot, and is also comparable to the fastest relative speed of human walking. In addition, the stability domain of stable walking is quite large supporting this design strategy.
Distance Metric Learning for Large Margin Nearest Neighbor Classification
Weinberger, Kilian Q., Blitzer, John, Saul, Lawrence K.
We show how to learn a Mahanalobis distance metric for k-nearest neighbor (kNN)classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification--for example, achieving a test error rate of 1.3% on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification.
Goal-Based Imitation as Probabilistic Inference over Graphical Models
Humans are extremely adept at learning new skills by imitating the actions ofothers. A progression of imitative abilities has been observed in children, ranging from imitation of simple body movements to goalbased imitationbased on inferring intent. In this paper, we show that the problem of goal-based imitation can be formulated as one of inferring goals and selecting actions using a learned probabilistic graphical model of the environment. We first describe algorithms for planning actions to achieve a goal state using probabilistic inference. We then describe how planning can be used to bootstrap the learning of goal-dependent policies byutilizing feedback from the environment. The resulting graphical model is then shown to be powerful enough to allow goal-based imitation. Usinga simple maze navigation task, we illustrate how an agent can infer the goals of an observed teacher and imitate the teacher even when the goals are uncertain and the demonstration is incomplete.
Selecting Landmark Points for Sparse Manifold Learning
Silva, Jorge, Marques, Jorge, Lemos, João
There has been a surge of interest in learning nonlinear manifold models to approximate high-dimensional data. Both for computational complexity reasonsand for generalization capability, sparsity is a desired feature in such models. This usually means dimensionality reduction, which naturally implies estimating the intrinsic dimension, but it can also mean selecting a subset of the data to use as landmarks, which is especially important becausemany existing algorithms have quadratic complexity in the number of observations.
Radial Basis Function Network for Multi-task Learning
We extend radial basis function (RBF) networks to the scenario in which multiple correlated tasks are learned simultaneously, and present the corresponding learningalgorithms. We develop the algorithms for learning the network structure, in either a supervised or unsupervised manner. Training data may also be actively selected to improve the network's generalization totest data. Experimental results based on real data demonstrate the advantage of the proposed algorithms and support our conclusions.
Worst-Case Bounds for Gaussian Process Models
Kakade, Sham M., Seeger, Matthias W., Foster, Dean P.
Dean P. Foster University of Pennsylvania We present a competitive analysis of some nonparametric Bayesian algorithms ina worst-case online learning setting, where no probabilistic assumptions about the generation of the data are made. We consider models which use a Gaussian process prior (over the space of all functions) andprovide bounds on the regret (under the log loss) for commonly usednon-parametric Bayesian algorithms -- including Gaussian regression and logistic regression -- which show how these algorithms can perform favorably under rather general conditions.
From Batch to Transductive Online Learning
Kakade, Sham, Kalai, Adam Tauman
It is well-known that everything that is learnable in the difficult online setting, where an arbitrary sequences of examples must be labeled one at a time, is also learnable in the batch setting, where examples are drawn independently from a distribution. We show a result in the opposite direction. Wegive an efficient conversion algorithm from batch to online that is transductive: it uses future unlabeled data. This demonstrates the equivalence between what is properly and efficiently learnable in a batch model and a transductive online model.
Data-Driven Online to Batch Conversions
Online learning algorithms are typically fast, memory efficient, and simple toimplement. However, many common learning problems fit more naturally in the batch learning setting. The power of online learning algorithms can be exploited in batch settings by using online-to-batch conversions techniques which build a new batch algorithm from an existing onlinealgorithm. We first give a unified overview of three existing online-to-batch conversion techniques which do not use training data in the conversion process. We then build upon these data-independent conversions to derive and analyze data-driven conversions.
The Forgetron: A Kernel-Based Perceptron on a Fixed Budget
Dekel, Ofer, Shalev-shwartz, Shai, Singer, Yoram
The Perceptron algorithm, despite its simplicity, often performs well on online classification tasks. The Perceptron becomes especially effective when it is used in conjunction with kernels. However, a common difficulty encounteredwhen implementing kernel-based online algorithms is the amount of memory required to store the online hypothesis, which may grow unboundedly. In this paper we present and analyze the Forgetron algorithmfor kernel-based online learning on a fixed memory budget. To our knowledge, this is the first online learning algorithm which, on one hand, maintains a strict limit on the number of examples it stores while, on the other hand, entertains a relative mistake bound. In addition to the formal results, we also present experiments with real datasets which underscore the merits of our approach.
From Weighted Classification to Policy Search
This paper proposes an algorithm to convert a T -stage stochastic decision problem with a continuous state space to a sequence of supervised learning problems.The optimization problem associated with the trajectory tree and random trajectory methods of Kearns, Mansour, and Ng, 2000, is solved using the Gauss-Seidel method. The algorithm breaks a multistage reinforcementlearning problem into a sequence of single-stage reinforcement learningsubproblems, each of which is solved via an exact reduction to a weighted-classification problem that can be solved using off-the-self methods. Thus the algorithm converts a reinforcement learning probleminto simpler supervised learning subproblems. It is shown that the method converges in a finite number of steps to a solution that cannot be further improved by componentwise optimization. The implication ofthe proposed algorithm is that a plethora of classification methods can be applied to find policies in the reinforcement learning problem.