Education
Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
Smith, Andrew, Zha, Hongyuan, Wu, Xiao-ming
We study the convergence and the rate of convergence of a local manifold learning algorithm: LTSA [13]. The main technical tool is the perturbation analysis on the linear invariant subspace that corresponds to the solution of LTSA. We derive a worst-case upper bound of errors for LTSA which naturally leads to a convergence result. We then derive the rate of convergence for LTSA in a special case.
Translated Learning: Transfer Learning across Different Feature Spaces
Dai, Wenyuan, Chen, Yuqiang, Xue, Gui-rong, Yang, Qiang, Yu, Yong
This paper investigates a new machine learning strategy called translated learning. Unlike many previous learning tasks, we focus on how to use labeled data from one feature space to enhance the classification of other entirely different learning spaces. For example, we might wish to use labeled text data to help learn a model for classifying image data, when the labeled images are difficult to obtain. An important aspect of translated learning is to build a "bridge" to link one feature space (known as the "source space") to another space (known as the "target space") through a translator in order to migrate the knowledge from source to target. The translated learning solution uses a language model to link the class labels to the features in the source spaces, which in turn is translated to the features in the target spaces. Finally, this chain of linkages is completed by tracing back to the instances in the target spaces. We show that this path of linkage can be modeled using a Markov chain and risk minimization. Through experiments on the text-aided image classification and cross-language classification tasks, we demonstrate that our translated learning framework can greatly outperform many state-of-the-art baseline methods.
Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
Smith, Andrew, Zha, Hongyuan, Wu, Xiao-ming
We study the convergence and the rate of convergence of a local manifold learning algorithm: LTSA [13]. The main technical tool is the perturbation analysis on the linear invariant subspace that corresponds to the solution of LTSA. We derive a worst-case upper bound of errors for LTSA which naturally leads to a convergence result. We then derive the rate of convergence for LTSA in a special case.
Translated Learning: Transfer Learning across Different Feature Spaces
Dai, Wenyuan, Chen, Yuqiang, Xue, Gui-rong, Yang, Qiang, Yu, Yong
This paper investigates a new machine learning strategy called translated learning. Unlike many previous learning tasks, we focus on how to use labeled data from one feature space to enhance the classification of other entirely different learning spaces. For example, we might wish to use labeled text data to help learn a model for classifying image data, when the labeled images are difficult to obtain. An important aspect of translated learning is to build a "bridge" to link one feature space (known as the "source space") to another space (known as the "target space") through a translator in order to migrate the knowledge from source to target. The translated learning solution uses a language model to link the class labels to the features in the source spaces, which in turn is translated to the features in the target spaces. Finally, this chain of linkages is completed by tracing back to the instances in the target spaces. We show that this path of linkage can be modeled using a Markov chain and risk minimization. Through experiments on the text-aided image classification and cross-language classification tasks, we demonstrate that our translated learning framework can greatly outperform many state-of-the-art baseline methods.
Kernel-ARMA for Hand Tracking and Brain-Machine interfacing During 3D Motor Control
Shpigelman, Lavi, Lalazar, Hagai, Vaadia, Eilon
Using machine learning algorithms to decode intended behavior from neural activity serves a dual purpose. First, these tools can be used to allow patients to interact with their environment through a Brain-Machine Interface (BMI). Second, analysis of the characteristics of such methods can reveal the significance of various features of neural activity, stimuli and responses to the encoding-decoding task. In this study we adapted, implemented and tested a machine learning method, called Kernel Auto-Regressive Moving Average (KARMA), for the task of inferring movements from neural activity in primary motor cortex. Our version of this algorithm is used in an on-line learning setting and is updated when feedback from the last inferred sequence become available. We first used it to track real hand movements executed by a monkey in a standard 3D motor control task. We then applied it in a closed-loop BMI setting to infer intended movement, while arms were restrained, allowing a monkey to perform the task using the BMI alone. KARMA is a recurrent method that learns a nonlinear model of output dynamics. It uses similarity functions (termed kernels) to compare between inputs. These kernels can be structured to incorporate domain knowledge into the method. We compare KARMA to various state-of-the-art methods by evaluating tracking performance and present results from the KARMA based BMI experiments.
Policy Search for Motor Primitives in Robotics
Many motor skills in humanoid robotics can be learned using parametrized motor primitives as done in imitation learning. However, most interesting motor learning problems are high-dimensional reinforcement learning problems often beyond the reach of current methods. In this paper, we extend previous work on policy learning from the immediate reward case to episodic reinforcement learning. We show that this results into a general, common framework also connected to policy gradient methods and yielding a novel algorithm for policy learning by assuming a form of exploration that is particularly well-suited for dynamic motor primitives. The resulting algorithm is an EM-inspired algorithm applicable in complex motor learning tasks. We compare this algorithm to alternative parametrized policy search methods and show that it outperforms previous methods. We apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task using a real Barrett WAM robot arm.
Translated Learning: Transfer Learning across Different Feature Spaces
Dai, Wenyuan, Chen, Yuqiang, Xue, Gui-rong, Yang, Qiang, Yu, Yong
This paper investigates a new machine learning strategy called translated learning. Unlikemany previous learning tasks, we focus on how to use labeled data from one feature space to enhance the classification of other entirely different learning spaces. For example, we might wish to use labeled text data to help learn a model for classifying image data, when the labeled images are difficult to obtain. Animportant aspect of translated learning is to build a "bridge" to link one feature space (known as the "source space") to another space (known as the "target space")through a translator in order to migrate the knowledge from source to target. The translated learning solution uses a language model to link the class labels to the features in the source spaces, which in turn is translated to the features inthe target spaces. Finally, this chain of linkages is completed by tracing back to the instances in the target spaces. We show that this path of linkage can be modeled using a Markov chain and risk minimization. Through experiments on the text-aided image classification and cross-language classification tasks, we demonstrate that our translated learning framework can greatly outperform many state-of-the-art baseline methods.
Dual Averaging Method for Regularized Stochastic Learning and Online Optimization
We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as L1-norm for sparsity. We develop a new online algorithm, the regularized dual averaging method, that can explicitly exploit the regularization structure in an online setting. In particular, at each iteration, the learning variables are adjusted by solving a simple optimization problem that involves the running average of all past subgradients of the loss functions and the whole regularization term, not just its subgradient. This method achieves the optimal convergence rate and often enjoys a low complexity per iteration similar as the standard stochastic gradient method. Computational experiments are presented for the special case of sparse online learning using L1-regularization.
Online Learning of Assignments
Streeter, Matthew, Golovin, Daniel, Krause, Andreas
Which ads should we display in sponsored search in order to maximize our revenue? How should we dynamically rank information sources to maximize value of information? These applications exhibit strong diminishing returns: Selection of redundant ads and information sources decreases their marginal utility. We show that these and other problems can be formalized as repeatedly selecting an assignment of items to positions to maximize a sequence of monotone submodular functions that arrive one by one. We present an efficient algorithm for this general problem and analyze it in the no-regret model. Our algorithm is equipped with strong theoretical guarantees, with a performance ratio that converges to the optimal constant of 1-1/e. We empirically evaluate our algorithms on two real-world online optimization problems on the web: ad allocation with submodular utilities, and dynamically ranking blogs to detect information cascades.
Nonparametric Greedy Algorithms for the Sparse Learning Problem
This paper studies the forward greedy strategy in sparse nonparametric regression. Foradditive models, we propose an algorithm called additive forward regression; forgeneral multivariate models, we propose an algorithm called generalized forward regression. Both algorithms simultaneously conduct estimation and variable selection in nonparametric settings for the high dimensional sparse learning problem. Our main emphasis is empirical: on both simulated and real data, these two simple greedy methods can clearly outperform several state-ofthe-art competitors,including LASSO, a nonparametric version of LASSO called the sparse additive model (SpAM) and a recently proposed adaptive parametric forward-backward algorithm called Foba. We also provide some theoretical justifications ofspecific versions of the additive forward regression.