Collaborating Authors: Sun, Jianyong


Learning from Few Demonstrations with Frame-Weighted Motion Generation

arXiv.org Artificial Intelligence

Learning from Demonstration (LfD) enables robots to acquire versatile skills by learning motion policies from human demonstrations. It gives users an intuitive interface for transferring new skills to robots without time-consuming robot programming or inefficient solution exploration. During task execution, the robot's motion is usually influenced by constraints imposed by the environment. In light of this, task-parameterized LfD (TP-LfD) encodes relevant contextual information into reference frames, enabling better skill generalization to new situations. However, most TP-LfD algorithms require multiple demonstrations across various environmental conditions to ensure sufficient statistics for a meaningful model, and it is not trivial for robot users to create all of these situations and demonstrate under each of them. This paper therefore presents a novel algorithm that learns skills from few demonstrations. By leveraging reference frame weights that capture the importance or relevance of each frame during task execution, our method demonstrates excellent skill acquisition performance, which is validated in real robotic environments.
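
A minimal sketch of the frame-weighting idea, not the paper's algorithm: each reference frame contributes a Gaussian prediction of the robot pose in the global frame, and scalar frame weights scale the precision of each contribution before a product-of-Gaussians fusion. The function fuse_frame_predictions and the toy start/goal frames below are invented for illustration.

import numpy as np

def fuse_frame_predictions(frame_means, frame_covs, frame_weights):
    """Fuse per-frame Gaussian pose predictions into one global-frame Gaussian
    via a weight-scaled product of Gaussians.

    frame_means:   list of (d,) means already mapped into the global frame
    frame_covs:    list of (d, d) covariances in the global frame
    frame_weights: scalars in [0, 1]; larger values mark more relevant frames
    """
    d = frame_means[0].shape[0]
    precision = np.zeros((d, d))
    info = np.zeros(d)
    for mu, sigma, w in zip(frame_means, frame_covs, frame_weights):
        lam = w * np.linalg.inv(sigma)   # down-weight less important frames
        precision += lam
        info += lam @ mu
    cov = np.linalg.inv(precision)
    return cov @ info, cov

# Toy example: a start frame and a goal frame disagree about the target pose;
# the frame weights decide which one dominates the fused prediction.
mu_start, cov_start = np.array([0.0, 0.0]), 0.05 * np.eye(2)
mu_goal, cov_goal = np.array([1.0, 0.5]), 0.05 * np.eye(2)
mean, cov = fuse_frame_predictions(
    [mu_start, mu_goal], [cov_start, cov_goal], frame_weights=[0.2, 0.8])
print(mean)  # pulled toward the goal frame, which carries the larger weight

In the few-demonstration setting the paper targets, estimating such frame weights reliably is exactly where the statistics become scarce.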


Amortized Variational Deep Q Network

arXiv.org Artificial Intelligence

Efficient exploration is one of the most important issues in deep reinforcement learning. To address it, recent methods treat the value-function parameters as random variables and resort to variational inference to approximate their posterior. In this paper, we propose an amortized variational inference framework to approximate the posterior distribution of the action-value function in a Deep Q Network. We establish the equivalence between the loss of the new model and the amortized variational inference loss. We balance exploration and exploitation by assuming a Cauchy and a Gaussian posterior, respectively, in a two-stage training process. We show that the amortized framework results in significantly fewer learnable parameters than the existing state-of-the-art method. Experimental results on classical control tasks in OpenAI Gym and on chain Markov Decision Process tasks show that the proposed method performs significantly better than state-of-the-art methods while requiring much less training time.
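
As a rough illustration of the two-stage posterior assumption, the sketch below performs Thompson-style action selection over tabular action values: a heavy-tailed Cauchy posterior in stage one encourages exploration, while a Gaussian posterior in stage two favors exploitation. The sample_q helper, the toy means, and the per-action scales are assumptions for this example; the paper's amortized network that outputs the posterior parameters is not reproduced here.

import numpy as np

rng = np.random.default_rng(0)

def sample_q(mean_q, scale_q, stage):
    """Sample action values from an assumed posterior before taking the argmax.

    Stage 1: Cauchy posterior, heavy tails -> aggressive exploration.
    Stage 2: Gaussian posterior, light tails -> exploitation with mild noise.
    """
    if stage == 1:
        noise = rng.standard_cauchy(size=mean_q.shape)
    else:
        noise = rng.standard_normal(size=mean_q.shape)
    return mean_q + scale_q * noise

# One toy state with 3 actions; in the amortized setting the mean and scale
# would both come from a single shared network head instead of fixed arrays.
mean_q = np.array([1.0, 1.2, 0.9])
scale_q = np.array([0.05, 0.40, 0.05])   # the middle action is most uncertain

picks1 = [int(np.argmax(sample_q(mean_q, scale_q, stage=1))) for _ in range(1000)]
picks2 = [int(np.argmax(sample_q(mean_q, scale_q, stage=2))) for _ in range(1000)]
print("stage 1 action counts:", np.bincount(picks1, minlength=3))
print("stage 2 action counts:", np.bincount(picks2, minlength=3))

With these numbers, the heavy-tailed stage keeps sampling the apparently worst action from time to time, while the Gaussian stage almost never selects it.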


On Hyper-parameter Tuning for Stochastic Optimization Algorithms

arXiv.org Machine Learning

This paper proposes the first algorithmic framework for tuning the hyper-parameters of stochastic optimization algorithms based on reinforcement learning. Hyper-parameters strongly influence the performance of stochastic optimization algorithms such as evolutionary algorithms (EAs) and meta-heuristics, yet determining optimal hyper-parameters is very time-consuming due to the stochastic nature of these algorithms. We propose to model the tuning procedure as a Markov decision process and resort to a policy gradient algorithm to tune the hyper-parameters. Experiments on tuning stochastic algorithms with different kinds of hyper-parameters (continuous and discrete) for different optimization problems (continuous and discrete) show that the proposed hyper-parameter tuning algorithms do not require much less running time of the stochastic algorithms than the Bayesian optimization method. The proposed framework can be used as a standard tool for hyper-parameter tuning in stochastic algorithms.
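
A minimal sketch of the idea, under the simplifying assumption of a single degenerate state, so the policy-gradient tuner reduces to REINFORCE over one continuous hyper-parameter. The toy optimizer run_search, the sphere objective, and the learning rates are placeholders for illustration, not the paper's setup.

import numpy as np

rng = np.random.default_rng(1)

def run_search(sigma, iters=50, dim=5):
    """One run of a toy stochastic optimizer: (1+1)-style random search on the
    sphere function, whose only hyper-parameter is the mutation scale sigma."""
    x = rng.normal(size=dim)
    best = np.sum(x**2)
    for _ in range(iters):
        cand = x + sigma * rng.normal(size=dim)
        f = np.sum(cand**2)
        if f < best:
            x, best = cand, f
    return best

# REINFORCE over a Gaussian policy on log(sigma). Reward = negative final loss,
# so the policy gradient pushes theta toward mutation scales that optimize well.
theta, policy_std, lr, baseline = 0.0, 0.3, 0.05, None
for episode in range(200):
    log_sigma = rng.normal(theta, policy_std)
    reward = -run_search(np.exp(log_sigma))
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    grad = (log_sigma - theta) / policy_std**2       # d/d theta of log-policy
    theta += lr * (reward - baseline) * grad
print("tuned mutation scale:", np.exp(theta))

This sketch only shows how the policy-gradient tuning loop closes; the paper's framework models the full tuning procedure as a Markov decision process rather than a one-shot bandit.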


Homotopic Convex Transformation: A New Method to Smooth the Landscape of the Traveling Salesman Problem

arXiv.org Artificial Intelligence

This paper proposes a novel landscape smoothing method for the symmetric Traveling Salesman Problem (TSP). We first define the homotopic convex (HC) transformation of a TSP as a convex combination of a well-constructed simple TSP and the original TSP. We observe that, controlled by the coefficient of the convex combination, (i) the landscape of the HC-transformed TSP is smoothed in the sense that it has fewer local optima than the original TSP; and (ii) the fitness distance correlation of the HC-transformed TSP is increased. We then propose an iterative algorithmic framework in which the proposed HC transformation is combined with a heuristic TSP solver; it serves as a scheme for escaping local optima and improves the global search ability of the combined heuristic. A case study with 3-Opt local search as the heuristic solver shows that the resulting algorithm significantly outperforms iterated local search and two other smoothing-based TSP heuristic solvers on most of the commonly used test instances.
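
The convex combination can be written down directly. The sketch below builds a simple instance whose global optimum is a chosen reference tour (tour edges cost 0, every other edge costs 1; the paper's construction of the well-constructed simple TSP may differ), blends its distance matrix with the original one, and brute-forces a tiny instance to show how the optimum drifts toward the reference tour as the coefficient grows. All function names here are illustrative.

import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)

def tour_length(tour, dist):
    return sum(dist[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def simple_tsp_from_tour(tour, n):
    """A trivially easy TSP whose global optimum is the given reference tour:
    edges on that tour cost 0, every other edge costs 1."""
    d = np.ones((n, n)) - np.eye(n)
    for i in range(n):
        a, b = tour[i], tour[(i + 1) % n]
        d[a, b] = d[b, a] = 0.0
    return d

def hc_distance(dist, dist_simple, lam):
    """Homotopic convex transformation: a convex combination of the simple
    instance and the original instance, controlled by lam in [0, 1]."""
    return lam * dist_simple + (1.0 - lam) * dist

# Tiny random instance so every tour can be enumerated.
n = 7
pts = rng.random((n, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
ref_tour = list(range(n))                # reference tour defining the simple TSP
dist_s = simple_tsp_from_tour(ref_tour, n)

for lam in (0.0, 0.5, 0.9):
    d_lam = hc_distance(dist, dist_s, lam)
    best = min(permutations(range(1, n)),
               key=lambda p: tour_length((0,) + p, d_lam))
    print(f"lam={lam}: best tour {(0,) + best}")

The coefficient controls how strongly the landscape is pulled toward the simple instance, which is what gives the combined heuristic its mechanism for escaping local optima.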


From Adaptive Kernel Density Estimation to Sparse Mixture Models

arXiv.org Machine Learning

We introduce a balloon estimator in a generalized expectation-maximization method for estimating all parameters of a Gaussian mixture model given one data sample per mixture component. Instead of limiting the model size explicitly, this regularization strategy yields low-complexity sparse models in which the number of effective mixture components decreases as a smoothing probability parameter $P > 0$ increases. The semi-parametric method bridges from non-parametric adaptive kernel density estimation (KDE) to parametric ordinary least squares when $P = 1$. Experiments show that the simpler sparse mixture models retain the level of detail present in the adaptive KDE solution.
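
For reference, the balloon estimator that this work starts from can be stated in a few lines: a kernel density estimate whose bandwidth depends on the query point, here taken as the distance to the k-th nearest sample. This is only the textbook adaptive-KDE ingredient; the paper's generalized EM step and smoothing parameter $P$ are not reproduced, and the balloon_kde helper with its k parameter is an illustrative choice.

import numpy as np

rng = np.random.default_rng(4)

def balloon_kde(query, samples, k=10):
    """Balloon (query-adaptive) KDE in 1D: the Gaussian bandwidth at each query
    point is its distance to the k-th nearest sample, so sparse regions get
    wider kernels than dense ones."""
    query = np.atleast_1d(query).astype(float)
    dens = np.empty_like(query)
    for i, x in enumerate(query):
        d = np.abs(samples - x)
        h = np.sort(d)[k - 1] + 1e-12            # query-dependent bandwidth
        dens[i] = np.mean(np.exp(-0.5 * (d / h) ** 2) / (h * np.sqrt(2 * np.pi)))
    return dens

samples = np.concatenate([rng.normal(-2.0, 0.3, 150), rng.normal(1.5, 0.8, 50)])
grid = np.linspace(-4, 4, 9)
for x, p in zip(grid, balloon_kde(grid, samples)):
    print(f"x = {x:+.1f}   density ~ {p:.3f}")

Starting from one component per sample, the paper's generalized EM then reduces this KDE-like model to a sparse mixture as the smoothing parameter grows.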