Chang, Dong Eui
Machine learning based state observer for discrete time systems evolving on Lie groups
Shanbhag, Soham, Chang, Dong Eui
In this paper, a machine learning based observer for systems evolving on manifolds is designed such that the state of the observer is restricted to the Lie group on which the system evolves. Conventional machine learning based observers for systems evolving on Lie groups require designing charts for the Lie group, training a separate observer for each chart, and switching between the trained models based on the state of the system. We propose a novel deep learning based technique whose predictions are restricted to a measure-zero subset of Euclidean space without using charts. Using this network, we design an observer whose state is restricted to the Lie group and which predicts the state with a single trained model. The deep learning network predicts an ``error term'' on the Lie algebra of the Lie group, uses the map from the Lie algebra to the group, and uses the group action and the present state to estimate the state at the next epoch. Being purely data driven, this approach does not require a model of the system. The proposed algorithm provides a novel framework for constraining the output of machine learning networks to a measure-zero subset of a Euclidean space without chart-specific training and without switching. We show the validity of this method using Monte Carlo simulations performed on the rigid body rotation and translation system.
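The update described above can be sketched in a few lines of Python for the rotation group SO(3): a learned map produces an error term in the Lie algebra so(3), which is exponentiated and applied to the current estimate by group multiplication, so the estimate stays on the group by construction. The function `predict_error` is a hypothetical stand-in for the trained network, and all dimensions are illustrative assumptions, not the setup used in the paper.

```python
# Minimal sketch of a Lie-group-constrained observer step on SO(3).
import numpy as np
from scipy.linalg import expm

def hat(w):
    """Map a vector in R^3 to the corresponding skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def predict_error(R_est, y):
    """Hypothetical learned map from (current estimate, measurement) to a
    Lie-algebra error term; a real implementation would be a neural network."""
    return 0.1 * np.random.randn(3)

def observer_step(R_est, y):
    """One observer update: exponentiate the predicted so(3) error and apply it
    by group multiplication, so the next estimate remains in SO(3)."""
    xi = predict_error(R_est, y)          # error term on the Lie algebra
    return R_est @ expm(hat(xi))          # group action keeps the estimate on the group

R = np.eye(3)                              # initial estimate in SO(3)
R = observer_step(R, y=None)
print(np.allclose(R.T @ R, np.eye(3)))     # estimate remains orthogonal
```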
Unscented Kalman filter with stable embedding for simple, accurate and computationally efficient state estimation of systems on manifolds in Euclidean space
Park, Jae-Hyeon, Chang, Dong Eui
This paper proposes a simple, accurate and computationally efficient method to apply the ordinary unscented Kalman filter developed in Euclidean space to systems whose dynamics evolve on manifolds. We use the mathematical theory called stable embedding to devise a variant of the unscented Kalman filter that keeps state estimates in close proximity to the manifold while exhibiting excellent estimation performance. We confirm the performance of our devised filter by applying it to a satellite system model and comparing the performance with other unscented Kalman filters devised specifically for systems on manifolds. Our devised filter has a low estimation error, keeps the state estimates in close proximity to the manifold as expected, and consumes only a minor amount of computation time. Our devised filter is also simple and easy to use because it directly employs the off-the-shelf standard unscented Kalman filter devised in Euclidean space without any particular manifold-structure-preserving discretization method or coordinate transformation.
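The stable-embedding idea can be sketched for a state constrained to the unit sphere S^2 in R^3: the dynamics are extended to the ambient Euclidean space with a feedback term that attracts trajectories back to the manifold, after which any off-the-shelf Euclidean UKF can be handed the extended dynamics. The rotation dynamics, gain `k`, and identity measurement model below are illustrative assumptions, not the satellite model used in the paper.

```python
# Minimal sketch of stable embedding for a state on the unit sphere in R^3.
import numpy as np

def fx(x, dt, omega=np.array([0.0, 0.0, 1.0]), k=5.0):
    """Discretized ambient-space dynamics: nominal rotation about `omega`
    plus a stabilizing term -k*(|x|^2 - 1)*x that keeps x near the sphere."""
    xdot = np.cross(omega, x) - k * (x @ x - 1.0) * x
    return x + dt * xdot

def hx(x):
    """Measurement model: the state is observed directly (illustrative)."""
    return x

# fx and hx can be passed to any standard Euclidean UKF implementation;
# the feedback term keeps the estimates in close proximity to the sphere.
x = np.array([1.0, 0.0, 0.0])
for _ in range(1000):
    x = fx(x, dt=0.01)
print(abs(x @ x - 1.0))   # remains small, i.e. the state stays near the manifold
```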
Feedback Gradient Descent: Efficient and Stable Optimization with Orthogonality for DNNs
Bu, Fanchen, Chang, Dong Eui
Optimization with orthogonality has been shown to be useful in training deep neural networks (DNNs). To impose orthogonality on DNNs, both computational efficiency and stability are important. However, existing methods utilizing Riemannian optimization or hard constraints can only ensure stability, while those using soft constraints can only improve efficiency. In this paper, we propose a novel method, named Feedback Gradient Descent (FGD), which is, to our knowledge, the first to achieve high efficiency and stability simultaneously. FGD induces orthogonality based on the simple yet indispensable Euler discretization of a continuous-time dynamical system on the tangent bundle of the Stiefel manifold. In particular, inspired by a numerical integration method on manifolds called Feedback Integrators, we instantiate it on the tangent bundle of the Stiefel manifold for the first time. In extensive image classification experiments, FGD comprehensively outperforms the existing state-of-the-art methods in terms of accuracy, efficiency, and stability.
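The following sketch illustrates the flavor of such an update: an Euler step along the (tangent-projected) negative gradient plus a feedback term that pulls the weight matrix back toward the Stiefel manifold {W : W^T W = I}, in the spirit of feedback integrators. The specific update, step size, and gain below are illustrative simplifications, not the exact FGD scheme from the paper.

```python
# Minimal sketch of a gradient step with orthogonality feedback on the Stiefel manifold.
import numpy as np

def fgd_like_step(W, grad, lr=0.01, k=10.0):
    """One Euler step: follow the tangent-projected negative gradient and add a
    feedback term -k*W*(W^T W - I) that stabilizes the orthogonality constraint."""
    n, p = W.shape
    sym = 0.5 * (W.T @ grad + grad.T @ W)
    tangent_grad = grad - W @ sym                 # projection onto the tangent space
    feedback = W @ (W.T @ W - np.eye(p))          # penalizes departure from W^T W = I
    return W - lr * tangent_grad - lr * k * feedback

rng = np.random.default_rng(0)
W = np.linalg.qr(rng.standard_normal((8, 4)))[0]  # start on the manifold
for _ in range(100):
    W = fgd_like_step(W, rng.standard_normal(W.shape))
print(np.linalg.norm(W.T @ W - np.eye(4)))        # stays close to orthogonal
```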
Robust Navigation for Racing Drones based on Imitation Learning and Modularization
Wang, Tianqi, Chang, Dong Eui
This paper presents a vision-based modularized drone racing navigation system that uses a customized convolutional neural network (CNN) as the perception module to produce high-level navigation commands, and then leverages a state-of-the-art planner and controller to generate low-level control commands, thus exploiting the advantages of both data-based and model-based approaches. Unlike the state-of-the-art method, which takes only the current camera image as the CNN input, we additionally include the latest three drone states among the inputs. Our method outperforms the state-of-the-art method in various track layouts and offers two switchable navigation behaviors with a single trained network. The CNN-based perception module is trained to imitate an expert policy that automatically generates ground-truth navigation commands based on pre-computed global trajectories. Owing to the extensive randomization and our modified dataset aggregation (DAgger) policy during data collection, our navigation system, which is trained purely in simulation with synthetic textures, successfully operates in environments with randomly chosen photorealistic textures without further fine-tuning.
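A minimal PyTorch sketch of such a perception module is given below: a CNN backbone processes the camera image, the latest three drone states are concatenated to the image features, and a small head outputs a high-level navigation command. Layer sizes, the state dimension, and the command dimension are assumptions for illustration, not the architecture used in the paper.

```python
# Illustrative perception module: image features fused with recent drone states.
import torch
import torch.nn as nn

class PerceptionNet(nn.Module):
    def __init__(self, state_dim=10, n_states=3, cmd_dim=3):
        super().__init__()
        self.backbone = nn.Sequential(                 # simplified CNN backbone
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(                     # fuse image and state features
            nn.Linear(32 + n_states * state_dim, 64), nn.ReLU(),
            nn.Linear(64, cmd_dim),
        )

    def forward(self, image, recent_states):
        feat = self.backbone(image)                    # (B, 32)
        x = torch.cat([feat, recent_states.flatten(1)], dim=1)
        return self.head(x)                            # high-level navigation command

net = PerceptionNet()
cmd = net(torch.zeros(1, 3, 120, 160), torch.zeros(1, 3, 10))
print(cmd.shape)                                       # torch.Size([1, 3])
```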
The Adaptive Dynamic Programming Toolbox
Xing, Xiaowei, Chang, Dong Eui
The paper develops the Adaptive Dynamic Programming Toolbox (ADPT), which solves optimal control problems for continuous-time nonlinear systems. Based on the adaptive dynamic programming technique, the ADPT computes optimal feedback controls from the system dynamics in the model-based working mode, or from measurements of system trajectories in the model-free working mode, without requiring knowledge of the system model. Multiple options are provided so that the ADPT can accommodate various customized circumstances. Compared to other popular software toolboxes for optimal control, the ADPT offers advantages in computational precision and speed, which is illustrated with its application to a satellite attitude control problem.
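To illustrate the policy-iteration principle that underlies adaptive dynamic programming, the sketch below performs the classical model-based iteration for a linear-quadratic problem: evaluate the current feedback policy by solving a Lyapunov equation, then improve the gain, converging to the optimal LQR gain. This is only an illustration of the idea; it is not the ADPT interface, and the toolbox additionally handles nonlinear systems and the model-free mode described above.

```python
# Minimal sketch of policy iteration (Kleinman iteration) for a linear system.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

K = np.zeros((1, 2))                       # initial stabilizing feedback gain
for _ in range(20):
    Ak = A - B @ K
    # Policy evaluation: solve the Lyapunov equation for the current policy.
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement: update the feedback gain.
    K = np.linalg.solve(R, B.T @ P)

print(K)                                    # converges to the optimal LQR gain
```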
Double Prioritized State Recycled Experience Replay
Bu, Fanchen, Chang, Dong Eui
Experience replay enables online reinforcement learning agents to store and reuse previous experiences of interacting with the environment. In the original method, experiences are sampled and replayed uniformly at random. A later method, prioritized experience replay, assigns priorities to experiences so that those deemed more important are replayed more frequently. In this paper, we develop a method called double-prioritized state-recycled (DPSR) experience replay, which prioritizes experiences in both the training stage and the storing stage, and replaces experiences in the memory with state recycling to make the best use of experiences that only temporarily seem to have low priority. We used this method in Deep Q-Networks (DQN) and achieved state-of-the-art results, outperforming the original method and prioritized experience replay on many Atari games.
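The sketch below shows the two places where priorities can act: sampling for training draws transitions in proportion to priority, and storing overwrites the lowest-priority slot when the buffer is full, which is where a state-recycling step would regenerate a fresh transition from the evicted state. This is an illustration of the idea only, not the exact DPSR algorithm.

```python
# Illustrative replay buffer with priority-based sampling and priority-based eviction.
import random

class DualPrioritizedBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data, self.priorities = [], []

    def add(self, transition, priority):
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(priority)
        else:
            # Storing-stage prioritization: overwrite the lowest-priority slot.
            i = min(range(len(self.priorities)), key=self.priorities.__getitem__)
            # State recycling would regenerate a new transition from the evicted
            # transition's state self.data[i][0] here (omitted in this sketch).
            self.data[i], self.priorities[i] = transition, priority

    def sample(self, batch_size):
        # Training-stage prioritization: sample proportionally to priority.
        return random.choices(self.data, weights=self.priorities, k=batch_size)

buf = DualPrioritizedBuffer(capacity=1000)
buf.add(("s0", "a0", 1.0, "s1"), priority=0.5)
print(buf.sample(1))
```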
Improved Reinforcement Learning through Imitation Learning Pretraining Towards Image-based Autonomous Driving
Wang, Tianqi, Chang, Dong Eui
We present a training pipeline for the autonomous driving task that takes the current camera image and vehicle speed as input and produces throttle, brake, and steering control outputs. The convenient weather and lighting API of the Airsim simulator provides sufficient diversity during training, which helps increase the robustness of the trained policy. In order not to limit the achievable performance, we use a continuous and deterministic control policy setting. We use ResNet-34 as our actor and critic networks, with slight changes in the fully connected layers. Considering humans' mastery of this task and its high complexity, we first use imitation learning to mimic a given human policy and then transfer the trained policy and its weights to the reinforcement learning phase, for which we use DDPG. This combination shows a considerable performance boost compared to both pure imitation learning and pure DDPG for the autonomous driving task.
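The two-phase structure can be sketched as follows in PyTorch: the actor is first pretrained with behavior cloning on demonstration pairs, and its weights are then reused to initialize the DDPG actor and target actor before reinforcement learning. The small network, the random demonstration tensors, and the omission of the DDPG loop are illustrative assumptions, not the ResNet-34 setup from the paper.

```python
# Sketch of imitation-learning pretraining followed by DDPG initialization.
import copy
import torch
import torch.nn as nn

def make_actor(obs_dim=64, act_dim=3):
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                         nn.Linear(128, act_dim), nn.Tanh())

actor = make_actor()
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-4)

# Phase 1: imitation learning (behavior cloning) on demonstration pairs.
demo_obs, demo_act = torch.randn(256, 64), torch.rand(256, 3) * 2 - 1
for _ in range(100):
    loss = nn.functional.mse_loss(actor(demo_obs), demo_act)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Phase 2: hand the pretrained weights to DDPG as the initial actor and target actor.
ddpg_actor = copy.deepcopy(actor)
ddpg_target_actor = copy.deepcopy(actor)
# ... DDPG then continues training ddpg_actor with a critic and a replay buffer.
```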
A Dual Memory Structure for Efficient Use of Replay Memory in Deep Reinforcement Learning
Ko, Wonshick, Chang, Dong Eui
Replay memory plays an important role in the stable learning and fast convergence of deep reinforcement learning algorithms [1], which approximate a value or policy function using deep neural networks [2]. The study of replay memory in reinforcement learning started with [3] and played a major role in training reinforcement learning agents to play Atari 2600 games with a Deep Q-Network (DQN) [4]. In addition, replay memory is used in other off-policy reinforcement learning algorithms such as DDPG [5] and ACER [6]. In [7], after analyzing the importance of the data in the replay memory, a probability distribution is assigned to the experiences to enable efficient learning through prioritization. [Figure 1: Proposed dual memory structure.]
Learning-Driven Exploration for Reinforcement Learning
Usama, Muhammad, Chang, Dong Eui
Deep reinforcement learning algorithms have been shown to learn complex skills using only high-dimensional observations and a scalar reward. Effective and intelligent exploration still remains an unresolved problem for reinforcement learning. Most contemporary reinforcement learning relies on simple heuristic strategies such as $\epsilon$-greedy exploration or adding Gaussian noise to actions. These heuristics, however, are unable to intelligently distinguish the well-explored and the unexplored regions of the state space, which can lead to inefficient use of training time. We introduce entropy-based exploration (EBE), which enables an agent to efficiently explore the unexplored regions of the state space. EBE quantifies the agent's learning in a state using only state-dependent action values and adaptively explores the state space, i.e., more exploration in the unexplored regions of the state space. We perform experiments on several environments, including a simple linear environment, a simplified version of the Breakout game, and multiple first-person shooter (FPS) games of the VizDoom platform. We demonstrate that EBE enables efficient exploration that ultimately results in faster learning without having to tune hyperparameters.
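The core idea can be sketched as follows: the entropy of a softmax over a state's action values serves as a state-dependent measure of how much remains to be learned there, and it scales the probability of taking an exploratory action. The specific scaling rule below is an illustrative choice, not necessarily the exact EBE formula from the paper.

```python
# Illustrative entropy-based exploration using state-dependent action values.
import numpy as np

def exploration_probability(q_values):
    """Higher entropy of softmax(Q(s, .)) -> more exploration in that state."""
    q = np.asarray(q_values, dtype=float)
    p = np.exp(q - q.max())
    p /= p.sum()
    entropy = -(p * np.log(p + 1e-12)).sum()
    return entropy / np.log(len(q))          # normalize to [0, 1]

def select_action(q_values, rng=np.random.default_rng()):
    if rng.random() < exploration_probability(q_values):
        return int(rng.integers(len(q_values)))   # explore: random action
    return int(np.argmax(q_values))               # exploit: greedy action

print(select_action([1.0, 1.1, 0.9]))   # near-uniform Q values -> mostly explores
print(select_action([5.0, 0.0, 0.0]))   # confident Q values -> mostly exploits
```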
Stochastic Inverse Reinforcement Learning
Ju, Ce, Chang, Dong Eui
Inverse reinforcement learning (IRL) is an ill-posed inverse problem, since expert demonstrations may admit many reward-function solutions, which are hard to recover by local search methods such as gradient methods. In this paper, we generalize the original IRL problem to recover a probability distribution over reward functions. We call this generalized problem stochastic inverse reinforcement learning (SIRL), and we first formulate it as an expectation optimization problem. We adopt the Monte Carlo expectation-maximization (MCEM) method, a global search method, to estimate the parameters of the probability distribution as a first solution to SIRL. With our approach, it is possible to observe the intrinsic properties of IRL from a global viewpoint, and the technique achieves considerably robust recovery performance on the classic learning environment, objectworld.
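A minimal sketch of such a Monte Carlo EM loop is shown below: the E-step samples candidate reward weights from the current Gaussian and scores them against the expert demonstrations, and the M-step refits the Gaussian to the weighted samples. The demonstration likelihood here is a placeholder; a real implementation would evaluate the expert trajectories under the policy induced by each sampled reward, and the Gaussian parameterization itself is an assumption for illustration.

```python
# Illustrative MCEM loop for recovering a distribution over reward parameters.
import numpy as np

def demo_log_likelihood(w, demos):
    """Placeholder: log-likelihood of expert demonstrations under reward weights w."""
    return -np.sum((w - demos.mean(axis=0)) ** 2)

def mcem_sirl(demos, dim, n_samples=200, n_iters=50, rng=np.random.default_rng(0)):
    mu, sigma = np.zeros(dim), 1.0
    for _ in range(n_iters):
        # E-step: Monte Carlo samples of reward weights and their importance weights.
        ws = mu + sigma * rng.standard_normal((n_samples, dim))
        logw = np.array([demo_log_likelihood(w, demos) for w in ws])
        weights = np.exp(logw - logw.max())
        weights /= weights.sum()
        # M-step: weighted maximum-likelihood update of the Gaussian parameters.
        mu = weights @ ws
        sigma = np.sqrt((weights * ((ws - mu) ** 2).sum(axis=1)).sum() / dim)
    return mu, sigma

demos = np.array([[1.0, 2.0], [0.8, 2.2]])   # toy demonstration features
print(mcem_sirl(demos, dim=2))
```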