Combining model-based and model-free learning systems has been shown to improve the sample efficiency of learning to perform complex robotic tasks. However, existing dual-system approaches fail to consider the reliability of the learned model when it is applied to make multiple-step predictions, resulting in a compounding of prediction errors and performance degradation. In this paper, we present a novel dual-system motor learning approach where a meta-controller arbitrates online between model-based and model-free decisions based on an estimate of the local reliability of the learned model. The reliability estimate is also used to compute an intrinsic feedback signal that encourages actions leading to data that improves the model. Our approach further integrates arbitration with imagination, where a learned latent-space model generates imagined experiences, based on its local reliability, to be used as additional training data. We evaluate our approach against baseline and state-of-the-art methods on learning vision-based robotic grasping in simulation and in the real world. The results show that our approach outperforms the compared methods and learns near-optimal grasping policies in dense- and sparse-reward environments.
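The arbitration idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the functions `reliability_from_error` and `arbitrate`, the mapping from prediction error to reliability, and the threshold value are all assumptions chosen for illustration.

```python
def reliability_from_error(prediction_error, scale=1.0):
    """Map a non-negative model prediction error to a score in (0, 1].

    Hypothetical mapping: 1.0 means the model predicted perfectly here,
    and the score decays towards 0 as the error grows.
    """
    return 1.0 / (1.0 + prediction_error / scale)


def arbitrate(reliability, threshold=0.5):
    """Pick the controller for the current step (assumed threshold rule)."""
    return "model_based" if reliability >= threshold else "model_free"


# A small local error keeps the model-based controller in charge,
# a large one hands control to the model-free policy.
low_error_choice = arbitrate(reliability_from_error(0.1))
high_error_choice = arbitrate(reliability_from_error(3.0))
```

A hard threshold is the simplest possible rule. A learned or probabilistic arbitration could replace it without changing the interface.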
Recent success in deep reinforcement learning for continuous control has been dominated by model-free approaches, which, unlike model-based approaches, do not suffer from the representational limitations of assumed world dynamics or from the model errors that are inevitable in complex domains. However, they require far more experience than model-based approaches, which are typically more sample-efficient. We propose to combine the benefits of the two in an integrated approach called Curious Meta-Controller. Our approach alternates adaptively between model-based and model-free control using curiosity feedback based on the learning progress of a neural model of the dynamics in a learned latent space. We demonstrate that our approach can significantly improve the sample efficiency and achieve near-optimal performance on learning robotic reaching and grasping tasks from raw-pixel input in both dense and sparse reward settings.
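One common way to turn learning progress into a curiosity signal is to compare the model's average prediction error over an older window with its average over a more recent window. The sketch below assumes that formulation. The class name `LearningProgress` and the window size are hypothetical, and the abstract does not state that this exact estimator is used.

```python
from collections import deque


class LearningProgress:
    """Curiosity signal as the drop in mean prediction error between two windows."""

    def __init__(self, window=5):
        self.window = window
        # Keep only the two most recent windows of errors.
        self.errors = deque(maxlen=2 * window)

    def update(self, error):
        """Record a new prediction error and return the current progress."""
        self.errors.append(float(error))
        if len(self.errors) < 2 * self.window:
            return 0.0  # not enough history yet
        errs = list(self.errors)
        older = sum(errs[:self.window]) / self.window
        recent = sum(errs[self.window:]) / self.window
        # Positive when the model is improving, negative when it degrades.
        return older - recent


lp = LearningProgress(window=2)
signals = [lp.update(e) for e in [4.0, 4.0, 2.0, 2.0]]
```

In this sketch a positive value means the dynamics model is still improving in the visited region, which is where a curious agent would want to keep collecting data.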
This paper aims to examine the potential of using emerging deep reinforcement learning techniques in flight control. Instead of learning from scratch, the autopilot structure is fixed as a typical three-loop autopilot and deep reinforcement learning is utilised to learn the autopilot gains. This domain-knowledge-aided approach is shown to significantly improve the learning efficiency. To solve the flight control problem, we then formulate a Markovian decision process with a proper reward function that enables the application of reinforcement learning theory. The state-of-the-art deep deterministic policy gradient algorithm is utilised to learn an action policy that maps the observed states to the autopilot gains. Extensive numerical simulations are performed to validate the proposed computational control algorithm.
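When a policy outputs autopilot gains rather than actuator commands, a bounded actor output is typically rescaled to a physically sensible gain range. The sketch below shows one such mapping, assuming actor outputs in [-1, 1] (as a tanh output layer would produce). The function name `scale_to_gains` and the gain bounds are hypothetical and are not taken from the paper.

```python
def scale_to_gains(action, lower, upper):
    """Linearly map each action component from [-1, 1] to [lower, upper].

    Out-of-range components are clipped first, so exploration noise added
    to the actor output cannot push a gain outside its bounds.
    """
    gains = []
    for a, lo, hi in zip(action, lower, upper):
        a = max(-1.0, min(1.0, a))
        gains.append(lo + 0.5 * (a + 1.0) * (hi - lo))
    return gains


# Three illustrative gains with made-up bounds.
gains = scale_to_gains([-1.0, 0.0, 1.0], [0.0, 0.0, 0.0], [2.0, 4.0, 10.0])
```

Keeping the fixed autopilot structure in the loop and letting the policy choose only the gains is what restricts the search space compared with learning the control law from scratch.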
The control of flexible wing aircraft is challenging due to the prevailing, highly nonlinear deformations in the flexible wing system. This calls for new control mechanisms that are robust to real-time variations in the wing's aerodynamics. An online control mechanism based on a value iteration reinforcement learning process is developed for flexible wing aerial structures. It employs a model-free control policy framework and a guaranteed convergent adaptive learning architecture to solve the system's Bellman optimality equation. A Riccati equation is derived, and solving it is shown to be equivalent to solving the underlying Bellman equation. The online reinforcement learning solution is implemented by means of an adaptive-critic mechanism. The controller is proven to be asymptotically stable in the Lyapunov sense. It is assessed through computer simulations and its superior performance is demonstrated on two scenarios under different operating conditions.
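The link between value iteration and a Riccati equation can be seen in the simplest linear-quadratic case. The sketch below assumes a scalar discrete-time system x[k+1] = a*x[k] + b*u[k] with stage cost q*x^2 + r*u^2 and a quadratic value function V(x) = P*x^2. These assumptions and the function name `riccati_value_iteration` are illustrative only. Unlike the model-free method described in the abstract, this sketch uses the model parameters a and b directly.

```python
def riccati_value_iteration(a, b, q, r, iterations=100, p0=0.0):
    """Iterate the scalar discrete-time Riccati recursion.

    Each step is one Bellman backup for V(x) = P * x**2:
        P <- q + a^2 P - (a b P)^2 / (r + b^2 P)
    """
    p = p0
    for _ in range(iterations):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p


def optimal_gain(a, b, r, p):
    """Feedback gain k for the control law u = -k * x implied by P."""
    return a * b * p / (r + b * b * p)


p_star = riccati_value_iteration(a=1.0, b=1.0, q=1.0, r=1.0)
k_star = optimal_gain(1.0, 1.0, 1.0, p_star)
```

For a = b = q = r = 1 the fixed point satisfies P^2 - P - 1 = 0, so the iteration converges to the golden ratio, which gives a convenient check on the recursion.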
Classical methods for controlling heating systems are often marred by suboptimal performance, an inability to adapt to dynamic conditions, and unreasonable assumptions such as the availability of building models. This paper presents a novel deep reinforcement learning algorithm which can control space heating in buildings in a computationally efficient manner, and benchmarks it against other known techniques. The proposed algorithm outperforms rule-based control by 5-10% in a simulation environment for a number of price signals. We conclude that, while not optimal, the proposed algorithm offers additional practical advantages such as faster computation times and increased robustness to non-stationarities in building dynamics.