Reinforcement Learning
Learning from Simulation, Racing in Reality
Chisari, Eugenio, Liniger, Alexander, Rupenyan, Alisa, Van Gool, Luc, Lygeros, John
We present a reinforcement learning-based solution to autonomously race on a miniature race car platform. We show that a policy that is trained purely in simulation using a relatively simple vehicle model, including model randomization, can be successfully transferred to the real robotic setup. We achieve this by using a novel policy output regularization approach and a lifted action space that enables smooth actions while still allowing aggressive race car driving. We show that this regularized policy outperforms the Soft Actor Critic (SAC) baseline method, both in simulation and on the real car, but is still outperformed by a state-of-the-art Model Predictive Controller (MPC). Refining the policy with three hours of real-world interaction data allows the reinforcement learning policy to achieve lap times similar to the MPC controller while reducing track constraint violations by 50%.
Adaptable Automation with Modular Deep Reinforcement Learning and Policy Transfer
Raziei, Zohreh, Moghaddam, Mohsen
The need for "intelligence" in such automation systems stems from the fact that most robotic operations in industry are currently limited to rote and repetitive tasks performed within structured environments. This leaves an entire swath of more complex tasks with high degrees of uncertainty and dynamic environments [7] difficult or even impossible to automate. Examples include maintenance and material handling for producing the desired product in manufacturing systems [8]; robot surgeries and pharmacy automation in healthcare systems [9]; safe working environments in disaster management for deep-sea operation and nuclear energy [10]; and fruit picking, crop sensing, and selective weeding in agriculture systems [11]. A fundamental question concerning the notion of intelligent automation in this context then becomes: How can we enable adaptable industrial automation systems that can analyze and act upon their perceived environment rather than merely executing a set of predefined programs? Adaptability is among the key characteristics of industrial automation systems in response to unpredictable changes or disruptions in the process [12].
Episodic Self-Imitation Learning with Hindsight
Dai, Tianhong, Liu, Hengyan, Bharath, Anil Anthony
Episodic self-imitation learning, a novel self-imitation algorithm with a trajectory selection module and an adaptive loss function, is proposed to speed up reinforcement learning. Compared to the original self-imitation learning algorithm, which samples good state-action pairs from the experience replay buffer, our agent leverages entire episodes with hindsight to aid self-imitation learning. A selection module is introduced to filter uninformative samples from each episode of the update. The proposed method overcomes the limitations of the standard self-imitation learning algorithm, a transition-based method that performs poorly in continuous control environments with sparse rewards. In the experiments, episodic self-imitation learning is shown to perform better than baseline on-policy algorithms, achieving performance comparable to state-of-the-art off-policy algorithms in several simulated robot control tasks. The trajectory selection module is shown to prevent the agent from learning undesirable hindsight experiences. With the capability of solving sparse reward problems in continuous control settings, episodic self-imitation learning has the potential to be applied to real-world problems that have continuous action spaces, such as robot guidance and manipulation.
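As an illustration of the hindsight idea above, the following minimal Python sketch relabels a whole episode with the goal it actually reached and recomputes a sparse reward. The transition layout, the function names `sparse_reward` and `relabel_with_hindsight`, the tolerance, and the 0/-1 reward convention are assumptions made for this example; the trajectory selection module and adaptive loss of the proposed method are not reproduced here.

```python
from typing import List, Tuple

# Hypothetical transition layout: (state, action, achieved_goal, desired_goal).
Transition = Tuple[float, float, float, float]
# Relabeled layout: (state, action, achieved_goal, hindsight_goal, reward).
Relabeled = Tuple[float, float, float, float, float]


def sparse_reward(achieved: float, goal: float, tol: float = 0.05) -> float:
    """Return 0 when the goal is reached within `tol`, otherwise -1 (assumed convention)."""
    return 0.0 if abs(achieved - goal) <= tol else -1.0


def relabel_with_hindsight(episode: List[Transition]) -> List[Relabeled]:
    """Pretend the goal achieved at the end of the episode was the intended goal."""
    hindsight_goal = episode[-1][2]  # achieved goal of the final transition
    relabeled = []
    for state, action, achieved, _original_goal in episode:
        reward = sparse_reward(achieved, hindsight_goal)
        relabeled.append((state, action, achieved, hindsight_goal, reward))
    return relabeled
```

Under these assumptions, an episode that never reached its original goal still yields a successful final transition after relabeling, which is what makes whole episodes usable as imitation targets in sparse-reward settings.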
Path Design and Resource Management for NOMA enhanced Indoor Intelligent Robots
Zhong, Ruikang, Liu, Xiao, Liu, Yuanwei, Chen, Yue, Wang, Xianbin
A communication-enabled indoor intelligent robot (IR) service framework is proposed, in which the non-orthogonal multiple access (NOMA) technique is adopted to enable highly reliable communications. In cooperation with the ultramodern indoor channel model recently proposed by the International Telecommunication Union (ITU), the Lego modeling method is proposed, which can deterministically describe the indoor layout and channel state in order to construct the radio map. The investigated radio map is invoked as a virtual environment to train the reinforcement learning agent, which can save training time and hardware costs. Building on the proposed communication model, the motions of IRs that need to reach designated mission destinations and their corresponding downlink power allocation policy are jointly optimized to maximize the mission efficiency and communication reliability of the IRs. To solve this optimization problem, a novel reinforcement learning approach named the deep transfer deterministic policy gradient (DT-DPG) algorithm is proposed. Our simulation results demonstrate that 1) with the aid of NOMA techniques, the communication reliability of IRs is effectively improved; 2) the radio map is qualified to be a virtual training environment, and its statistical channel state information improves training efficiency by about 30%; 3) the proposed DT-DPG algorithm is superior to the conventional deep deterministic policy gradient (DDPG) algorithm in terms of optimization performance, training time, and anti-local optimum ability. Xianbin Wang is with the Department of Electrical and Computer Engineering, Western University, London, ON N6A5B9, Canada (email: xianbin.wang@uwo.ca). The explosive development of robotics and artificial intelligence technologies has changed, is changing, and will continue to transform human lives.
In recent years, intelligent robots (IRs) have proven competent to provide a variety of services, such as security monitoring, sanitation, and travel guides [1]. Various new services offered by IRs require a large amount of communication, computation, and data resources, which are not necessarily provided locally [2].
Symmetry-Aware Actor-Critic for 3D Molecular Design
Simm, Gregor N. C., Pinsler, Robert, Csányi, Gábor, Hernández-Lobato, José Miguel
Automating molecular design using deep reinforcement learning (RL) has the potential to greatly accelerate the search for novel materials. Despite recent progress on leveraging graph representations to design molecules, such methods are fundamentally limited by the lack of three-dimensional (3D) information. In light of this, we propose a novel actor-critic architecture for 3D molecular design that can generate molecular structures unattainable with previous approaches. This is achieved by exploiting the symmetries of the design process through a rotationally covariant state-action representation based on a spherical harmonics series expansion. We demonstrate the benefits of our approach on several 3D molecular design tasks, where we find that building in such symmetries significantly improves generalization and the quality of generated molecules.
Predictive PER: Balancing Priority and Diversity towards Stable Deep Reinforcement Learning
Lee, Sanghwa, Lee, Jaeyoung, Hasuo, Ichiro
Prioritized experience replay (PER) samples important transitions preferentially, rather than uniformly, to improve the performance of a deep reinforcement learning agent. We claim that such prioritization has to be balanced with sample diversity to stabilize the DQN and prevent forgetting. Our proposed improvement over PER, called Predictive PER (PPER), takes three countermeasures (TDInit, TDClip, TDPred) to (i) eliminate priority outliers and explosions and (ii) improve the sample diversity and distributions, weighted by priorities, both of which help stabilize the DQN. The most notable among the three is the introduction of a second DNN, called TDPred, to generalize the in-distribution priorities. An ablation study and full experiments with Atari games show that each countermeasure, in its own way, and PPER as a whole contribute to enhancing stability and thus performance over PER.
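The prioritization that PER relies on can be sketched as follows, assuming the standard proportional scheme in which a transition's priority is (|TD error| + eps)^alpha and its sampling probability is that priority divided by the sum over the buffer. The fixed upper bound `clip` is only a stand-in for the outlier-removal idea behind TDClip; the abstract does not give the exact rule, and TDInit and TDPred are not modeled. All function names and default values are illustrative.

```python
import random
from typing import List, Sequence


def clipped_priorities(td_errors: Sequence[float], clip: float,
                       alpha: float = 0.6, eps: float = 1e-6) -> List[float]:
    """Proportional priorities with |TD error| capped at `clip` (assumed rule)."""
    return [(min(abs(d), clip) + eps) ** alpha for d in td_errors]


def sampling_probabilities(priorities: Sequence[float]) -> List[float]:
    """Normalize priorities into a probability distribution over the buffer."""
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(probs: Sequence[float], batch_size: int,
                   rng: random.Random) -> List[int]:
    """Draw buffer indices with replacement according to `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)
```

With TD errors such as 0.1, 0.2, and 100, the unclipped distribution puts almost all of its mass on the outlier, whereas a cap of 1.0 leaves the other two transitions a realistic chance of being replayed. This is the priority-versus-diversity trade-off the abstract refers to.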
Diluted Near-Optimal Expert Demonstrations for Guiding Dialogue Stochastic Policy Optimisation
Cordier, Thibault, Urvoy, Tanguy, Rojas-Barahona, Lina M., Lefèvre, Fabrice
These interactions can be taken from either human-to-human or human-machine conversations. However, human interactions are scarce and costly, making learning from few interactions essential. One solution to speed up the learning process is to guide the agent's exploration with the help of an expert. We present in this paper several imitation learning strategies for dialogue policy where the guiding expert is a near-optimal handcrafted policy. We incorporate these strategies with state-of-the-art reinforcement learning methods based on Q-learning and actor-critic. We notably propose a randomised exploration policy which allows for a seamless hybridisation of the learned policy and the expert, which can be seen as a dilution of the expert's demonstration into the resulting policy. Our experiments show that our hybridisation strategy outperforms several baselines, and that it could accelerate the learning when facing real humans.
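One simple way to picture a randomised hybrid of an expert and a learned policy is a per-step coin flip with a decaying expert probability. The sketch below assumes exactly that; `beta`, the linear decay, and all function names are hypothetical and may differ from the strategies evaluated in the paper.

```python
import random
from typing import Any, Callable


def expert_probability(step: int, total_steps: int,
                       start: float = 1.0, end: float = 0.0) -> float:
    """Linearly move the expert probability from `start` to `end` (assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def hybrid_action(state: Any,
                  learned_policy: Callable[[Any], Any],
                  expert_policy: Callable[[Any], Any],
                  beta: float,
                  rng: random.Random) -> Any:
    """Follow the expert with probability `beta`, otherwise the learned policy."""
    if rng.random() < beta:
        return expert_policy(state)
    return learned_policy(state)
```

Early in training `beta` is close to 1 and the handcrafted expert drives exploration; as `beta` shrinks, the expert's demonstrations are progressively diluted into the learned policy's own behaviour.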
An End-to-end Deep Reinforcement Learning Approach for the Long-term Short-term Planning on the Frenet Space
Moghadam, Majid, Alizadeh, Ali, Tekin, Engin, Elkaim, Gabriel Hugh
Tactical decision making and strategic motion planning for autonomous highway driving are challenging due to the difficulty of predicting other road users' behaviors, the diversity of environments, and the complexity of traffic interactions. This paper presents a novel end-to-end continuous deep reinforcement learning approach to decision-making and motion planning for autonomous cars. For the first time, we define both states and action spaces on the Frenet space to make the driving behavior less variant to the road curvatures than the surrounding actors' dynamics and traffic interactions. The agent receives time-series data of past trajectories of the surrounding vehicles and applies convolutional neural networks along the time channels to extract features in the backbone. The algorithm generates continuous spatiotemporal trajectories on the Frenet frame for the feedback controller to track. Extensive high-fidelity highway simulations on CARLA show the superiority of the presented approach compared with commonly used baselines and discrete reinforcement learning on various traffic scenarios. Furthermore, the proposed method's advantage is confirmed with a more comprehensive performance evaluation against 1000 randomly generated test scenarios.
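For readers unfamiliar with the Frenet frame, the sketch below converts a Cartesian point into a longitudinal arc length s and a signed lateral offset d relative to a piecewise-linear reference path. It is a generic geometric illustration, not the paper's implementation; the function name, the polyline representation of the road, and the sign convention (positive d to the left of the travel direction) are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def cartesian_to_frenet(point: Point, path: List[Point]) -> Tuple[float, float]:
    """Project `point` onto a polyline and return (s, d).

    Assumes `path` has at least two waypoints and no repeated consecutive ones.
    """
    px, py = point
    best_dist = math.inf
    best_s, best_d = 0.0, 0.0
    s_start = 0.0  # arc length accumulated up to the current segment
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        # Parameter of the closest point on the segment, clamped to [0, 1].
        t = ((px - x0) * dx + (py - y0) * dy) / (seg_len ** 2)
        t = max(0.0, min(1.0, t))
        cx, cy = x0 + t * dx, y0 + t * dy
        dist = math.hypot(px - cx, py - cy)
        if dist < best_dist:
            # The cross product sign tells which side of the segment the point is on.
            cross = dx * (py - y0) - dy * (px - x0)
            best_dist = dist
            best_s = s_start + t * seg_len
            best_d = math.copysign(dist, cross)
        s_start += seg_len
    return best_s, best_d
```

In these coordinates a vehicle that follows a curved lane keeps d near zero while s grows steadily, which is the sense in which a Frenet description is less sensitive to road curvature than a Cartesian one.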
Unsupervised Object Keypoint Learning using Local Spatial Predictability
Gopalakrishnan, Anand, van Steenkiste, Sjoerd, Schmidhuber, Jürgen
Hence, which layer(s) we choose as our feature embedding will have an effect on the outcome of the local spatial prediction problem. While more abstract high-level features are expected to better capture the internal predictive structure of an object, it will be more difficult to attribute the error of the prediction network to the exact image location. On the other hand, while more low-level features can be localized more accurately, they may lack the expressiveness to capture high-level properties of objects. Nonetheless, in practice we find that a spatial feature embedding based on earlier layers of the encoder works well (see also Section 5.3 for an ablation). Local Spatial Prediction Task. Using the learned spatial feature embedding, we seek out salient regions of the input image that correspond to object parts. Our approach is based on the idea that objects correspond to local regions in feature space that have high internal predictive structure, which allows us to formulate the following local spatial prediction (LSP) task. For each location in the learned spatial feature embedding, we seek to predict the value of the features (across the feature maps) from its neighbouring feature values. When neighbouring areas correspond to the same object (part), i.e., they regularly appear together, we expect that this prediction problem is easy (green arrow in Figure 3).
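The shape of the LSP task can be illustrated with a deliberately simplified Python sketch. The text describes a learned prediction network operating across feature maps; here that network is replaced by the plain mean of the up-to-eight neighbouring values on a single feature map, purely to show how a per-location prediction error map arises. The function name and the squared-error measure are assumptions.

```python
from typing import List


def lsp_error_map(features: List[List[float]]) -> List[List[float]]:
    """Squared error of predicting each cell from the mean of its neighbours.

    `features` is one 2D feature map with at least two cells; the neighbour mean
    is a stand-in for the learned predictor described in the text.
    """
    height, width = len(features), len(features[0])
    errors = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            neighbours = [
                features[a][b]
                for a in range(max(0, i - 1), min(height, i + 2))
                for b in range(max(0, j - 1), min(width, j + 2))
                if (a, b) != (i, j)
            ]
            prediction = sum(neighbours) / len(neighbours)
            errors[i][j] = (features[i][j] - prediction) ** 2
    return errors
```

On a uniform map every cell is perfectly predictable from its surroundings and the error is zero everywhere, matching the "easy" case described above; a cell whose value differs from its neighbours produces a localized peak in the error map.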
Sensorimotor representation learning for an "active self" in robots: A model survey
Nguyen, Phuong D. H., Georgie, Yasmin Kim, Kayhan, Ezgi, Eppe, Manfred, Hafner, Verena Vanessa, Wermter, Stefan
For example, sensorimotor experiences are used to learn a forward model, and a forward model can be the basis for learning high-level cognitive conceptual representations. In agreement with Schillaci et al. (2016), we aim to go deeper into the role of multisensory information collected through exploration in the formation of an agent's body and peripersonal space representation, and how these sensorimotor representations affect the agent's sense of the active self, including the sense of agency and the sense of body ownership. Thus, motor explorations will be mentioned but not exhaustively discussed in this surveyed work.
... birth, infants spend their first months of life undergoing many developmental milestones to incrementally develop the representation of their body. This body schema is related mainly to touch, proprioception, and vision (see Table 1) as these sensory modalities continue to develop from the fetal stage (see Hoffmann, 2017; Adolph and Joh, 2007 for reviews). Later on, the representation of the surrounding space of the body--the PPS--is aggregated from the proprioceptive and exteroceptive modalities (see Table 1). In addition, infants develop the capability to generate motor actions corresponding ...