Our goal is to pre-train reinforcement learning models on a diverse dataset and then transfer knowledge (either zero-shot or with fine-tuning) to a different test environment. Over the last decade, learning-based systems have provided transformative solutions to a wide range of perception and reasoning problems, from recognizing objects in images to recognizing and translating human speech. If fruitful, this line of work could allow learning-based systems to tackle active control tasks, such as robotics and autonomous driving, alongside the passive perception tasks to which they have already been successfully applied. While deep reinforcement learning methods, such as Soft Actor-Critic, can learn impressive motor skills, they are challenging to train on large and broad data that does not come from the target environment. In contrast, the success of deep networks in fields like computer vision was arguably predicated just as much on large datasets, such as ImageNet, as on large neural network architectures.
This weakness may explain why many previous works learn dynamics models of RL environments but do not actually use those models to fully replace the real environments. For instance, when the dynamics model M is a deterministic, differentiable model, it is easily exploitable by the agent wherever it is imperfect. Using Bayesian models, as in PILCO, mitigates this issue to some extent through uncertainty estimates, but does not fully solve the problem. Recent work combines the model-based approach with traditional model-free RL training by first initializing the policy network with the learned policy, but must subsequently rely on a model-free method to fine-tune this policy in the actual environment. In Learning to Think, it is acceptable that the RNN M is not always a reliable predictor: a (potentially evolution-based) RNN C can in principle learn to ignore a flawed M, or exploit certain useful parts of M for arbitrary computational purposes, including hierarchical planning.
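The model-based idea above can be made concrete with a minimal sketch: fit a dynamics model on transitions collected from the real environment, then roll a policy out entirely inside the learned model. The toy system, its linear form, and the policy here are all hypothetical stand-ins chosen for illustration, not any cited paper's setup.

```python
import numpy as np

# Hypothetical toy system standing in for a real environment:
# true dynamics s' = 0.9*s + 0.5*a
def true_step(s, a):
    return 0.9 * s + 0.5 * a

rng = np.random.default_rng(0)

# Collect transitions from the "real" environment
S = rng.normal(size=(500, 1))
A = rng.normal(size=(500, 1))
S_next = true_step(S, A)

# Fit a linear dynamics model s' ~= [s, a] @ w by least squares
X = np.hstack([S, A])
w, *_ = np.linalg.lstsq(X, S_next, rcond=None)

def model_step(s, a):
    # Learned model used as a surrogate environment
    return np.hstack([s, a]) @ w

# Roll out entirely inside the learned model, never touching the real env
s = np.array([[1.0]])
for _ in range(10):
    a = -s  # a simple hypothetical policy
    s = model_step(s, a)
print(s[0, 0])  # state after 10 purely model-based steps
```

Because this toy model is deterministic and fit on noiseless data it happens to be exact; with an imperfect model, a policy optimized against `model_step` alone can exploit its errors, which is precisely the failure mode the paragraph describes.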
All-day and all-weather navigation is a critical capability for autonomous driving, which requires proper reaction to varied environmental conditions and complex agent behaviors. Recently, with the rise of deep learning, end-to-end control for autonomous vehicles has been well studied. However, most works rely solely on visual information, which can be degraded by challenging illumination conditions such as dim light or total darkness. In addition, they usually generate and apply deterministic control commands without considering the uncertainties in the future. In this paper, based on imitation learning, we propose a probabilistic driving model with multi-perception capability utilizing information from the camera, lidar and radar. We further evaluate its driving performance online on our new driving benchmark, which includes various environmental conditions (e.g., urban and rural areas, traffic densities, weather and times of day) and dynamic obstacles (e.g., vehicles, pedestrians, motorcyclists and bicyclists). The results suggest that our proposed model outperforms baselines and achieves excellent generalization performance in unseen environments with heavy traffic and extreme weather.
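The two ingredients named above, multi-sensor fusion and probabilistic (rather than deterministic) control output, can be sketched minimally as follows. The feature vectors and weight matrices are hypothetical stand-ins for learned encoders; the point is only the shape of the computation: fuse per-sensor features, then predict a distribution over the control command instead of a single value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-sensor feature vectors (stand-ins for learned encoders)
cam, lidar, radar = rng.normal(size=(3, 8))
fused = np.concatenate([cam, lidar, radar])  # simple concatenation fusion

# Hypothetical linear heads predicting a Gaussian over the steering command
W_mu, W_logvar = rng.normal(size=(2, fused.size)) * 0.1
mu = W_mu @ fused                       # predicted mean steering
sigma = np.exp(0.5 * (W_logvar @ fused))  # predicted std, i.e. uncertainty

steer = rng.normal(mu, sigma)  # sample a control command from the distribution
print(mu, sigma)
```

Predicting `(mu, sigma)` rather than a point estimate is what lets such a model express uncertainty about future control, e.g. widening `sigma` in degraded sensing conditions.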
Real-world navigation requires robots to operate in unfamiliar, dynamic environments, sharing spaces with humans. Navigating around humans is especially difficult because it requires predicting their future motion, which is itself a hard problem. We propose a novel framework for navigation around humans which combines learning-based perception with model-based optimal control. Specifically, we train a Convolutional Neural Network (CNN)-based perception module which maps the robot's visual inputs to a waypoint, or next desired state. This waypoint is then input into planning and control modules which convey the robot safely and efficiently to the goal. To train the CNN, we contribute a photo-realistic benchmarking dataset for autonomous robot navigation in the presence of humans. The CNN is trained using supervised learning on images rendered from our photo-realistic dataset. The proposed framework learns to anticipate and react to people's motion based only on a monocular RGB image, without explicitly predicting future human motion. Our method generalizes well to unseen buildings and humans in both simulation and real-world environments. Furthermore, our experiments demonstrate that combining model-based control and learning leads to better and more data-efficient navigational behaviors than a purely learning-based approach. Videos describing our approach and experiments are available on the project website.
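The perception-to-waypoint-to-controller pipeline described above can be sketched as follows. Both pieces are hypothetical stand-ins: `perception_waypoint` returns a fixed waypoint where a real CNN would regress one from pixels, and the controller is a simple proportional unicycle tracker rather than the paper's optimal-control modules.

```python
import numpy as np

def perception_waypoint(image):
    # Stand-in for the CNN: returns a fixed waypoint (x, y) in the robot's
    # frame; a real perception module would regress this from the image.
    return np.array([2.0, 1.0])

def control_step(state, waypoint, dt=0.1):
    # Model-based tracking of the waypoint with unicycle dynamics.
    # state = (x, y, theta)
    x, y, th = state
    dx, dy = waypoint[0] - x, waypoint[1] - y
    v = min(1.0, np.hypot(dx, dy))       # slow down near the waypoint
    w = 2.0 * (np.arctan2(dy, dx) - th)  # proportional heading control
    return np.array([x + v * np.cos(th) * dt,
                     y + v * np.sin(th) * dt,
                     th + w * dt])

state = np.zeros(3)
wp = perception_waypoint(image=None)
for _ in range(100):
    state = control_step(state, wp)
print(state[:2])  # robot position converges toward the waypoint
```

Decoupling the pipeline this way is what makes the combination data-efficient: only the waypoint predictor is learned, while tracking it is handled by a well-understood model-based controller.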
Reinforcement learning (RL) agents use trial and error to learn action policies for environment states. Environments with continuous action spaces are far more challenging for RL than those with discrete actions because there are infinitely many continuous action values from which to choose. Dynamic environments create additional challenges for RL agents, which must adjust rapidly to changes. We recently introduced REINFORCE SUN, a superclass of REINFORCE with Gaussian units that allows for stochasticity at different levels of granularity in artificial neural networks (synapse, unit, or network), and showed that moving stochasticity to the synapses greatly aids RL in both static and dynamic environments with continuous action spaces. However, we also found that performance in dynamic environments remained substantially lower than desired. To rectify this, we here consider alternative parameter-update equations for learning in dynamic environments. These equations form the core of Stochastic Synapse Reinforcement Learning (SSRL), which we generalize to create S*RL, a superclass of SSRL that allows for stochasticity at the same three levels. Empirical results using multi-dimensional robot inverse-kinematics data sets show that the S*RL update equations greatly outperform the traditional REINFORCE equations in dynamic, continuous state and action spaces.