Our goal is to pre-train reinforcement learning models on a diverse dataset and then transfer knowledge (either zero-shot or with fine-tuning) to a different test environment. In the last decade, we've seen learning-based systems provide transformative solutions for a wide range of perception and reasoning problems, from recognizing objects in images to recognizing and translating human speech. If fruitful, this line of work could allow learning-based systems to tackle active control tasks, such as robotics and autonomous driving, alongside the passive perception tasks to which they have already been successfully applied. While deep reinforcement learning methods such as Soft Actor-Critic can learn impressive motor skills, they are challenging to train on large, broad datasets that do not come from the target environment. In contrast, the success of deep networks in fields like computer vision was arguably predicated just as much on large datasets, such as ImageNet, as on large neural network architectures.
This paper explores how an autonomous agent can model dynamic environments and use that knowledge to improve its behavior. This capability is of particular importance for persistent agents operating with long-term autonomy. Inspiration is drawn from circadian rhythms in nature, which drive periodic behavior in many organisms. In our approach, the chemical oscillators from nature are replaced with methods from time series analysis designed for forecasting complex seasonal patterns. This model is incorporated into a behavior-based architecture as an advanced percept, providing future estimates of the environment rather than current measurements. A simulated application of a janitor robot working in an environment with heavy pedestrian traffic was created as a testbed. Experiments used real-world pedestrian traffic counts and showed that an agent using online forecasting of future traffic outperformed both a reactive, sensor-based strategy and a strategy with a deterministic schedule.
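As a rough illustration of a forecast used as a percept, the sketch below predicts future traffic from the average count observed at the same phase of earlier cycles. The abstract does not name its forecasting method, so this seasonal-mean forecaster is a stand-in, and the function name, the period of 4 and the sample counts are all hypothetical.

```python
from typing import List, Sequence


def seasonal_mean_forecast(history: Sequence[float], period: int, horizon: int) -> List[float]:
    """Forecast `horizon` steps ahead using the mean of each seasonal phase.

    Assumption: a plain seasonal average stands in for the (unspecified)
    time series method mentioned in the abstract.
    """
    if period <= 0 or len(history) < period:
        raise ValueError("need at least one full period of history")
    # Mean of all past observations that share the same phase in the cycle.
    phase_means = []
    for phase in range(period):
        samples = history[phase::period]
        phase_means.append(sum(samples) / len(samples))
    # The first forecast step continues the cycle where the history ended.
    start = len(history)
    return [phase_means[(start + h) % period] for h in range(horizon)]


# Hypothetical pedestrian counts covering two cycles of four time slots each.
counts = [10, 50, 80, 20, 12, 54, 76, 24]
forecast = seasonal_mean_forecast(counts, period=4, horizon=4)

# A forecast-driven agent could schedule its work in the quietest future slot.
quietest_slot = min(range(len(forecast)), key=forecast.__getitem__)
```

Here the agent acts on `forecast`, an estimate of the future, where a reactive strategy would act only on the latest entry of `counts`.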
Localization, that is, the estimation of a robot's location from sensor data, is a fundamental problem in mobile robotics. This paper presents a version of Markov localization which provides accurate position estimates and which is tailored towards dynamic environments. The key idea of Markov localization is to maintain a probability density over the space of all locations of a robot in its environment. Our approach represents this space metrically, using a fine-grained grid to approximate densities. It is able to globally localize the robot from scratch and to recover from localization failures. It is robust to approximate models of the environment (such as occupancy grid maps) and noisy sensors (such as ultrasound sensors). Our approach also includes a filtering technique which allows a mobile robot to reliably estimate its position even in densely populated environments in which crowds of people block the robot's sensors for extended periods of time. The method described here has been implemented and tested in several real-world applications of mobile robots, including the deployment of two mobile robots as interactive museum tour-guides.
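The grid-based belief update at the core of Markov localization can be sketched in one dimension as follows. This is a minimal illustration, not the paper's implementation: the cyclic five-cell world, the door/wall sensor and the noise parameters `p_exact` and `p_hit` are assumptions, and the filtering technique for crowded environments is not modeled.

```python
from typing import List, Sequence


def normalize(belief: Sequence[float]) -> List[float]:
    """Rescale the belief so it sums to one (uniform if all mass vanished)."""
    total = sum(belief)
    if total == 0.0:
        return [1.0 / len(belief)] * len(belief)
    return [b / total for b in belief]


def motion_update(belief: Sequence[float], shift: int, p_exact: float = 0.8) -> List[float]:
    """Prediction step: move `shift` cells in a cyclic world.

    With probability (1 - p_exact) / 2 each, the robot under- or
    overshoots by one cell (an assumed motion noise model).
    """
    n = len(belief)
    p_off = (1.0 - p_exact) / 2.0
    new_belief = [0.0] * n
    for i, b in enumerate(belief):
        new_belief[(i + shift) % n] += p_exact * b
        new_belief[(i + shift - 1) % n] += p_off * b
        new_belief[(i + shift + 1) % n] += p_off * b
    return new_belief


def sensor_update(belief: Sequence[float], world: Sequence[str],
                  reading: str, p_hit: float = 0.9) -> List[float]:
    """Correction step: weight each cell by the likelihood of the reading."""
    weighted = [b * (p_hit if cell == reading else 1.0 - p_hit)
                for b, cell in zip(belief, world)]
    return normalize(weighted)


# Global localization from scratch: start with a uniform belief.
world = ["door", "wall", "wall", "door", "wall"]
belief = [1.0 / len(world)] * len(world)
belief = sensor_update(belief, world, "door")   # robot senses a door
belief = motion_update(belief, 1)               # robot moves one cell
belief = sensor_update(belief, world, "wall")   # robot senses a wall
```

After these three steps the belief concentrates on the two wall cells that directly follow a door. Because every cell keeps some probability mass, later readings can still shift the belief away from a wrong estimate.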
We present a novel optimization-based algorithm for motion planning in dynamic environments. Our approach uses a stochastic trajectory optimization framework to avoid collisions and satisfy smoothness and dynamics constraints. Our algorithm does not require a priori knowledge about global motion or trajectories of dynamic obstacles. Rather, we compute a conservative local bound on the position or trajectory of each obstacle over a short time and use the bound to compute a collision-free trajectory for the robot in an incremental manner. Moreover, we interleave planning and execution of the robot in an adaptive manner to balance between the planning horizon and responsiveness to obstacles. We highlight the performance of our planner in a simulated dynamic environment with the 7-DOF PR2 robot arm and dynamic obstacles.
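One simple way to realize a conservative local bound is to inflate each obstacle by the farthest distance it could travel in the elapsed time. The sketch below assumes spherical obstacles with a known maximum speed. The abstract does not state how its bound is computed, so this model and all names in it are hypothetical, and the stochastic trajectory optimization itself is not shown.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def conservative_radius(radius: float, max_speed: float, elapsed: float) -> float:
    """Radius of a sphere guaranteed to contain the obstacle after `elapsed` seconds.

    Assumption: the obstacle never moves faster than `max_speed`.
    """
    return radius + max_speed * elapsed


def trajectory_is_safe(waypoints: Sequence[Point], dt: float,
                       obstacle_center: Point, obstacle_radius: float,
                       obstacle_max_speed: float, robot_radius: float) -> bool:
    """Check every waypoint against the obstacle bound at that waypoint's time."""
    for k, point in enumerate(waypoints):
        bound = conservative_radius(obstacle_radius, obstacle_max_speed, k * dt)
        if math.dist(point, obstacle_center) <= bound + robot_radius:
            return False
    return True


# Hypothetical short-horizon trajectory passing one meter from an obstacle.
waypoints = [(0.0, 1.0, 0.0), (0.5, 1.0, 0.0), (1.0, 1.0, 0.0)]
safe_slow = trajectory_is_safe(waypoints, 0.1, (1.0, 0.0, 0.0), 0.2, 1.0, 0.1)
safe_fast = trajectory_is_safe(waypoints, 0.1, (1.0, 0.0, 0.0), 0.2, 5.0, 0.1)
```

The bound grows with time, so it is only useful over a short horizon. That is one reason to replan often and interleave planning with execution.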
We present a framework that enables online planning for robotic systems in dynamic environments. The PLANrm framework presented in this work utilizes the theory of robustness and monitoring of Metric Temporal Logic (MTL) specifications to inspect and modify available plans to both avoid obstacles and satisfy specifications in a dynamic environment. The use of MTL allows the practitioner to set complex event- and timing-based specifications that need to be satisfied in the execution of the plan. The monitoring algorithm inspects the possible paths in a bounded window and selects and adjusts a path to satisfy the specifications. In this paper, we present initial results on the framework and an extended summary of the algorithmic results. The approach is illustrated using a running example of a car-like model with a number of MTL specifications.
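To make the notion of robustness concrete, the sketch below evaluates a bounded "always" specification over discrete-time candidate paths and picks the path with the largest margin. It follows the standard quantitative semantics of MTL (minimum over the window for "always", maximum for "eventually"). It is not the PLANrm monitoring algorithm, and the path names, distances and safety margin are hypothetical.

```python
from typing import Dict, Sequence


def always(rho: Sequence[float], start: int, end: int) -> float:
    """Robustness of G_[start, end]: the worst-case margin inside the window."""
    window = rho[start:end + 1]
    if not window:
        raise ValueError("empty time window")
    return min(window)


def eventually(rho: Sequence[float], start: int, end: int) -> float:
    """Robustness of F_[start, end]: the best-case margin inside the window."""
    window = rho[start:end + 1]
    if not window:
        raise ValueError("empty time window")
    return max(window)


def clearance_margin(distances: Sequence[float], safe_distance: float) -> list:
    """Pointwise robustness of the predicate `distance > safe_distance`."""
    return [d - safe_distance for d in distances]


# Hypothetical obstacle distances (meters) sampled along two candidate paths.
candidates: Dict[str, Sequence[float]] = {
    "path_a": [3.0, 2.0, 0.5, 2.0],
    "path_b": [3.0, 2.5, 1.5, 2.0],
}

# Specification: always, within steps 0..3, stay more than 1.0 m from the obstacle.
scores = {name: always(clearance_margin(dist, 1.0), 0, 3)
          for name, dist in candidates.items()}
best_path = max(scores, key=scores.get)
```

A positive score means the specification holds with that much margin, and a negative score means it is violated. Ranking the paths in a bounded window by score therefore prefers the one that satisfies the specification most robustly.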