How Much Do Unstated Problem Constraints Limit Deep Robotic Reinforcement Learning?
Lewis, W. Cannon II | Moll, Mark | Kavraki, Lydia E.
Deep Reinforcement Learning is a promising paradigm for robotic control that has been shown capable of learning policies for high-dimensional, continuous control of unmodeled systems. However, Robotic Reinforcement Learning currently lacks clearly defined benchmark tasks, which makes it difficult for researchers to reproduce and compare against prior work. "Reacher" tasks, which are fundamental to robotic manipulation, are commonly used as benchmarks, but the lack of a formal specification elides details that are crucial to replication. In this paper we present a novel empirical analysis that shows that the unstated spatial constraints in commonly used implementations of Reacher tasks make it dramatically easier to learn a successful control policy with Deep Deterministic Policy Gradients (DDPG), a state-of-the-art Deep RL algorithm. Our analysis suggests that less constrained Reacher tasks are significantly more difficult to learn, and hence that existing de facto benchmarks are not representative of the difficulty of general robotic manipulation.
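To make the spatial-constraint issue concrete, the sketch below contrasts goal sampling for a planar two-link reacher with and without a constrained goal region. This is our own illustrative Python, not code from the paper; the function name and the reach value are assumptions (0.21 roughly matches the common MuJoCo Reacher arm).

import numpy as np

# Illustrative only: in many Reacher implementations, the unstated
# constraint is the region from which goals are sampled. Confining goals
# to a small disc near the base makes the task much easier to learn than
# sampling goals across the full reachable workspace.
def sample_goal(constrained: bool, arm_reach: float = 0.21) -> np.ndarray:
    """Sample a 2D goal position for a planar two-link reacher."""
    rng = np.random.default_rng()
    if constrained:
        # Common benchmark behavior: goals well inside the workspace.
        radius = rng.uniform(0.0, 0.5 * arm_reach)
    else:
        # Less constrained variant: goals anywhere up to full reach.
        radius = rng.uniform(0.0, arm_reach)
    angle = rng.uniform(-np.pi, np.pi)
    return radius * np.array([np.cos(angle), np.sin(angle)])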
Optimal and Efficient Stochastic Motion Planning in Partially-Known Environments
Luna, Ryan J. (Rice University) | Lahijanian, Morteza (Rice University) | Moll, Mark (Rice University) | Kavraki, Lydia E. (Rice University)
This work presents a framework for computing optimal control policies for a continuous system in the presence of both action and environment uncertainty. The framework decomposes the planning problem into two stages: an offline phase that reasons only over action uncertainty and an online phase that quickly reacts to the uncertain environment. Offline, a bounded-parameter Markov decision process (BMDP) is employed to model the evolution of the stochastic system over a discretization of the environment. Online, an optimal control policy over the BMDP is computed. Upon the discovery of an unknown environment feature during policy execution, the BMDP is updated and the optimal control policy is efficiently recomputed. Depending on the desired policy quality, a suite of methods is presented for incorporating new information into the BMDP online, with varying degrees of detail. Experiments confirm that the framework recomputes high-quality policies in seconds and is orders of magnitude faster than existing methods.
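The policy computation over a BMDP can be illustrated with interval value iteration, the standard technique for optimizing over transition-probability intervals. The sketch below is a minimal pessimistic (lower-bound) variant in Python under assumed array shapes; all names are ours, and this is not the paper's implementation.

import numpy as np

def pessimistic_distribution(lo, hi, values):
    """Pick the distribution within [lo, hi] that minimizes expected value.
    Assumes valid intervals: lo.sum() <= 1 <= hi.sum()."""
    p = lo.copy()
    slack = 1.0 - lo.sum()
    for j in np.argsort(values):          # lowest-value successors first
        take = min(slack, hi[j] - lo[j])
        p[j] += take
        slack -= take
        if slack <= 0:
            break
    return p

def interval_value_iteration(P_lo, P_hi, R, gamma=0.95, tol=1e-6):
    """Lower-bound value iteration for a BMDP.
    P_lo, P_hi: (A, S, S) transition probability intervals; R: (S,) rewards."""
    A, S, _ = P_lo.shape
    V = np.zeros(S)
    while True:
        Q = np.empty((A, S))
        for a in range(A):
            for s in range(S):
                p = pessimistic_distribution(P_lo[a, s], P_hi[a, s], V)
                Q[a, s] = R[s] + gamma * p @ V
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)   # values and greedy policy
        V = V_new

An optimistic (upper-bound) variant is obtained by reversing the successor ordering in pessimistic_distribution; together the two bounds bracket the value of every policy in the BMDP.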
Combining a POMDP Abstraction with Replanning to Solve Complex, Position-Dependent Sensing Tasks
Grady, Devin (Rice University) | Moll, Mark (Rice University) | Kavraki, Lydia E. (Rice University)
The Partially Observable Markov Decision Process (POMDP) is a general framework for determining reward-maximizing action policies under noisy action and sensing conditions. However, computing an optimal policy for a POMDP is often intractable for robotic tasks because the underlying problem is PSPACE-complete. Several recent solvers have expanded the size of problems that can be considered. Although these POMDP solvers can respect complex motion constraints in theory, we show that the computational cost does not pay off during the eventual online execution when compared to our alternative approach, which relies on a policy that ignores some of the motion constraints. We advocate using the POMDP framework where it is critical: to find a policy that provides the optimal action given all past noisy sensor observations, while abstracting away some of the motion constraints to reduce solution time. However, the actions of an abstract robot are generally not executable under its true motion constraints. The problem is addressed offline with a less-constrained POMDP, and navigation under the full system constraints is handled online with replanning. We empirically demonstrate that the policy generated with this abstracted motion model is faster to compute and achieves similar or higher reward than encoding the full constraints of the car-like robot used in our experiments directly in the POMDP.
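A minimal sketch of the online side of this offline/online split, under assumed interfaces (every name below is hypothetical, not from the paper): the offline POMDP policy for the abstract robot proposes high-level actions, and an online replanner makes each one executable under the true motion constraints.

def execute(abstract_policy, belief, replanner, robot, sensor, horizon=100):
    """Run an abstract POMDP policy on a robot with full motion constraints."""
    for _ in range(horizon):
        # 1. The abstract policy maps the current belief to a high-level
        #    action, e.g., a target waypoint for a simplified robot model.
        waypoint = abstract_policy(belief)
        # 2. Online replanning: compute a trajectory that the real,
        #    constrained (e.g., car-like) robot can execute to the waypoint.
        trajectory = replanner(robot.state, waypoint)
        robot.follow(trajectory)
        # 3. Update the belief with the new noisy observation.
        observation = sensor(robot.state)
        belief = belief.update(waypoint, observation)

The design point this loop captures is that belief tracking and action selection stay at the abstract level, so the expensive POMDP solve never has to reason about the full dynamics; only the replanner does.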