Tokekar, Pratap
ProxMaP: Proximal Occupancy Map Prediction for Efficient Indoor Robot Navigation
Sharma, Vishnu Dutt, Chen, Jingxi, Tokekar, Pratap
In a typical path planning pipeline for a ground robot, we build a map (e.g., an occupancy grid) of the environment as the robot moves around. While navigating indoors, a ground robot's knowledge about the environment may be limited due to occlusions. Therefore, the map will have many as-yet-unknown regions that may need to be avoided by a conservative planner. Instead, if a robot is able to correctly predict what its surroundings and occluded regions look like, the robot may be more efficient in navigation. In this work, we focus on predicting occupancy within the reachable distance of the robot to enable faster navigation and present a self-supervised proximity occupancy map prediction method, named ProxMaP. We show that ProxMaP generalizes well across realistic and real domains, and improves the robot navigation efficiency in simulation by \textbf{$12.40\%$} against the traditional navigation method. We share our findings on our project webpage (see https://raaslab.org/projects/ProxMaP ).
Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning
Chakraborty, Souradip, Bedi, Amrit Singh, Koppel, Alec, Sadler, Brian M., Huang, Furong, Tokekar, Pratap, Manocha, Dinesh
Model-based approaches to reinforcement learning (MBRL) exhibit favorable performance in practice, but their theoretical guarantees in large spaces are mostly restricted to the setting when transition model is Gaussian or Lipschitz, and demands a posterior estimate whose representational complexity grows unbounded with time. In this work, we develop a novel MBRL method (i) which relaxes the assumptions on the target transition model to belong to a generic family of mixture models; (ii) is applicable to large-scale training by incorporating a compression step such that the posterior estimate consists of a Bayesian coreset of only statistically significant past state-action pairs; and (iii) exhibits a sublinear Bayesian regret. To achieve these results, we adopt an approach based upon Stein's method, which, under a smoothness condition on the constructed posterior and target, allows distributional distance to be evaluated in closed form as the kernelized Stein discrepancy (KSD). The aforementioned compression step is then computed in terms of greedily retaining only those samples which are more than a certain KSD away from the previous model estimate. Experimentally, we observe that this approach is competitive with several state-of-the-art RL methodologies, and can achieve up-to 50 percent reduction in wall clock time in some continuous control environments.
On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces
Bedi, Amrit Singh, Chakraborty, Souradip, Parayil, Anjaly, Sadler, Brian, Tokekar, Pratap, Koppel, Alec
We focus on parameterized policy search for reinforcement learning over continuous action spaces. Typically, one assumes the score function associated with a policy is bounded, which fails to hold even for Gaussian policies. To properly address this issue, one must introduce an exploration tolerance parameter to quantify the region in which it is bounded. Doing so incurs a persistent bias that appears in the attenuation rate of the expected policy gradient norm, which is inversely proportional to the radius of the action space. To mitigate this hidden bias, heavy-tailed policy parameterizations may be used, which exhibit a bounded score function, but doing so can cause instability in algorithmic updates. To address these issues, in this work, we study the convergence of policy gradient algorithms under heavy-tailed parameterizations, which we propose to stabilize with a combination of mirror ascent-type updates and gradient tracking. Our main theoretical contribution is the establishment that this scheme converges with constant step and batch sizes, whereas prior works require these parameters to respectively shrink to null or grow to infinity. Experimentally, this scheme under a heavy-tailed policy parameterization yields improved reward accumulation across a variety of settings as compared with standard benchmarks.
Crop Height and Plot Estimation for Phenotyping from Unmanned Aerial Vehicles using 3D LiDAR
Dhami, Harnaik, Yu, Kevin, Xu, Tianshu, Zhu, Qian, Dhakal, Kshitiz, Friel, James, Li, Song, Tokekar, Pratap
We present techniques to measure crop heights using a 3D Light Detection and Ranging (LiDAR) sensor mounted on an Unmanned Aerial Vehicle (UAV). Knowing the height of plants is crucial to monitor their overall health and growth cycles, especially for high-throughput plant phenotyping. We present a methodology for extracting plant heights from 3D LiDAR point clouds, specifically focusing on plot-based phenotyping environments. We also present a toolchain that can be used to create phenotyping farms for use in Gazebo simulations. The tool creates a randomized farm with realistic 3D plant and terrain models. We conducted a series of simulations and hardware experiments in controlled and natural settings. Our algorithm was able to estimate the plant heights in a field with 112 plots with a root mean square error (RMSE) of 6.1 cm. This is the first such dataset for 3D LiDAR from an airborne robot over a wheat field. The developed simulation toolchain, algorithmic implementation, and datasets can be found on the GitHub repository located at https://github.com/hsd1121/PointCloudProcessing.
Multi-Agent Reinforcement Learning for Persistent Monitoring
Chen, Jingxi, Baskaran, Amrish, Zhang, Zhongshun, Tokekar, Pratap
The Persistent Monitoring (PM) problem seeks to find a set of trajectories (or controllers) for robots to persistently monitor a changing environment. Each robot has a limited field-of-view and may need to coordinate with others to ensure no point in the environment is left unmonitored for long periods of time. We model the problem such that there is a penalty that accrues every time step if a point is left unmonitored. However, the dynamics of the penalty are unknown to us. We present a Multi-Agent Reinforcement Learning (MARL) algorithm for the persistent monitoring problem. Specifically, we present a Multi-Agent Graph Attention Proximal Policy Optimization (MA-G-PPO) algorithm that takes as input the local observations of all agents combined with a low resolution global map to learn a policy for each agent. The graph attention allows agents to share their information with others leading to an effective joint policy. Our main focus is to understand how effective MARL is for the PM problem. We investigate five research questions with this broader goal. We find that MA-G-PPO is able to learn a better policy than the non-RL baseline in most cases, the effectiveness depends on agents sharing information with each other, and the policy learnt shows emergent behavior for the agents.
Recreating Bat Behavior on Quad-Rotor UAVsโA Simulation Approach
Tanveer, M. Hassan (Virginia Polytechnic Institute and State University ) | Thomas, Antony (University of Genoa) | Wu, Xiaowei (Virginia Polytechnic Institute and State University) | Mรผller, Rolf (Virginia Polytechnic Institute and State University) | Tokekar, Pratap (University of Maryland) | Zhu, Hongxiao (Virginia Polytechnic Institute and State University)
We develop an effective computer model to simulate sensing environments that consist of natural trees. The simulated environments are random and contain full geometry of the tree foliage. While this simulated model can be used as a general platform for studying the sensing mechanism of different flying species, our ultimate goal is to build bat-inspired Quad-rotor UAVsโ UAVs that can recreate batโs flying behavior (e.g., obstacle avoidance, path planning) in dense vegetation. To this end, we also introduce a foliage echo simulator that can produce simulated echoes by mimicking batโs biosonar. In our current model, a few realistic model choices or assumptions are made. First, in order to create natural looking trees, the branching structures of trees are modeled by L-systems, whereas the detailed geometry of branches, sub-branches and leaves is created by randomizing a reference tree in a CAD object file. Additionally, the foliage echo simulator is simplified so that no shading effect is considered. We demonstrate our developed model by simulating real-world scenarios with multiple trees and compute the corresponding impulse responses along a Quad-rotor trajectory.
Distributed Attack-Robust Submodular Maximization for Multi-Robot Planning
Zhou, Lifeng, Tzoumas, Vasileios, Pappas, George J., Tokekar, Pratap
We aim to guard swarm-robotics applications against denial-of-service (DoS) failures/attacks that result in withdrawals of robots. We focus on applications requiring the selection of actions for each robot, among a set of available ones, e.g., which trajectory to follow. Such applications are central in large-scale robotic/control applications, e.g., multi-robot motion planning for target tracking. But the current attack-robust algorithms are centralized, and scale quadratically with the problem size (e.g., number of robots). Thus, in this paper, we propose a general-purpose distributed algorithm towards robust optimization at scale, with local communications only. We name it distributed robust maximization (DRM). DRM proposes a divide-and-conquer approach that distributively partitions the problem among K cliques of robots. The cliques optimize in parallel, independently of each other. That way, DRM also offers significant computational speed-ups up to 1/K^2 the running time of its centralized counterparts. K depends on the robots' communication range, which is given as input to DRM. DRM also achieves a close-to-optimal performance, equal to the guaranteed performance of its centralized counterparts. We demonstrate DRM's performance in both Gazebo and MATLAB simulations, in scenarios of active target tracking with swarms of robots. We observe DRM achieves significant computational speed-ups (it is 3 to 4 orders faster) and, yet, nearly matches the tracking performance of its centralized counterparts.
Risk-Aware Planning by Confidence Estimation using Deep Learning-Based Perception
Toubeh, Maymoonah, Tokekar, Pratap
This work proposes the use of Bayesian approximations of uncertainty from deep learning in a robot planner, showing that this produces more cautious actions in safety-critical scenarios. The case study investigated is motivated by a setup where an aerial robot acts as a "scout" for a ground robot. This is useful when the below area is unknown or dangerous, with applications in space exploration, military, or search-and-rescue. Images taken from the aerial view are used to provide a less obstructed map to guide the navigation of the robot on the ground. Experiments are conducted using a deep learning semantic image segmentation, followed by a path planner based on the resulting cost map, to provide an empirical analysis of the proposed method. A comparison with similar approaches is presented to portray the usefulness of certain techniques, or variations within a technique, in similar experimental settings. The method is analyzed to assess the impact of variations in the uncertainty extraction, as well as the absence of an uncertainty metric, on the overall system with the use of a defined metric which measures surprise to the planner. The analysis is performed on multiple datasets, showing a similar trend of lower surprise when uncertainty information is incorporated in the planning, given threshold values of the hyperparameters in the uncertainty extraction have been met. We find that taking uncertainty into account leads to paths that could be 18% less risky on an average.
An Approximation Algorithm for Risk-averse Submodular Optimization
Zhou, Lifeng, Tokekar, Pratap
We study the problem of incorporating risk while making combinatorial decisions under uncertainty. We formulate a discrete submodular maximization problem for selecting a set using Conditional-Value-at-Risk (CVaR), a risk metric commonly used in financial analysis. While CVaR has recently been used in optimization of linear costs functions in robotics, we take the first stages towards extending this to discrete submodular optimization and provide several positive results. Specifically, we propose the Sequential Greedy Algorithm that provides an approximation guarantee on finding the maxima of the CVaR cost function under a matroidal constraint. The approximation guarantee shows that the solution produced by our algorithm is within a constant factor of the optimal and an additive term that depends on the optimal. Our analysis uses the curvature of the submodular set function, and proves that the algorithm runs in polynomial time. This formulates a number of combinatorial optimization problems that appear in robotics. We use two such problems, vehicle assignment under uncertainty for mobility-on-demand and sensor selection with failures for environmental monitoring, as case studies to demonstrate the efficacy of our formulation.