Suresh, Aamodh
Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning
Suttle, Wesley A., Suresh, Aamodh, Nieto-Granda, Carlos
Entropy-based objectives are widely used to perform state space exploration in reinforcement learning (RL) and dataset generation for offline RL. Behavioral entropy (BE), a rigorous generalization of classical entropies that incorporates cognitive and perceptual biases of agents, was recently proposed for discrete settings and shown to be a promising metric for robotic exploration problems. In this work, we propose using BE as a principled exploration objective for systematically generating datasets that provide diverse state space coverage in complex, continuous, potentially high-dimensional domains. To achieve this, we extend the notion of BE to continuous settings, derive tractable k-nearest neighbor estimators, provide theoretical guarantees for these estimators, and develop practical reward functions that can be used with standard RL methods to learn BE-maximizing policies. Using standard MuJoCo environments, we experimentally compare the performance of offline RL algorithms for a variety of downstream tasks on datasets generated using BE, Rényi, and Shannon entropy-maximizing policies, as well as the SMM and RND algorithms. We find that offline RL algorithms trained on datasets collected using BE outperform those trained on datasets collected using Shannon entropy, SMM, and RND on all tasks considered, and on 80% of the tasks compared to datasets collected using Rényi entropy.
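The k-nearest neighbor estimation idea mentioned above can be sketched as follows. This is an illustrative sketch only, not the paper's estimator: the function names, the exact behavioral-entropy form (Prelec-weighted plug-in over kNN density estimates), and the normalization step are all assumptions made for demonstration.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gamma as gamma_fn

def prelec_weight(p, alpha=0.7):
    # Prelec's probability weighting: w(p) = exp(-(-ln p)^alpha)
    p = np.clip(p, 1e-12, 1.0)
    return np.exp(-(-np.log(p)) ** alpha)

def knn_behavioral_entropy(states, k=5, alpha=0.7):
    """Illustrative behavioral-entropy estimate from state samples
    via kNN density estimation (a sketch, not the paper's estimator)."""
    n, d = states.shape
    tree = cKDTree(states)
    # Distance to the k-th nearest neighbor; the query returns the
    # point itself first, so ask for k + 1 neighbors.
    dists, _ = tree.query(states, k=k + 1)
    r_k = dists[:, -1]
    # kNN density estimate: p_i ~ k / (n * volume of d-ball of radius r_k)
    ball_volume = (np.pi ** (d / 2) / gamma_fn(d / 2 + 1)) * r_k ** d
    p_hat = k / (n * np.maximum(ball_volume, 1e-12))
    p_hat = p_hat / p_hat.sum()  # crude normalization for illustration
    # Plug-in behavioral entropy: -sum_i w(p_i) log p_i (one plausible form)
    return float(-np.sum(prelec_weight(p_hat, alpha) * np.log(p_hat)))
```

A reward shaped from such an estimate (e.g., the marginal change in the estimate when a new state is added) could then be fed to any standard RL algorithm, which is the role the abstract's "practical reward functions" play.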
Robotic Exploration using Generalized Behavioral Entropy
Suresh, Aamodh, Nieto-Granda, Carlos, Martinez, Sonia
This work presents and evaluates a novel strategy for robotic exploration that leverages human models of uncertainty perception. To do this, we introduce a measure of uncertainty that we term "Behavioral entropy", which builds on Prelec's probability weighting from Behavioral Economics. We show that the new operator is an admissible generalized entropy, analyze its theoretical properties, and compare it with other common formulations such as Shannon's and Rényi's. In particular, we discuss how the new formulation is more expressive in the sense of the measures of sensitivity and perceptiveness to uncertainty introduced here. We then use Behavioral entropy to define a new type of utility function that can guide a frontier-based environment exploration process. The approach's benefits are illustrated and compared in a proof-of-concept and ROS-Unity simulation environment with a Clearpath Warthog robot. We show that a robot guided by Behavioral entropy explores faster than one guided by Shannon or Rényi entropy.
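The frontier-utility idea above can be illustrated with a small sketch. The entropy form used here (-w(p) log w(p) over cell occupancy probabilities near a frontier) and the function names are assumptions for illustration; the paper's exact operator and utility definition may differ.

```python
import math

def prelec_weight(p, alpha=0.9, beta=1.0):
    # Prelec (1998) weighting: w(p) = exp(-beta * (-ln p)^alpha),
    # with w(0) = 0 and w(1) = 1.
    if p <= 0.0:
        return 0.0
    return math.exp(-beta * (-math.log(p)) ** alpha)

def behavioral_entropy(probs, alpha=0.9, beta=1.0):
    # Illustrative generalized entropy over occupancy probabilities of
    # map cells near a frontier (a sketch of one plausible form).
    h = 0.0
    for p in probs:
        w = prelec_weight(p, alpha, beta)
        if 0.0 < w < 1.0:
            h -= w * math.log(w)
    return h

def best_frontier(frontiers, alpha=0.9, beta=1.0):
    # frontiers: dict mapping frontier id -> occupancy probs of nearby cells.
    # Pick the frontier whose neighborhood is most uncertain under the
    # behavioral-entropy utility.
    return max(frontiers,
               key=lambda f: behavioral_entropy(frontiers[f], alpha, beta))
```

Cells with occupancy probability near 0.5 (genuinely unknown) contribute most, so a frontier bordering unexplored space scores higher than one surrounded by confidently mapped cells, and varying Prelec's alpha and beta changes how sharply that preference is expressed.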
Robot Navigation in Risky, Crowded Environments: Understanding Human Preferences
Suresh, Aamodh, Taylor, Angelique, Riek, Laurel D., Martinez, Sonia
Risky and crowded environments (RCE) contain abstract sources of risk and uncertainty, which are perceived differently by humans, leading to a variety of behaviors. Thus, robots deployed in RCEs need to exhibit diverse perception and planning capabilities in order to interpret other human agents' behavior and act accordingly in such environments. To understand this problem domain, we conducted a study to explore human path choices in RCEs, enabling better robotic navigational explainable AI (XAI) designs. We created a novel COVID-19 pandemic grocery shopping scenario with time-risk tradeoffs and acquired users' path preferences. We found that participants showcase a variety of path preferences, from risky and urgent to safe and relaxed. To model users' decision making, we evaluated three popular risk models: Cumulative Prospect Theory (CPT), Conditional Value at Risk (CVaR), and Expected Risk (ER). We found that CPT captured people's decision making more accurately than CVaR and ER, corroborating theoretical results that CPT is more expressive and inclusive than CVaR and ER. We also found that people's self-assessments of risk and time-urgency do not correlate with their path preferences in RCEs. Finally, we conducted a thematic analysis of open-ended questions, providing crucial design insights for robots in RCEs. Thus, through this study, we provide novel and critical insights about human behavior and perception to help design better navigational XAI in RCEs.
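The three risk models compared above can be sketched for a discrete loss lottery. This is a minimal illustration, not the study's fitted models: the CPT variant below is the simplified separable (non-cumulative) form with Tversky and Kahneman's 1992 parameter estimates, and all function names are assumptions.

```python
import numpy as np

def expected_risk(losses, probs):
    # ER: plain expected loss.
    return float(np.dot(losses, probs))

def cvar(losses, probs, alpha=0.9):
    # CVaR: expected loss in the worst (1 - alpha) probability tail.
    losses, probs = np.asarray(losses, float), np.asarray(probs, float)
    order = np.argsort(losses)[::-1]  # worst outcomes first
    tail = 1.0 - alpha
    acc, val = 0.0, 0.0
    for li, pi in zip(losses[order], probs[order]):
        take = min(pi, tail - acc)
        if take <= 0:
            break
        val += li * take
        acc += take
    return val / tail

def pt_disvalue(losses, probs, lam=2.25, beta=0.88, gamma=0.69):
    # Simplified prospect-theoretic disvalue of a loss lottery:
    # sum_i w(p_i) * lam * loss_i^beta, with the Tversky-Kahneman
    # weighting w(p) = p^g / (p^g + (1-p)^g)^(1/g). Illustrative only.
    losses, probs = np.asarray(losses, float), np.asarray(probs, float)
    w = probs ** gamma / (probs ** gamma + (1 - probs) ** gamma) ** (1 / gamma)
    return float(np.sum(w * lam * losses ** beta))
```

The qualitative difference matters for modeling path choice: ER ignores tail risk, CVaR focuses exclusively on it, while CPT's weighting overweights small probabilities of large losses without discarding the rest of the distribution, which is one reason it can be more expressive.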
Planning under risk and uncertainty based on Prospect-theoretic models
Suresh, Aamodh, Martinez, Sonia
In this work, we develop a novel sampling-based motion planning approach to generate plans in a risky and uncertain environment. To model a variety of risk-sensitivity profiles, we propose an adaptation of Cumulative Prospect Theory (CPT) to the setting of path planning. This leads to the definition of a non-rational continuous cost envelope (as well as a continuous uncertainty envelope) associated with an obstacle environment. We use these metrics, along with standard costs such as path length, to formulate path planning problems. Building on RRT*, we then develop a sampling-based motion planner that generates desirable paths from the perspective of a given risk-sensitive profile. Since risk sensitivity can vary greatly, we provide a tuning knob to accommodate a diversity of decision makers (DMs), ranging from totally risk-averse to risk-indifferent. Additionally, we adapt a Simultaneous Perturbation Stochastic Approximation (SPSA)-based algorithm to learn the CPT parameters that best represent a given DM. Simulations are presented in a 2D environment to evaluate the modeling approach and the algorithm's performance.
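The idea of a CPT-style cost envelope around an uncertain obstacle, combined with path length in an RRT*-style objective, can be sketched as follows. The sampling scheme, the unit collision radius, the normalized loss magnitude, and all names are assumptions for illustration; the paper's continuous envelope construction is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

def cpt_cost(point, obstacle_mean, obstacle_cov, lam=2.25, gamma=0.69,
             radius=1.0, n_samples=500):
    """Illustrative CPT-style risk cost of a waypoint near an obstacle
    with uncertain position (a sketch, not the paper's exact envelope)."""
    # Sample obstacle positions from the uncertainty distribution and
    # estimate the probability of the waypoint lying inside the obstacle.
    samples = rng.multivariate_normal(obstacle_mean, obstacle_cov, n_samples)
    d = np.linalg.norm(samples - np.asarray(point, float), axis=1)
    p_hit = float(np.mean(d < radius))
    if p_hit in (0.0, 1.0):
        return lam * p_hit  # degenerate cases: no weighting needed
    # Prospect-theoretic probability weighting (Tversky-Kahneman form);
    # gamma is the tuning knob for the decision maker's risk profile.
    w = p_hit ** gamma / (p_hit ** gamma + (1 - p_hit) ** gamma) ** (1 / gamma)
    return lam * w  # collision loss magnitude normalized to 1 here

def path_cost(path, obstacle_mean, obstacle_cov, length_weight=1.0):
    # Combine path length with the summed waypoint risk, as an RRT*-style
    # cost would combine standard and CPT-based terms.
    path = np.asarray(path, float)
    length = float(np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1)))
    risk = sum(cpt_cost(p, obstacle_mean, obstacle_cov) for p in path)
    return length_weight * length + risk
```

Tuning lam and gamma moves the planner along the spectrum the abstract describes: large lam with strong low-probability overweighting yields risk-averse detours, while lam near zero recovers a shortest-path, risk-indifferent planner.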