Brown University
Learning Abstractions by Transferring Abstract Policies to Grounded State Spaces
Wong, Lawson L. S. (Brown University)
Learning from demonstration is an effective paradigm to teach specific tasks to robots. However, such demonstrations often have to be performed on the robot, which is both time-consuming and often still requires expert knowledge (e.g., kinesthetically controlling the joints). It is often easier to specify tasks at a high level of abstraction, and let the robot figure out the grounding to the robot/agent space. We consider how to learn such a mapping. In particular, we consider the task of learning to navigate on a mobile robot given only an abstraction of the path and potential landmarks. We cast this as a learning problem between abstract and robot (grounded) state spaces and illustrate how this works in several cases. Through these cases, we see that the "abstract navigation" task touches on many interesting issues related to abstraction, and suggest avenues for further investigation.
Planning Hierarchies and their Connections to Language
Gopalan, Nakul (Brown University)
Robots working with humans in real environments need to plan in a large state-action space given a natural language command. Such a problem poses multiple challenges with respect to the size of the state-action space to plan over, the different modalities that natural language can provide to specify the goal condition, and the difficulty of learning a model of such an environment to plan over. In this thesis we look at using hierarchical methods to learn and plan in these large state-action spaces. Further, we look at using natural language to guide the construction and learning of hierarchies and reward functions.
Bandit-Based Solar Panel Control
Abel, David (Brown University) | Williams, Edward C. (Brown University) | Brawner, Stephen (Brown University) | Reif, Emily (Brown University) | Littman, Michael L. (Brown University)
Solar panels sustainably harvest energy from the sun. To improve performance, panels are often equipped with a tracking mechanism that computes the sun's position in the sky throughout the day. Based on the tracker's estimate of the sun's location, a controller orients the panel to minimize the angle of incidence between solar radiant energy and the photovoltaic cells on the surface of the panel, increasing total energy harvested. Prior work has developed efficient tracking algorithms that accurately compute the sun's location to facilitate solar tracking and control. However, always pointing a panel directly at the sun does not account for diffuse irradiance in the sky, reflected irradiance from the ground and surrounding surfaces, power required to reorient the panel, shading effects from neighboring panels and foliage, or changing weather conditions (such as clouds), all of which are contributing factors to the total energy harvested by a fleet of solar panels. In this work, we show that a bandit-based approach can increase the total energy harvested by solar panels by learning to dynamically account for such other factors. Our contribution is threefold: (1) the development of a test bed based on typical solar and irradiance models for experimenting with solar panel control using a variety of learning methods, (2) simulated validation that bandit algorithms can effectively learn to control solar panels, and (3) the design and construction of an intelligent solar panel prototype that learns to angle itself using bandit algorithms.
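As a rough illustration of the bandit framing, the sketch below treats each candidate panel angle as an arm and the harvested energy as the reward. The abstract does not say which bandit algorithm the authors use; epsilon-greedy is swapped in here as a simple stand-in, and the angle set `ANGLES` and the reward model `toy_energy` are hypothetical, not taken from the paper's test bed.

```python
import random

def epsilon_greedy(reward_fn, n_arms, steps, epsilon=0.1, seed=0):
    """Run epsilon-greedy and return (estimated mean rewards, pull counts)."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: means[a])
        reward = reward_fn(arm, rng)
        counts[arm] += 1
        # Incremental update of the running mean for the pulled arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return means, counts

# Hypothetical discrete tilt angles in degrees (an assumption for illustration).
ANGLES = [0, 15, 30, 45, 60]

def toy_energy(arm, rng):
    """Hypothetical noisy energy reading that peaks at a 30 degree tilt."""
    return 1.0 - abs(ANGLES[arm] - 30) / 60.0 + rng.gauss(0.0, 0.05)

if __name__ == "__main__":
    means, counts = epsilon_greedy(toy_energy, len(ANGLES), steps=2000)
    best = max(range(len(ANGLES)), key=lambda a: means[a])
    print("preferred angle:", ANGLES[best])
```

In this toy setting the learner never sees the sun's position; it only observes the energy each angle produced, which is the property that lets a bandit approach absorb effects such as diffuse irradiance or shading.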
How People Explain Action (and Autonomous Intelligent Systems Should Too)
Graaf, Maartje M. A. de (Brown University) | Malle, Bertram F. (Brown University)
To make Autonomous Intelligent Systems (AIS), such as virtual agents and embodied robots, "explainable," we need to understand how people respond to such systems and what expectations they have of them. Our thesis is that people will regard most AIS as intentional agents and apply the conceptual framework and psychological mechanisms of human behavior explanation to them. We present a well-supported theory of how people explain human behavior and sketch what it would take to implement the underlying framework of explanation in AIS. The benefits will be considerable: When an AIS is able to explain its behavior in ways that people find comprehensible, people are more likely to form correct mental models of such a system and calibrate their trust in the system.
Ask Me Anything about MOOCs
Fisher, Doug (Vanderbilt University) | Isbell, Charles (Georgia Institute of Technology) | Littman, Michael L. (Brown University) | Wollowski, Michael (Rose-Hulman Institute of Technology) | Neller, Todd W. (Gettysburg College) | Boerkoel, Jim (Harvey Mudd College)
In this article, ten questions about MOOCs (crowdsourced from the recipients of the AAAI and SIGCSE mailing lists) were posed by editors Michael Wollowski, Todd Neller, and James Boerkoel to Douglas H. Fisher, Charles Isbell Jr., and Michael Littman, educators with unique, relevant experiences, to lend their perspective on those issues.
Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes
Killian, Taylor W. (Harvard University) | Konidaris, George (Brown University) | Doshi-Velez, Finale (Harvard University)
An intriguing application of transfer learning emerges when tasks arise with similar, but not identical, dynamics. Hidden Parameter Markov Decision Processes (HiP-MDP) embed these tasks into a low-dimensional space; given the embedding parameters one can identify the MDP for a particular task. However, the original formulation of HiP-MDP had a critical flaw: the embedding uncertainty was modeled independently of the agent's state uncertainty, requiring an arduous training procedure. In this work, we apply a Gaussian Process latent variable model to jointly model the dynamics and the embedding, leading to a more elegant formulation, one that allows for better uncertainty quantification and thus more robust transfer.
Towards Behavior-Aware Model Learning from Human-Generated Trajectories
Loftin, Robert Tyler (North Carolina State University) | MacGlashan, James (Brown University) | Peng, Bei (Washington State University) | Taylor, Matthew E. (Washington State University) | Littman, Michael L. (Brown University) | Roberts, David L. (North Carolina State University)
Inverse reinforcement learning algorithms recover an unknown reward function for a Markov decision process, based on observations of user behaviors that optimize this reward function. Here we consider the complementary problem of learning the unknown transition dynamics of an MDP based on such observations. We describe the behavior-aware modeling (BAM) algorithm, which learns models of transition dynamics from user-generated state-action trajectories. BAM makes assumptions about how users select their actions that are similar to those used in inverse reinforcement learning, and searches for a model that maximizes the probability of the observed actions. The BAM algorithm is based on policy gradient algorithms, essentially reversing the roles of the policy and transition distribution in those algorithms. As a result, BAM is highly flexible, and can be applied to continuous state spaces using a wide variety of model representations. In this preliminary work, we discuss why the model learning problem is interesting, describe algorithms to solve this problem, and discuss directions for future work.
Reconstructing Hidden Permutations Using the Average-Precision (AP) Correlation Statistic
Stefani, Lorenzo De (Brown University) | Epasto, Alessandro (Brown University) | Upfal, Eli (Brown University) | Vandin, Fabio (University of Padova)
We study the problem of learning probabilistic models for permutations, where the order between highly ranked items in the observed permutations is more reliable (i.e., consistent in different rankings) than the order between lower ranked items, a typical phenomenon observed in many applications such as web search results and product ranking. We introduce and study a variant of the Mallows model where the distribution is a function of the widely used Average-Precision (AP) Correlation statistic, instead of the standard Kendall's tau distance. We present a generative model for constructing samples from this distribution and prove useful properties of that distribution. Using these properties we develop an efficient algorithm that provably computes an asymptotically unbiased estimate of the center permutation, and a faster algorithm that learns with high probability the hidden central permutation for a wide range of the parameters of the model. We complement our theoretical analysis with extensive experiments showing that unsupervised methods based on our model can precisely identify ground-truth clusters of rankings in real-world data. In particular, when compared to the Kendall's tau-based methods, our methods are less affected by noise in low-rank items.
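To make the top-weighting property concrete, here is a minimal sketch of the AP correlation statistic, assuming its usual definition from the information retrieval literature: for each position i > 1 in the evaluated ranking, take the fraction of items placed above position i that the reference also places above that item, average over positions, and rescale to [-1, 1]. The function name `ap_correlation` is illustrative, and the exact form used inside the paper's Mallows variant may differ.

```python
def ap_correlation(ranking, reference):
    """AP correlation of `ranking` with respect to `reference`.

    Both arguments are lists of the same distinct items, best item first.
    Returns a value in [-1, 1]; 1 means identical order, -1 means reversed.
    """
    n = len(ranking)
    if n < 2 or sorted(ranking) != sorted(reference):
        raise ValueError("rankings must contain the same items, at least two")
    ref_pos = {item: i for i, item in enumerate(reference)}
    total = 0.0
    for i in range(1, n):
        item = ranking[i]
        # Count items ranked above `item` that the reference also ranks above it.
        correct = sum(1 for above in ranking[:i] if ref_pos[above] < ref_pos[item])
        total += correct / i
    return 2.0 * total / (n - 1) - 1.0

if __name__ == "__main__":
    ref = list("abcd")
    print(ap_correlation(list("bacd"), ref))  # swap at the top of the list
    print(ap_correlation(list("abdc"), ref))  # swap at the bottom of the list
```

A single adjacent swap at the top of the list lowers the score more than the same swap at the bottom, whereas Kendall's tau distance counts both as one discordant pair. This asymmetry is why an AP-based model is less sensitive to noise among low-ranked items.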
Optimizing Infrastructure Enhancements for Evacuation Planning
Kumar, Kunal (Ghent University) | Romanski, Julia (Brown University) | Hentenryck, Pascal Van (University of Michigan)
With rapid population growth and urbanization, emergency services in various cities around the world worry that the current transportation infrastructure is no longer adequate for large-scale evacuations. This paper considers how to mitigate this issue through infrastructure upgrades, such as the addition of lanes to road segments and the raising of bridges and roads. The paper proposes a MIP model for deciding the most effective infrastructure upgrades as well as a Benders decomposition approach where the master problem jointly plans the upgrades and evacuation routes and the subproblem schedules the evacuation itself. Experimental results demonstrate the practicability of the approach on a real case study, filling a significant need for emergency services.
Benders Decomposition for Large-Scale Prescriptive Evacuations
Romanski, Julia (Brown University) | Hentenryck, Pascal Van (University of Michigan)
This paper considers prescriptive evacuation planning for a region threatened by a natural disaster such as a flood, a wildfire, or a hurricane. It proposes a Benders decomposition that generalizes the two-stage approach proposed in earlier work for convergent evacuation plans. Experimental results show that Benders decomposition provides significant improvements in solution quality in reasonable time: It finds provably optimal solutions to scenarios considered in prior work, closing these instances, and increases the number of evacuees by 10 to 15% on average on more complex flood scenarios.