Khandelwal, Piyush
iCORPP: Interleaved Commonsense Reasoning and Probabilistic Planning on Robots
Zhang, Shiqi, Khandelwal, Piyush, Stone, Peter
Sequential decision-making in the real world is challenging for robots because it requires them to simultaneously reason about the current world state and dynamics while planning actions to accomplish complex tasks. On the one hand, declarative languages and reasoning algorithms are well suited to representing and reasoning with commonsense knowledge, but they are not good at planning actions toward maximizing cumulative reward over a long, unspecified horizon. On the other hand, probabilistic planning frameworks, such as Markov decision processes (MDPs) and partially observable MDPs (POMDPs), are well suited to planning to achieve long-term goals under uncertainty, but they are ill-equipped to represent or reason about knowledge that is not directly related to actions. In this article, we present a novel algorithm, called iCORPP, that simultaneously estimates the current world state, reasons about world dynamics, and constructs task-oriented controllers. In this process, robot decision-making problems are decomposed into two interdependent (smaller) subproblems: reasoning to "understand the world" and planning to "achieve the goal." Contextual knowledge is represented in the reasoning component, which makes the planning component epistemic and enables active information gathering. The algorithm has been implemented and evaluated both in simulation and on real robots using everyday service tasks, such as indoor navigation, dialog management, and object delivery. Results show significant improvements in scalability, efficiency, and adaptiveness compared to competitive baselines, including handcrafted action policies.
An Empirical Comparison of PDDL-based and ASP-based Task Planners
Jiang, Yuqian, Zhang, Shiqi, Khandelwal, Piyush, Stone, Peter
General-purpose planners enable AI systems to solve many different types of planning problems. However, many different planners exist, each with different strengths and weaknesses, and there are no general rules for which planner is best to apply to a given problem. In this paper, we empirically compare the performance of state-of-the-art planners that use either the Planning Domain Definition Language (PDDL) or Answer Set Programming (ASP) as the underlying action language. PDDL is designed for automated planning, and PDDL-based planners are widely used for a variety of planning problems. ASP is designed for knowledge-intensive reasoning but can also be used to solve planning problems. Given domain encodings that are as similar as possible, we find that PDDL-based planners perform better on problems with longer solutions, while ASP-based planners are better on tasks with a large number of objects or in which complex reasoning about action preconditions and effects is required. The resulting analysis can inform selection among general-purpose planning systems for a particular domain.
Designing Better Playlists with Monte Carlo Tree Search
Liebman, Elad (The University of Texas at Austin) | Khandelwal, Piyush (The University of Texas at Austin) | Saar-Tsechansky, Maytal (The University of Texas at Austin) | Stone, Peter (The University of Texas at Austin)
In recent years, there has been growing interest in the study of automated playlist generation: music recommender systems that focus on modeling preferences over song sequences rather than on individual songs in isolation. This paper addresses this problem by learning, on the fly, personalized models of both song and transition preferences, uniquely tailored to each user's musical tastes. Playlist recommender systems typically include two main components: i) a preference-learning component, and ii) a planning component for selecting the next song in the playlist sequence. While there has been much work on the former, very little work has been devoted to the latter. This paper bridges this gap by focusing on the planning aspect of playlist generation within the context of DJ-MC, our playlist recommendation application. This paper also introduces a new variant of playlist recommendation that incorporates the notions of diversity and novelty directly into the reward model. We empirically demonstrate that the proposed planning approach significantly improves performance compared to the DJ-MC baseline in two playlist recommendation settings, increasing the usability of the framework in real-world settings.
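The planning component described above can be illustrated with a minimal UCT-style Monte Carlo tree search that picks the next song to maximize cumulative playlist reward. This is only a sketch: the song set, reward model (base song scores plus a transition bonus), and horizon below are invented for illustration and are far simpler than DJ-MC's actual listener model.

```python
import math
import random

SONGS = ["a", "b", "c", "d"]   # hypothetical song catalog
HORIZON = 3                    # playlist length to plan for

def reward(prev, song):
    """Toy reward: a base score per song plus a bonus for changing songs."""
    base = {"a": 1.0, "b": 0.5, "c": 0.2, "d": 0.1}[song]
    bonus = 0.5 if prev is not None and prev != song else 0.0
    return base + bonus

def playlist_reward(plan):
    """Cumulative reward of a full playlist from the start."""
    total, prev = 0.0, None
    for song in plan:
        total += reward(prev, song)
        prev = song
    return total

class Node:
    def __init__(self, playlist):
        self.playlist = playlist   # songs chosen so far on this branch
        self.children = {}         # song -> child Node
        self.visits = 0
        self.total = 0.0

def rollout(playlist):
    """Complete the playlist with random songs and score the result."""
    plan = list(playlist)
    while len(plan) < HORIZON:
        plan.append(random.choice(SONGS))
    return playlist_reward(plan)

def uct_select(node):
    """Pick the child maximizing the UCB1 upper confidence bound."""
    return max(node.children.values(),
               key=lambda c: c.total / c.visits
               + math.sqrt(2 * math.log(node.visits) / c.visits))

def mcts(iterations=2000):
    root = Node([])
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: descend while the node is fully expanded.
        while len(node.playlist) < HORIZON and len(node.children) == len(SONGS):
            node = uct_select(node)
            path.append(node)
        # Expansion: add one unexplored child, if any remain.
        if len(node.playlist) < HORIZON:
            song = random.choice([s for s in SONGS if s not in node.children])
            child = Node(node.playlist + [song])
            node.children[song] = child
            node = child
            path.append(node)
        # Simulation and backpropagation.
        value = rollout(node.playlist)
        for n in path:
            n.visits += 1
            n.total += value
    # Recommend the most-visited opening song.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

random.seed(0)
print(mcts())
```

The most-visited child at the root is returned as the recommendation, a standard robust choice in MCTS; a diversity-aware variant would simply fold a novelty term into `reward`.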
Dynamically Constructed (PO)MDPs for Adaptive Robot Planning
Zhang, Shiqi (Cleveland State University) | Khandelwal, Piyush (The University of Texas at Austin) | Stone, Peter (The University of Texas at Austin)
To operate in human-robot coexisting environments, intelligent robots need to simultaneously reason with commonsense knowledge and plan under uncertainty. Markov decision processes (MDPs) and partially observable MDPs (POMDPs) are good at planning under uncertainty toward maximizing long-term rewards; P-LOG, a declarative programming language under answer set semantics, is strong in commonsense reasoning. In this paper, we present a novel algorithm called iCORPP that uses P-LOG to dynamically reason about and construct (PO)MDPs. iCORPP successfully shields exogenous domain attributes from (PO)MDPs, which limits computational complexity and enables (PO)MDPs to adapt to the value changes these attributes produce. We conduct a number of experimental trials using two example problems in simulation and demonstrate iCORPP on a real robot. Results show significant improvements compared to competitive baselines.
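The shielding idea above can be sketched in a few lines: exogenous attributes (here, whether a corridor door is likely open) are estimated by a reasoning component and only their current values enter the planning model, which is rebuilt when they change. All names are illustrative, and plain Python stands in for the P-LOG reasoner.

```python
def reasoner(context):
    """Stand-in for commonsense reasoning: estimate exogenous attributes.
    A P-LOG program might infer, e.g., that doors are usually closed
    outside office hours."""
    return {"door_open_prob": 0.9 if context["office_hours"] else 0.2}

def build_mdp(attrs):
    """Construct a small navigation MDP whose transition probabilities
    depend on the reasoner's output; the exogenous attribute itself is
    not part of the state space (it is 'shielded')."""
    p = attrs["door_open_prob"]
    states = ["start", "at_door", "goal"]
    transitions = {
        ("start", "go_to_door"): {"at_door": 1.0},
        ("at_door", "pass_door"): {"goal": p, "at_door": 1.0 - p},
        ("at_door", "detour"): {"goal": 1.0},
    }
    costs = {"go_to_door": 1.0, "pass_door": 1.0, "detour": 5.0}
    return states, transitions, costs

# Rebuild the planning model whenever the reasoner's estimates change.
attrs = reasoner({"office_hours": True})
states, transitions, costs = build_mdp(attrs)
print(transitions[("at_door", "pass_door")])
```

Because the door's state never appears in `states`, the MDP stays small; when the context changes (say, office hours end), only `build_mdp` is rerun rather than enlarging the state space to track the door.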
Leading the Way: An Efficient Multi-Robot Guidance System
Khandelwal, Piyush (The University of Texas at Austin) | Stone, Peter (The University of Texas at Austin)
Prior approaches to human guidance using robots inside a building have typically been limited to a single robot guide that navigates a human from start to goal. However, due to its limited mobility, a single robot is often unable to keep up with the human's natural speed. In contrast, this paper addresses the difference in mobility between robots and people by presenting an approach that uses multiple robots to guide a human. Our approach uses a compact topological graph representation to formulate the multi-robot guidance problem as a Markov decision process (MDP). Using a model of human motion in the presence of guiding robots, we define the transition function for this MDP. We solve the MDP using value iteration to obtain an optimal policy for placing robots, and we evaluate this policy.
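The solution method named above, value iteration, can be sketched on a toy problem. The corridor domain, reward, and noise model below are hypothetical stand-ins, not the paper's topological-graph MDP; the point is only the algorithm: repeatedly back up state values until they converge, then extract the greedy policy.

```python
def value_iteration(states, actions, transition, reward, gamma=0.95, eps=1e-6):
    """Compute an optimal value function and greedy policy for a finite MDP."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best expected one-step return.
            best = max(
                sum(p * (reward(s, a, s2) + gamma * V[s2])
                    for s2, p in transition(s, a).items())
                for a in actions(s)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    policy = {
        s: max(actions(s), key=lambda a: sum(
            p * (reward(s, a, s2) + gamma * V[s2])
            for s2, p in transition(s, a).items()))
        for s in states
    }
    return V, policy

# Toy 1-D corridor: states 0..3, goal at 3, moves fail 10% of the time.
states = [0, 1, 2, 3]

def actions(s):
    return ["stay"] if s == 3 else ["left", "right"]

def transition(s, a):
    if s == 3:
        return {3: 1.0}
    s2 = min(s + 1, 3) if a == "right" else max(s - 1, 0)
    if s2 == s:
        return {s: 1.0}
    return {s2: 0.9, s: 0.1}

def reward(s, a, s2):
    return 1.0 if s2 == 3 and s != 3 else 0.0

V, policy = value_iteration(states, actions, transition, reward)
print(policy)
```

In the paper's setting the states would be (robot placement, human position) configurations on the topological graph, but the backup loop is identical.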
Planning in Action Language BC while Learning Action Costs for Mobile Robots
Khandelwal, Piyush (The University of Texas at Austin) | Yang, Fangkai (The University of Texas at Austin) | Leonetti, Matteo (The University of Texas at Austin) | Lifschitz, Vladimir (The University of Texas at Austin) | Stone, Peter (The University of Texas at Austin)
The action language BC provides an elegant way of formalizing dynamic domains that involve indirect effects of actions and recursively defined fluents. In complex robot task planning domains, robots may need to plan with incomplete information and reason about indirect or recursive action effects. In this paper, we demonstrate how BC can be used for robot task planning to address these issues. Additionally, action costs are incorporated into planning to produce optimal plans, and these costs are estimated from experience, making planning adaptive. This paper presents the first application of BC on a real robot in a realistic domain, which involves human-robot interaction for knowledge acquisition, optimal plan generation to minimize navigation time, and learning for adaptive planning.
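The cost-learning idea above can be sketched independently of the BC planner: each action's cost estimate starts from a default and is updated from observed execution times, so later optimal plans reflect experience. The action name, timings, and update rule (an exponentially weighted running average) are illustrative assumptions, not the paper's exact estimator.

```python
class CostLearner:
    """Maintain per-action cost estimates learned from execution times."""

    def __init__(self, default_cost=10.0, alpha=0.5):
        self.costs = {}             # action name -> learned cost estimate
        self.default = default_cost # prior cost for unseen actions
        self.alpha = alpha          # weight given to each new observation

    def cost(self, action):
        return self.costs.get(action, self.default)

    def observe(self, action, seconds):
        # Exponentially weighted running average of observed durations.
        old = self.cost(action)
        self.costs[action] = (1 - self.alpha) * old + self.alpha * seconds

learner = CostLearner()
for t in (12.0, 8.0, 9.0):          # observed corridor traversal times
    learner.observe("cross_corridor", t)
print(round(learner.cost("cross_corridor"), 2))  # prints 9.25
```

The planner would then query `cost(action)` when scoring candidate plans, so a corridor that turns out to be slow to traverse is gradually avoided in favor of cheaper routes.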