Brafman, Ronen
PTDRL: Parameter Tuning using Deep Reinforcement Learning
Goldsztejn, Elias | Feiner, Tal | Brafman, Ronen
In their work, the context is a function of the lidar inputs. They use change-point-detection [22] to segment human-guided navigation trajectories into a prespecified number of contexts. The robot recognizes its current context. Abstractly, a navigation system is a function C: X × Θ → A that maps the state and parameter space to the action space. The state X is represented by the robot sensory inputs and information about the world, such as the cost-map and next way-point. The parameter space Θ is comprised of optimization parameters of the navigation system, robot constraints, etc. The action space A is a velocity vector (e.g., linear and angular velocity).
Figure 1: Original and reconstructed cost-maps of a physical experiment. The reconstruction captures the main details of the original cost-map, showing that the learnt latent space in the simulation can be used for the real world.
Figure 1: A 3D representation of the value function at different ...
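A minimal sketch of this C: X × Θ → A abstraction with a parameter-tuning policy layered on top; all class and function names below are illustrative assumptions, not code from the paper:

    # Illustrative sketch of the navigation-system abstraction C: X x Theta -> A.
    # Names and structures are assumptions, not PTDRL's implementation.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class State:                    # X: robot sensory inputs and world information
        cost_map: List[float]       # local cost-map (e.g., flattened 2D grid)
        next_waypoint: Tuple[float, float]

    @dataclass
    class Parameters:               # Theta: optimization parameters of the navigation system
        max_vel_x: float
        inflation_radius: float

    Action = Tuple[float, float]    # A: (linear velocity, angular velocity)

    def navigation_system(x: State, theta: Parameters) -> Action:
        """C: X x Theta -> A. Placeholder for the underlying planner/controller."""
        # A real implementation would run the local planner with the given parameters.
        return (min(0.5, theta.max_vel_x), 0.0)

    def tune_parameters(x: State) -> Parameters:
        """A parameter-tuning policy maps the observed state to a parameter setting."""
        # PTDRL learns such a policy with deep RL; here we just return a fixed default.
        return Parameters(max_vel_x=0.7, inflation_radius=0.3)

    # One control step: pick parameters for the current state, then act.
    x = State(cost_map=[0.0] * 100, next_waypoint=(1.0, 2.0))
    a = navigation_system(x, tune_parameters(x))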
Qualitative Planning under Partial Observability in Multi-Agent Domains
Brafman, Ronen (Ben-Gurion University) | Shani, Guy (Ben Gurion University) | Zilberstein, Shlomo (University of Massachusetts)
Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains with a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more "classical" in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDP solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms.
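A schematic, hypothetical sketch of what a QDec-POMDP problem instance might look like as a data structure (propositional states, per-agent actions with possibly non-deterministic effects, and sensing actions); the field names are illustrative assumptions, not the paper's formalism:

    # Hypothetical encoding of a QDec-POMDP problem instance. Field names are
    # illustrative only.
    from dataclasses import dataclass, field
    from typing import Dict, FrozenSet, List, Tuple

    Literal = Tuple[str, bool]          # (proposition, value)

    @dataclass
    class ActionSchema:
        name: str
        preconditions: List[Literal]
        effects: List[List[Literal]]    # several lists = non-deterministic outcomes
        observes: List[str] = field(default_factory=list)  # propositions sensed

    @dataclass
    class QDecPOMDP:
        propositions: FrozenSet[str]
        agents: List[str]
        actions: Dict[str, List[ActionSchema]]    # per-agent action sets
        initial_states: List[FrozenSet[Literal]]  # set of possible initial states
        goal: List[Literal]

    # Example: a sensing action that observes whether a door is open.
    sense_door = ActionSchema("sense-door", preconditions=[("at-door", True)],
                              effects=[[]], observes=["door-open"])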
A Multi-Path Compilation Approach to Contingent Planning
Brafman, Ronen (Ben Gurion University) | Shani, Guy (Ben Gurion University)
We describe a new sound and complete method for compiling contingent planning problems with sensing actions into classical planning. Our method encodes conditional plans within a linear, classical plan. This allows our planner, MPSR, to reason about multiple future outcomes of sensing actions, and makes it less susceptible to dead-ends. MPSR, however, generates very large classical planning problems. To overcome this, we use an incomplete variant of the method, based on state sampling, within an online replanner. On most current domains, MPSR finds plans faster, although its plans are often longer. But on a new challenging variant of Wumpus with dead-ends, it finds smaller plans, faster, and scales much better.
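A hedged sketch of the online replanning scheme described above, under the assumption that each iteration samples a subset of states consistent with the observations so far, compiles the resulting smaller problem to classical planning, and executes up to the next sensing action; all helper functions are placeholders, not MPSR's actual implementation:

    # Sketch of sampling-based online replanning. Helpers are placeholders.
    import random

    def sample_states(belief, k):
        """Sample up to k states consistent with the current belief."""
        return random.sample(belief, min(k, len(belief)))

    def compile_and_solve(states, goal):
        """Compile the sampled multi-path problem to classical planning and solve.
        Returns a plan as a list of actions (possibly including sensing actions)."""
        return []  # placeholder: call an off-the-shelf classical planner here

    def execute_until_sensing(plan, belief):
        """Execute actions until a sensing action returns an observation;
        return the updated belief and whether the goal was reached."""
        return belief, True  # placeholder

    def online_replan(initial_belief, goal, k=4):
        belief, done = initial_belief, False
        while not done:
            plan = compile_and_solve(sample_states(belief, k), goal)
            belief, done = execute_until_sensing(plan, belief)

    online_replan([frozenset({"at-start"})], goal={"treasure-found"})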
Planning for Operational Control Systems with Predictable Exogenous Events
Brafman, Ronen (Ben-Gurion University of the Negev) | Domshlak, Carmel (Technion - Israel Institute of Technology) | Engel, Yagil (IBM Research) | Feldman, Zohar (IBM Research)
Various operational control systems (OCS) are naturally modeled as Markov Decision Processes. OCS often enjoy access to predictions of future events that have substantial impact on their operations. For example, reliable forecasts of extreme weather conditions are widely available, and such events can affect typical request patterns for customer response management systems, the flight and service time of airplanes, or the supply and demand patterns for electricity. The space of exogenous events impacting OCS can be very large, prohibiting their modeling within the MDP; moreover, for many of these exogenous events there is no useful predictive, probabilistic model. Real-time predictions, however, possibly with a short lead-time, are often available. In this work we motivate a model which combines offline MDP infinite-horizon planning with real-time adjustments given specific predictions of future exogenous events, and suggest a framework in which such predictions are captured and trigger real-time planning problems. We propose a number of variants of existing MDP solution algorithms, adapted to this context, and evaluate them empirically.
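One way to picture the proposed combination, as a hedged sketch: an infinite-horizon value function is computed offline, and an incoming prediction triggers a finite-horizon replanning problem over the event's lead-time window with temporarily modified dynamics and rewards, bootstrapping from the offline value function. The matrices and names below are illustrative assumptions, not the paper's algorithms:

    # Offline value iteration plus prediction-triggered finite-horizon replanning.
    import numpy as np

    def value_iteration(P, R, gamma=0.95, iters=500):
        """P: (A, S, S) transition tensor, R: (A, S) rewards. Returns V of shape (S,)."""
        V = np.zeros(P.shape[1])
        for _ in range(iters):
            V = np.max(R + gamma * (P @ V), axis=0)
        return V

    def finite_horizon_replan(P_event, R_event, V_terminal, horizon, gamma=0.95):
        """Backward induction over the prediction's lead time, bootstrapping from
        the offline value function at the end of the window."""
        V = V_terminal.copy()
        policy = None
        for _ in range(horizon):
            Q = R_event + gamma * (P_event @ V)
            policy = np.argmax(Q, axis=0)
            V = np.max(Q, axis=0)
        return policy, V

    # Tiny example: 2 states, 2 actions; the predicted event makes action 1 less rewarding.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.5, 0.5]]])
    R = np.array([[1.0, 0.0], [0.5, 0.5]])
    V_offline = value_iteration(P, R)
    policy, _ = finite_horizon_replan(P, R - np.array([[0.0], [0.4]]), V_offline, horizon=5)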
Preference Handling - An Introductory Tutorial
Brafman, Ronen (Ben-Gurion University) | Domshlak, Carmel
We present a tutorial introduction to the area of preference handling - one of the core issues in the design of any system that automates or supports decision making. The main goal of this tutorial is to provide a framework, or perspective, within which current work on preference handling -representation, reasoning, and elicitation - can be understood. Our intention is not to provide a technical description of the diverse methods used, but rather, to provide a general perspective on the problem and its varied solutions and to highlight central ideas and techniques.
Preference Handling - An Introductory Tutorial
Brafman, Ronen (Ben-Gurion University) | Domshlak, Carmel
Early work in AI focused on the notion of a goal--an explicit target that must be achieved--and this paradigm is still dominant in AI problem solving. But as application domains become more complex and realistic, it is apparent that the dichotomic notion of a goal, while adequate for certain puzzles, is too crude in general. The problem is that in many contemporary application domains, for example, information retrieval from large databases or the web, or planning in complex domains, the user has little knowledge about the set of possible solutions or feasible items, and what she or he typically seeks is the best that's out there. But since the user does not know what is the best achievable plan or the best available document or product, he or she typically cannot characterize it or its properties specifically. As a result, the user will end up either asking for an unachievable goal, getting no solution in response, or asking for too little, obtaining a solution that can be substantially improved. Of course, the user can gradually adjust the stated goals. This, however, is not a very appealing mode of interaction because the space of alternative solutions in such applications can be combinatorially huge, or even infinite. Moreover, such incremental goal refinement is simply infeasible when the goal must be supplied offline, as in the case of autonomous agents (whether on the web or on Mars).