Europe
Mean field for Markov Decision Processes: from Discrete to Continuous Optimization
Gast, Nicolas, Gaujal, Bruno, Boudec, Jean-Yves Le
We study the convergence of Markov Decision Processes made of a large number of objects to optimization problems on ordinary differential equations (ODE). We show that the optimal reward of such a Markov Decision Process, satisfying a Bellman equation, converges to the solution of a continuous Hamilton-Jacobi-Bellman (HJB) equation based on the mean field approximation of the Markov Decision Process. We give bounds on the difference of the rewards, and a constructive algorithm for deriving an approximating solution to the Markov Decision Process from a solution of the HJB equations. We illustrate the method on three examples pertaining respectively to investment strategies, population dynamics control and scheduling in queues are developed. They are used to illustrate and justify the construction of the controlled ODE and to show the gain obtained by solving a continuous HJB equation rather than a large discrete Bellman equation.
Studying Properties of Czech Complex Sentences from an Annotated Corpus
Kubon, Vladislav (Charles University in Prague) | Lopatkova, Marketa (Charles University in Prague)
The paper deals with the problem of an analysis of complex sentences in Czech on the basis of manually annotated data. The availability of a specialized corpus explicitly describing mutual relationships between segments and clauses in Czech complex sentences, together with the availability of a thoroughly syntactically annotated corpus, the Prague Dependency Treebank, provide a solid background for linguistic investigation. The paper presents quantitative, linguistic and structural observations which provide a number of clues for building an algorithm for analyzing a structure of complex sentences in the future.
Visual Programming of Plan Dynamics Using Constraints and Landmarks
Porteous, Julie (Teesside University) | Teutenberg, Jonathan (Teesside University) | Pizzi, David (Teesside University) | Cavazza, Marc (Teesside University)
In recent years, there has been considerable interest in the use of planning techniques in the area of new media. Many traditional planning notions no longer apply in the context of these applications. In particular, it can be difficult to answer the important question of what constitutes a good plan for the domain, but there is an emerging consensus that plan dynamics play an important role. As a consequence, it is important to support representation of such aspects. Our solution is to introduce a meta-level of representation that is an abstraction of the domain with respect to both time and causality, and to develop a visual representation of this in the form of a narrative arc. This visual representation can then be used in a visual programming approach to the exploration and specification of plan dynamics. In the paper we outline this approach to meta-level representation using constraints along with the visual programming interface we have developed. We illustrate the approach with examples of visual programming in the development of an interactive entertainment system based on Shakespeare's play ``The Merchant of Venice''
Planning Problems for Social Robots
Tipaldi, Gian Diego (University of Freiburg) | Arras, Kai Oliver (University of Freiburg)
As robots enter environments that they share with people, human-aware planning and interaction become key tasks to be addressed. For doing so, robots need to reason about the places and times when and where humans are engaged into which activity and plan their actions accordingly. In this paper, we first address this issue by learning a nonhomogenous spatial Poisson process whose rate function encodes the occurrence probability of human activities in space and time. We then present two planning problems for human robot interaction in social environments. The first one is the maximum encounter probability planning problem, where a robot aims to find the path along which the probability of encountering a person is maximized. We are interested in two versions of this problem, with deadlines or with a certainty quota. The second one is the minimum interference coverage problem, where a robot performs a coverage task in a socially compatible way by reducing the hindrance or annoyance caused to people. An example is a noisy vacuum robot that has to cover the whole apartment having learned that at lunch time the kitchen is a bad place to clean. Formally, the problems are time dependent variants of known planning problems: MDPs and price collecting TSP for the first problem and the asymmetric TSP for the second. The challenge is that the cost functions of the arcs and nodes vary with time, and that execution time is more important that optimality, given the real-time constraints in robotic systems. We present experimental results using variants of known planners and formulate the problems as benchmarks to the community.
A Polynomial All Outcome Determinization for Probabilistic Planning
Keller, Thomas (University of Freiburg) | Eyerich, Patrick (University of Freiburg)
Most predominant approaches in probabilistic planning utilize techniques from the more thoroughly investigated field of classical planning by determinizing the problem at hand. In this paper, we present a method to map probabilistic operators to an equivalent set of probabilistic operators in a novel normal form, requiring polynomial time and space. From this, we directly derive a determinization which can be used for, e.g., replanning strategies incorporating a classical planning system. Unlike previously described all outcome determinizations, the number of deterministic operators is not exponentially but polynomially bounded in the number of parallel probabilistic effects, enabling the use of more sophisticated determinization-based techniques in the future.
An Effective Approach to Realizing Planning Programs
Gerevini, Alfonso (University of Brescia) | Patrizi, Fabio (Imperial College) | Saetti, Alessandro (University of Brescia)
Planning programs are loose, high-level, declarative representations of the behavior of agents acting in a domain and following a path of goals to achieve. Such programs are specified through transition systems that can include cycles and decisions to make at certain points. We investigate a new effective approach for solving the problem of realizing a planning program, i.e., informally, for finding and combining a collection of plans that guarantee the planning program executability. We focus on deterministic domains and propose a general algorithm that solves the problem exploiting a planning technique handling goal constraints and preferences. A preliminary experimental analysis indicates that our approach dramatically outperforms the existing method based on formal verification and synthesis techniques.
A Complete Algorithm for Generating Landmarks
Bonet, Blai (Universidad Simon Bolivar) | Castillo, Julio (Universidad Simon Bolivar)
A collection of landmarks is complete if the cost of a minimum-cost hitting set equals h + and there is a minimum-cost hitting set that is an optimal relaxed plan. We present an algorithm for generating a complete collection of landmarks and we show that this algorithm can be extended into effective polytime heuristics for optimal and satisficing planning. The new admissible heuristics are compared with current state-of-the-art heuristics for optimal planning on benchmark problems from the IPC.
Dynamic State-Space Partitioning in External-Memory Graph Search
Zhou, Rong (Palo Alto Research Center) | Hansen, Eric A. (Mississippi State University)
The scalability of optimal sequential planning can be improved by using external-memory graph search. State-of-the-art external-memory graph search algorithms rely on a state-space projection function, or hash function, that partitions the stored nodes of the state-space search graph into groups of nodes that are stored as separate files on disk. Search performance depends on properties of the partition; whether the number of unique nodes in a file always fits in RAM, the number of files into which the nodes of the state-space graph are partitioned, and how well the partition captures local structure in the graph. Previous work relies on a static partition of the state space, but it can be difficult for a static partition to simultaneously satisfy all of these criteria. We introduce a method for dynamic partitioning and show that it leads to improved search performance in solving STRIPS planning problems.
Markov Decision Processes with Ordinal Rewards: Reference Point-Based Preferences
In a standard Markov decision process (MDP), rewards are assumed to be precisely known and of quantitative nature. This can be a too strong hypothesis in some situations. When rewards can really be modeled numerically, specifying the reward function is often difficult as it is a cognitively-demanding and/or time-consuming task. Besides, rewards can sometimes be of qualitative nature as when they represent qualitative risk levels for instance. In those cases, it is problematic to use directly standard MDPs and we propose instead to resort to MDPs with ordinal rewards. Only a total order over rewards is assumed to be known. In this setting, we explain how an alternative way to define expressive and interpretable preferences using reference points can be exploited.
Planning to Perceive: Exploiting Mobility for Robust Object Detection
Velez, Javier (Massachusetts Institute of Technology) | Hemann, Garrett (Massachusetts Institute of Technology) | Huang, Albert S. (Massachusetts Institute of Technology) | Posner, Ingmar (Department of Engineering Science, University of Oxford) | Roy, Nicholas (Massachusetts Institute of Technology)
Consider the task of a mobile robot autonomously navigating through an environment while detecting and mapping objects of interest using a noisy object detector. The robot must reach its destination in a timely manner, but is rewarded for correctly detecting recognizable objects to be added to the map, and penalized for false alarms. However, detector performance typically varies with vantage point, so the robot benefits from planning trajectories which maximize the efficacy of the recognition system. This work describes an online, any-time planning framework enabling the active exploration of possible detections provided by an off-the-shelf object detector. We present a probabilistic approach where vantage points are identified which provide a more informative view of a potential object. The agent then weighs the benefit of increasing its confidence against the cost of taking a detour to reach each identified vantage point. The system is demonstrated to significantly improve detection and trajectory length in both simulated and real robot experiments.