Goto

Collaborating Authors

 Planning & Scheduling


Policy-Gradient Methods for Planning

Neural Information Processing Systems

Probabilistic temporal planning attempts to find good policies for acting in domains with concurrent durative tasks, multiple uncertain outcomes, and limited resources. These domains are typically modelled as Markov decision problems and solved using dynamic programming methods. This paper demonstrates the application of reinforcement learning -- in the form of a policy-gradient method -- to these domains. Our emphasis is large domains that are infeasible for dynamic programming. Our approach is to construct simple policies, or agents, for each planning task. The result is a general probabilistic temporal planner, named the Factored Policy-Gradient Planner (FPG-Planner), which can handle hundreds of tasks, optimising for probability of success, duration, and resource use.


Off-Road Obstacle Avoidance through End-to-End Learning

Neural Information Processing Systems

We describe a vision-based obstacle avoidance system for off-road mobile robots. The system is trained from end to end to map raw input images to steering angles. It is trained in supervised mode to predict the steering angles provided by a human driver during training runs collected in a wide variety of terrains, weather conditions, lighting conditions, and obstacle types. The robot is a 50cm off-road truck, with two forwardpointing wireless color cameras. A remote computer processes the video and controls the robot via radio. The learning system is a large 6-layer convolutional network whose input is a single left/right pair of unprocessed low-resolution images. The robot exhibits an excellent ability to detect obstacles and navigate around them in real time at speeds of 2 m/s.


Policy-Gradient Methods for Planning

Neural Information Processing Systems

Probabilistic temporal planning attempts to find good policies for acting in domains with concurrent durative tasks, multiple uncertain outcomes, and limited resources. These domains are typically modelled as Markov decision problems and solved using dynamic programming methods. This paper demonstrates the application of reinforcement learning -- in the form of a policy-gradient method -- to these domains. Our emphasis is large domains that are infeasible for dynamic programming. Our approach is to construct simple policies, or agents, for each planning task. The result is a general probabilistic temporal planner, named the Factored Policy-Gradient Planner (FPG-Planner), which can handle hundreds of tasks, optimising for probability of success, duration, and resource use.


Off-Road Obstacle Avoidance through End-to-End Learning

Neural Information Processing Systems

We describe a vision-based obstacle avoidance system for off-road mobile robots.The system is trained from end to end to map raw input images to steering angles. It is trained in supervised mode to predict the steering angles provided by a human driver during training runs collected in a wide variety of terrains, weather conditions, lighting conditions, and obstacle types. The robot is a 50cm off-road truck, with two forwardpointing wirelesscolor cameras. A remote computer processes the video and controls the robot via radio. The learning system is a large 6-layer convolutional network whose input is a single left/right pair of unprocessed low-resolutionimages. The robot exhibits an excellent ability to detect obstacles and navigate around them in real time at speeds of 2 m/s.


Policy-Gradient Methods for Planning

Neural Information Processing Systems

Probabilistic temporal planning attempts to find good policies for acting in domains with concurrent durative tasks, multiple uncertain outcomes, and limited resources. These domains are typically modelled as Markov decision problems and solved using dynamic programming methods. This paper demonstrates the application of reinforcement learning -- in the form of a policy-gradient method -- to these domains. Our emphasis is large domains that are infeasible for dynamic programming. Our approach isto construct simple policies, or agents, for each planning task. The result is a general probabilistic temporal planner, named the Factored Policy-Gradient Planner (FPG-Planner), which can handle hundreds of tasks, optimising for probability of success, duration, and resource use.


Understanding Algorithm Performance on an Oversubscribed Scheduling Application

Journal of Artificial Intelligence Research

The best performing algorithms for a particular oversubscribed scheduling application, Air Force Satellite Control Network (AFSCN) scheduling, appear to have little in common. Yet, through careful experimentation and modeling of performance in real problem instances, we can relate characteristics of the best algorithms to characteristics of the application. In particular, we find that plateaus dominate the search spaces (thus favoring algorithms that make larger changes to solutions) and that some randomization in exploration is critical to good performance (due to the lack of gradient information on the plateaus). Based on our explanations of algorithm performance, we develop a new algorithm that combines characteristics of the best performers; the new algorithm's performance is better than the previous best. We show how hypothesis driven experimentation and search modeling can both explain algorithm performance and motivate the design of a new algorithm.


Modelling Mixed Discrete-Continuous Domains for Planning

Journal of Artificial Intelligence Research

In this paper we present pddl+, a planning domain description language for modelling mixed discrete-continuous planning domains. We describe the syntax and modelling style of pddl+, showing that the language makes convenient the modelling of complex time-dependent effects. We provide a formal semantics for pddl+ by mapping planning instances into constructs of hybrid automata. Using the syntax of HAs as our semantic model we construct a semantic mapping to labelled transition systems to complete the formal interpretation of pddl+ planning instances. An advantage of building a mapping from pddl+ to HA theory is that it forms a bridge between the Planning and Real Time Systems research communities. One consequence is that we can expect to make use of some of the theoretical properties of HAs. For example, for a restricted class of HAs the Reachability problem (which is equivalent to Plan Existence) is decidable. pddl+ provides an alternative to the continuous durative action model of pddl2.1, adding a more flexible and robust model of time-dependent behaviour.


Engineering Benchmarks for Planning: the Domains Used in the Deterministic Part of IPC-4

Journal of Artificial Intelligence Research

In a field of research about general reasoning mechanisms, it is essential to have appropriate benchmarks. Ideally, the benchmarks should reflect possible applications of the developed technology. In AI Planning, researchers more and more tend to draw their testing examples from the benchmark collections used in the International Planning Competition (IPC). In the organization of (the deterministic part of) the fourth IPC, IPC-4, the authors therefore invested significant effort to create a useful set of benchmarks. They come from five different (potential) real-world applications of planning: airport ground traffic control, oil derivative transportation in pipeline networks, model-checking safety properties, power supply restoration, and UMTS call setup. Adapting and preparing such an application for use as a benchmark in the IPC involves, at the time, inevitable (often drastic) simplifications, as well as careful choice between, and engineering of, domain encodings. For the first time in the IPC, we used compilations to formulate complex domain features in simple languages such as STRIPS, rather than just dropping the more interesting problem constraints in the simpler language subsets. The article explains and discusses the five application domains and their adaptation to form the PDDL test suites used in IPC-4. We summarize known theoretical results on structural properties of the domains, regarding their computational complexity and provable properties of their topology under the h+ function (an idealized version of the relaxed plan heuristic). We present new (empirical) results illuminating properties such as the quality of the most wide-spread heuristic functions (planning graph, serial planning graph, and relaxed plan), the growth of propositional representations over instance size, and the number of actions available to achieve each fact; we discuss these data in conjunction with the best results achieved by the different kinds of planners participating in IPC-4.


Temporal Planning using Subgoal Partitioning and Resolution in SGPlan

Journal of Artificial Intelligence Research

In this paper, we present the partitioning of mutual-exclusion (mutex) constraints in temporal planning problems and its implementation in the SGPlan4 planner. Based on the strong locality of mutex constraints observed in many benchmarks of the Fourth International Planning Competition (IPC4), we propose to partition the constraints of a planning problem into groups based on their subgoals. Constraint partitioning leads to significantly easier subproblems that are similar to the original problem and that can be efficiently solved by the same planner with some modifications to its objective function. We present a partition-and-resolve strategy that looks for locally optimal subplans in constraint-partitioned temporal planning subproblems and that resolves those inconsistent global constraints across the subproblems. We also discuss some implementation details of SGPlan4, which include the resolution of violated global constraints, techniques for handling producible resources, landmark analysis, path finding and optimization, search-space reduction, and modifications of Metric-FF when used as a basic planner in SGPlan4. Last, we show results on the sensitivity of each of these techniques in quality-time trade-offs and experimentally demonstrate that SGPlan4 is effective for solving the IPC3 and IPC4 benchmarks.


How the Landscape of Random Job Shop Scheduling Instances Depends on the Ratio of Jobs to Machines

Journal of Artificial Intelligence Research

We characterize the search landscape of random instances of the job shop scheduling problem (JSP). Specifically, we investigate how the expected values of (1) backbone size, (2) distance between near-optimal schedules, and (3) makespan of random schedules vary as a function of the job to machine ratio (N/M). For the limiting cases N/M approaches 0 and N/M approaches infinity we provide analytical results, while for intermediate values of N/M we perform experiments. We prove that as N/M approaches 0, backbone size approaches 100%, while as N/M approaches infinity the backbone vanishes. In the process we show that as N/M approaches 0 (resp. N/M approaches infinity), simple priority rules almost surely generate an optimal schedule, providing theoretical evidence of an "easy-hard-easy" pattern of typical-case instance difficulty in job shop scheduling. We also draw connections between our theoretical results and the "big valley" picture of JSP landscapes.