The paper investigates stochastic resource allocation problems with scarce, reusable resources and non-preemptive, time-dependent, interconnected tasks. This approach is a natural generalization of several standard resource management problems, such as scheduling and transportation problems. First, reactive solutions are considered and defined as control policies of suitably reformulated Markov decision processes (MDPs). We argue that this reformulation has several favorable properties: it has finite state and action spaces, it is aperiodic, hence all policies are proper, and the space of control policies can be safely restricted. Next, approximate dynamic programming (ADP) methods, such as fitted Q-learning, are suggested for computing an efficient control policy. In order to compactly maintain the cost-to-go function, two representations are studied: hash tables and support vector regression (SVR), in particular nu-SVRs. Several additional improvements, such as the application of limited-lookahead rollout algorithms in the initial phases, action space decomposition, task clustering, and distributed sampling, are investigated, too. Finally, experimental results on both benchmark and industry-related data are presented.
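The idea of keeping the cost-to-go function in a hash table can be illustrated with a minimal sketch. It uses plain tabular Q-learning rather than the fitted variant the abstract names, and the two-task instance, the `DURATIONS` table, the total-completion-time cost, and the function name `q_learning` are all assumptions made for illustration, not taken from the paper.

```python
import random
from collections import defaultdict

# Hypothetical toy instance: two tasks competing for one reusable resource.
# Each duration is drawn uniformly from the listed values.
DURATIONS = {"A": (1, 2), "B": (3, 5)}


def q_learning(episodes=2000, epsilon=0.2, seed=0):
    """Tabular Q-learning; the table is a dict keyed by (state, action)."""
    rng = random.Random(seed)
    q = defaultdict(float)    # hash table: (state, action) -> cost-to-go estimate
    visits = defaultdict(int)
    for _ in range(episodes):
        remaining, clock = frozenset(DURATIONS), 0
        while remaining:
            state = (remaining, clock)
            actions = sorted(remaining)
            # Epsilon-greedy choice of the next task to start.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = min(actions, key=lambda a: q[(state, a)])
            # Non-preemptive execution with a stochastic duration.
            clock += rng.choice(DURATIONS[action])
            remaining = remaining - {action}
            cost = clock  # assumed cost: completion time of the finished task
            future = min(
                (q[((remaining, clock), a)] for a in remaining), default=0.0
            )
            # Averaging step size 1/n for this (state, action) pair.
            visits[(state, action)] += 1
            step = 1.0 / visits[(state, action)]
            q[(state, action)] += step * (cost + future - q[(state, action)])
    return q


q = q_learning()
start = (frozenset(DURATIONS), 0)
best_first_task = min(DURATIONS, key=lambda a: q[(start, a)])
```

In this toy instance the expected total completion time is 7 when task A runs first and 9.5 when task B runs first, so the learned estimates at the start state should favor A. Every run terminates after two decisions, which mirrors the abstract's point that all policies are proper.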
We consider closed-loop solutions to stochastic optimization problems of resource allocation type. They concern the dynamic allocation of reusable resources over time to non-preemptive, interconnected tasks with stochastic durations. The aim is to minimize the expected value of a regular performance measure. First, we formulate the problem as a stochastic shortest path problem and argue that our formulation has favorable properties, e.g., it has a finite horizon, it is acyclic, thus all policies are proper, and, moreover, the space of control policies can be safely restricted. Then, we propose an iterative solution. Essentially, we apply a reinforcement-learning-based adaptive sampler to compute a suboptimal control policy. We suggest several approaches to enhance this solution and make it applicable to large-scale problems. The main improvements are: (1) the value function is maintained by feature-based support vector regression; (2) the initial exploration is guided by rollout algorithms; (3) the state space is partitioned by clustering the tasks while keeping the precedence constraints satisfied; (4) the action space is decomposed and, consequently, the number of available actions in a state is decreased; and, finally, (5) we argue that the sampling can be effectively distributed among several processors. The effectiveness of the approach is demonstrated by experimental results on both artificial (benchmark) and real-world (industry-related) data.
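For orientation, the optimality condition that a stochastic shortest path formulation of this kind rests on can be written as below. This is the textbook Bellman equation, not a formula quoted from the paper, and the symbols are assumed notation: $x$ is a state, $\mathcal{A}(x)$ the set of allocation actions available in it, $p(y \mid x, a)$ the transition probability, $g(x, a, y)$ the immediate cost, and $J^{*}$ the optimal cost-to-go.

```latex
J^{*}(x) = \min_{a \in \mathcal{A}(x)} \sum_{y} p(y \mid x, a)\,\bigl[ g(x, a, y) + J^{*}(y) \bigr],
\qquad J^{*}(x_{\mathrm{terminal}}) = 0 .
```

Because the formulation is acyclic with a finite horizon, every policy reaches the terminal state, so the expected total cost of any policy is well defined.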
Resource reasoning has been at the heart of many of the successful AI-based scheduling systems, yet no attempt has been made to integrate the best techniques from scheduling with the best techniques from activity-based planning. The reason for wishing to reason about resources in an activity-based planner is clear. One of the prime motivations for not considering a particular course of action is that you have insufficient resources with which to carry it out. These resources can vary from people, to money, to space in a car park. Resource reasoning provides a powerful way of pruning the search space and guiding the planner towards a successful plan. Scheduling problems have tended to be dominated by complex resource contentions and relatively simple process plans, whereas activity plans have tended to have complex process options with simple resource uses.
A successful plan ensures that all facts that do not need resources are correctly achieved by that plan. A straightforward method for resource allocation is to assign a new or freed resource to any step that is involved in a resource conflict. Suppose that this method needs R resources. Then, for all problems with N ≥ R resources, the same scheduling technique can be applied. As the number of resources decreases, the scheduler has to serialize the plan in line with the resource limitation.
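The straightforward allocation method can be sketched as follows, under simplifying assumptions that are not in the source: every step has a fixed start and end time, needs exactly one unit of a single resource type, and a resource released at time t may be reused by a step starting at t. The function name `allocate` and the step names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def allocate(steps: Dict[str, Tuple[float, float]]) -> Tuple[Dict[str, int], int]:
    """Give each step a freed resource if one exists, otherwise a new one.

    `steps` maps a step name to its (start, end) interval.
    Returns the step-to-resource assignment and the number R of resources used.
    """
    assignment: Dict[str, int] = {}
    busy: List[Tuple[float, int]] = []  # min-heap of (release time, resource id)
    next_id = 0
    # Process steps in order of start time.
    for name, (start, end) in sorted(steps.items(), key=lambda kv: kv[1][0]):
        if busy and busy[0][0] <= start:
            # The earliest-released resource is free again: reuse it.
            _, resource = heapq.heappop(busy)
        else:
            # Every resource is still in use: open a new one.
            resource = next_id
            next_id += 1
        assignment[name] = resource
        heapq.heappush(busy, (end, resource))
    return assignment, next_id


plan = {"s1": (0, 3), "s2": (1, 4), "s3": (3, 5), "s4": (4, 6)}
assignment, r = allocate(plan)
# r == 2: s1 and s3 share resource 0, s2 and s4 share resource 1
```

With fewer than `r` resources available, at least two overlapping steps in this example would have to be serialized, which is the situation the last sentence of the paragraph describes.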
This paper contributes to effectively solving stochastic resource allocation problems, which are known to be NP-complete. To address this complex resource management problem, previous work on pruning the action space of real-time heuristic search is extended. The pruning is accomplished by using upper and lower bounds on the value function. If the upper bound of an action in a state is lower than the lower bound on the value of this state, the action may be pruned from the set of possibly optimal actions for the state. This paper extends this previous work by proposing tight bounds for problems where tasks have to be accomplished using limited resources. The marginal revenue bound proposed in this paper compares favorably with another approach that proposes bounds for pruning the action space.
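The pruning rule itself is simple to state in code. The sketch below assumes a value-maximization setting, takes the per-action bounds as given inputs, and assumes that the lower bound on the state's value is the largest lower bound among its actions. It does not implement the marginal revenue bound, and the name `prune_actions` is hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Bounds = Tuple[float, float]  # (lower bound, upper bound) on an action's value


def prune_actions(action_bounds: Dict[Hashable, Bounds]) -> List[Hashable]:
    """Return the actions that may still be optimal in a state."""
    # Assumed lower bound on the state value: best lower bound over its actions.
    state_lower = max(lower for lower, _ in action_bounds.values())
    # An action whose upper bound falls below that value cannot be optimal.
    return [
        action
        for action, (_, upper) in action_bounds.items()
        if upper >= state_lower
    ]


bounds = {"a": (5.0, 9.0), "b": (1.0, 4.0), "c": (4.5, 6.0)}
candidates = prune_actions(bounds)
# candidates == ["a", "c"]: the upper bound of "b" (4.0) is below 5.0
```

Tighter bounds shrink the interval of each action, so more actions fail the test and are removed, which is why the quality of the bounds determines how much of the action space can be pruned.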