balanced policy
Arrival Control in Quasi-Reversible Queueing Systems: Optimization and Reinforcement Learning
In this paper, we introduce a versatile scheme for optimizing the arrival rates of quasi-reversible queueing systems. We first propose an alternative definition of quasi-reversibility that encompasses reversibility and highlights the importance of the definition of customer classes. In a second time, we introduce balanced arrival control policies, which generalize the notion of balanced arrival rates introduced in the context of Whittle networks, to the much broader class of quasi-reversible queueing systems. We prove that supplementing a quasi-reversible queueing system with a balanced arrival-control policy preserves the quasi-reversibility, and we specify the form of the stationary measures. We revisit two canonical examples of quasi-reversible queueing systems, Whittle networks and order-independent queues. Lastly, we focus on the problem of admission control and leverage our results in the frameworks of optimization and reinforcement learning.
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.76)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
Lower Bounds for Policy Iteration on Multi-action MDPs
Ashutosh, Kumar, Consul, Sarthak, Dedhia, Bhishma, Khirwadkar, Parthasarathi, Shah, Sahil, Kalyanakrishnan, Shivaram
Policy Iteration (PI) is a classical family of algorithms to compute an optimal policy for any given Markov Decision Problem (MDP). The basic idea in PI is to begin with some initial policy and to repeatedly update the policy to one from an improving set, until an optimal policy is reached. Different variants of PI result from the (switching) rule used for improvement. An important theoretical question is how many iterations a specified PI variant will take to terminate as a function of the number of states $n$ and the number of actions $k$ in the input MDP. While there has been considerable progress towards upper-bounding this number, there are fewer results on lower bounds. In particular, existing lower bounds primarily focus on the special case of $k = 2$ actions. We devise lower bounds for $k \geq 3$. Our main result is that a particular variant of PI can take $\Omega(k^{n/2})$ iterations to terminate. We also generalise existing constructions on $2$-action MDPs to scale lower bounds by a factor of $k$ for some common deterministic variants of PI, and by $\log(k)$ for corresponding randomised variants.
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > India > Maharashtra > Mumbai (0.04)
Welfare of Sequential Allocation Mechanisms for Indivisible Goods
Aziz, Haris, Kalinowski, Thomas, Walsh, Toby, Xia, Lirong
Sequential allocation is a simple and attractive mechanism for the allocation of indivisible goods. Agents take turns, according to a policy, to pick items. Sequential allocation is guaranteed to return an allocation which is efficient but may not have an optimal social welfare. We consider therefore the relation between welfare and efficiency. We study the (computational) questions of what welfare is possible or necessary depending on the choice of policy. We also consider a novel control problem in which the chair chooses a policy to improve social welfare.
- Oceania > Australia (0.05)
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Possible and Necessary Allocations via Sequential Mechanisms
Aziz, Haris (NICTA and University of New South Wales) | Walsh, Toby (NICTA and University of New South Wales) | Xia, Lirong (Rensselaer Polytechnic Institute)
A simple mechanism for allocating indivisible resources is sequential allocation in which agents take turns to pick items. We focus on possible and necessary allocation problems, checking whether allocations of a given form occur in some or all mechanisms for several commonly used classes of sequential allocation mechanisms. In particular, we consider whether a given agent receives a given item, a set of items, or a subset of items for natural classes of sequential allocation mechanisms: balanced, recursively balanced, balanced alternation, and strict alternation. We present characterizations of the allocations that result respectively from the classes, which extend the well-known characterization by Brams and King [2005] for policies without restrictions. In addition, we examine the computational complexity of possible and necessary allocation problems for these classes.
- Oceania > Australia (0.04)
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)