ID policy (with reassignment) is asymptotically optimal for heterogeneous weakly-coupled MDPs
Zhang, Xiangcheng, Hong, Yige, Wang, Weina
–arXiv.org Artificial Intelligence
Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied. In this paper, we study the fully heterogeneous setting of a prominent class of such problems, known as weakly-coupled Markov decision processes (WCMDPs). Each WCMDP consists of $N$ arms (or subproblems), which have distinct model parameters in the fully heterogeneous setting, leading to the curse of dimensionality when $N$ is large. We show that, under mild assumptions, a natural adaptation of the ID policy, although originally proposed for a homogeneous special case of WCMDPs, in fact achieves an $O(1/\sqrt{N})$ optimality gap in long-run average reward per arm for fully heterogeneous WCMDPs as $N$ becomes large. This is the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs. Our techniques highlight the construction of a novel projection-based Lyapunov function, which witnesses the convergence of rewards and costs to an optimal region in the presence of heterogeneity.
arXiv.org Artificial Intelligence
Feb-9-2025
- Country:
- North America > United States
- Massachusetts (0.04)
- Pennsylvania > Allegheny County
- Pittsburgh (0.04)
- New York > New York County
- New York City (0.04)
- Europe
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Netherlands > North Holland
- Amsterdam (0.04)
- United Kingdom > England
- North America > United States
- Genre:
- Overview (0.92)
- Industry:
- Health & Medicine (0.67)