arrival rate
AllSim: Simulating and Benchmarking Resource Allocation Policies in Multi-User Systems
Numerous real-world systems, ranging from healthcare to energy grids, involve users competing for finite and potentially scarce resources. Designing policies for repeated resource allocation in such real-world systems is challenging for many reasons, including the changing nature of user types and their (possibly urgent) need for resources. Researchers have developed numerous machine learning solutions for determining repeated resource allocation policies in these challenging settings. However, a key limitation has been the absence of good methods and test-beds for benchmarking these policies; almost all resource allocation policies are benchmarked in environments which are either completely synthetic or do not allow any deviation from historical data. In this paper we introduce AllSim, which is a benchmarking environment for realistically simulating the impact and utility of policies for resource allocation in systems in which users compete for such scarce resources. Building such a benchmarking environment is challenging because it needs to successfully take into account the entire collective of potential users and the impact a resource allocation policy has on all the other users in the system. AllSim's benchmarking environment is modular (each component being parameterized individually), learnable (informed by historical data), and customizable (adaptable to changing conditions). These, when interacting with an allocation policy, produce a dataset of simulated outcomes for evaluation and comparison of such policies. We believe AllSim is an essential step towards a more systematic evaluation of policies for scarce resource allocation compared to current approaches for benchmarking such methods.
Action-Driven Processes for Continuous-Time Control
Modeling systems that exhibit both continuous and discontinuous state changes presents a significant challenge in machine learning. For instance, biological spiking networks feature the continuous decay of neuron potentials alongside discontinuous spikes, which cause abrupt increases in the potentials of neighboring downstream neurons. Designing appropriate objective functions and applying gradient methods that work with these discontinuities are among the difficulties of working with such systems. Traditionally, ordinary and partial differential equations (ODEs and PDEs) are used to model continuous state changes, while Markov decision processes (MDPs) are employed to capture discrete actions that drive environmental transitions. In this paper, we study Action-Driven Processes (ADPs), also known as generalized semi-Markov processes [12, 5, 16], which unify both types of dynamics within a single framework. With continuous-time states and actions at the core of ADPs, a natural question is whether it is possible to learn optimal policies for action selection using traditional reinforcement learning methods. The control-as-inference tutorial [9] elegantly demonstrated that maximum entropy reinforcement learning can be formulated as minimizing the Kullback-Leibler (KL) divergence between (a) a true trajectory distribution generated by action-state transitions and the policy, and (b) a model trajectory distribution that depends on the reward function.
Near-Optimal Regret-Queue Length Tradeoff in Online Learning for Two-Sided Markets
Yang, Zixian, Varma, Sushil Mahavir, Ying, Lei
We study a two-sided market, wherein, price-sensitive heterogeneous customers and servers arrive and join their respective queues. A compatible customer-server pair can then be matched by the platform, at which point, they leave the system. Our objective is to design pricing and matching algorithms that maximize the platform's profit, while maintaining reasonable queue lengths. As the demand and supply curves governing the price-dependent arrival rates may not be known in practice, we design a novel online-learning-based pricing policy and establish its near-optimality. In particular, we prove a tradeoff among three performance metrics: $\tilde{O}(T^{1-ฮณ})$ regret, $\tilde{O}(T^{ฮณ/2})$ average queue length, and $\tilde{O}(T^ฮณ)$ maximum queue length for $ฮณ\in (0, 1/6]$, significantly improving over existing results [1]. Moreover, barring the permissible range of $ฮณ$, we show that this trade-off between regret and average queue length is optimal up to logarithmic factors under a class of policies, matching the optimal one as in [2] which assumes the demand and supply curves to be known. Our proposed policy has two noteworthy features: a dynamic component that optimizes the tradeoff between low regret and small queue lengths; and a probabilistic component that resolves the tension between obtaining useful samples for fast learning and maintaining small queue lengths.
Demonstration of effective UCB-based routing in skill-based queues on real-world data
van Kempen, Sanne, Sanders, Jaron, Sloothaak, Fiona, Wolf, Maarten G.
This paper is about optimally controlling skill-based queueing systems such as data centers, cloud computing networks, and service systems. By means of a case study using a real-world data set, we investigate the practical implementation of a recently developed reinforcement learning algorithm for optimal customer routing. Our experiments show that the algorithm efficiently learns and adapts to changing environments and outperforms static benchmark policies, indicating its potential for live implementation. We also augment the real-world applicability of this algorithm by introducing a new heuristic routing rule to reduce delays. Moreover, we show that the algorithm can optimize for multiple objectives: next to payoff maximization, secondary objectives such as server load fairness and customer waiting time reduction can be incorporated. Tuning parameters are used for balancing inherent performance trade--offs. Lastly, we investigate the sensitivity to estimation errors and parameter tuning, providing valuable insights for implementing adaptive routing algorithms in complex real-world queueing systems.