Goto

Collaborating Authors

 arrival rate





Action-Driven Processes for Continuous-Time Control

He, Ruimin, Lin, Shaowei

arXiv.org Machine Learning

Modeling systems that exhibit both continuous and discontinuous state changes presents a significant challenge in machine learning. For instance, biological spiking networks feature the continuous decay of neuron potentials alongside discontinuous spikes, which cause abrupt increases in the potentials of neighboring downstream neurons. Designing appropriate objective functions and applying gradient methods that work with these discontinuities are among the difficulties of working with such systems. Traditionally, ordinary and partial differential equations (ODEs and PDEs) are used to model continuous state changes, while Markov decision processes (MDPs) are employed to capture discrete actions that drive environmental transitions. In this paper, we study Action-Driven Processes (ADPs), also known as generalized semi-Markov processes [12, 5, 16], which unify both types of dynamics within a single framework. With continuous-time states and actions at the core of ADPs, a natural question is whether it is possible to learn optimal policies for action selection using traditional reinforcement learning methods. The control-as-inference tutorial [9] elegantly demonstrated that maximum entropy reinforcement learning can be formulated as minimizing the Kullback-Leibler (KL) divergence between (a) a true trajectory distribution generated by action-state transitions and the policy, and (b) a model trajectory distribution that depends on the reward function.


Near-Optimal Regret-Queue Length Tradeoff in Online Learning for Two-Sided Markets

Yang, Zixian, Varma, Sushil Mahavir, Ying, Lei

arXiv.org Artificial Intelligence

We study a two-sided market, wherein, price-sensitive heterogeneous customers and servers arrive and join their respective queues. A compatible customer-server pair can then be matched by the platform, at which point, they leave the system. Our objective is to design pricing and matching algorithms that maximize the platform's profit, while maintaining reasonable queue lengths. As the demand and supply curves governing the price-dependent arrival rates may not be known in practice, we design a novel online-learning-based pricing policy and establish its near-optimality. In particular, we prove a tradeoff among three performance metrics: $\tilde{O}(T^{1-γ})$ regret, $\tilde{O}(T^{γ/2})$ average queue length, and $\tilde{O}(T^γ)$ maximum queue length for $γ\in (0, 1/6]$, significantly improving over existing results [1]. Moreover, barring the permissible range of $γ$, we show that this trade-off between regret and average queue length is optimal up to logarithmic factors under a class of policies, matching the optimal one as in [2] which assumes the demand and supply curves to be known. Our proposed policy has two noteworthy features: a dynamic component that optimizes the tradeoff between low regret and small queue lengths; and a probabilistic component that resolves the tension between obtaining useful samples for fast learning and maintaining small queue lengths.






Demonstration of effective UCB-based routing in skill-based queues on real-world data

van Kempen, Sanne, Sanders, Jaron, Sloothaak, Fiona, Wolf, Maarten G.

arXiv.org Artificial Intelligence

This paper is about optimally controlling skill-based queueing systems such as data centers, cloud computing networks, and service systems. By means of a case study using a real-world data set, we investigate the practical implementation of a recently developed reinforcement learning algorithm for optimal customer routing. Our experiments show that the algorithm efficiently learns and adapts to changing environments and outperforms static benchmark policies, indicating its potential for live implementation. We also augment the real-world applicability of this algorithm by introducing a new heuristic routing rule to reduce delays. Moreover, we show that the algorithm can optimize for multiple objectives: next to payoff maximization, secondary objectives such as server load fairness and customer waiting time reduction can be incorporated. Tuning parameters are used for balancing inherent performance trade--offs. Lastly, we investigate the sensitivity to estimation errors and parameter tuning, providing valuable insights for implementing adaptive routing algorithms in complex real-world queueing systems.