arrival rate
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > New Jersey > Essex County > Newark (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Massachusetts (0.04)
- Asia > India > Rajasthan (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Modeling & Simulation (0.67)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Massachusetts (0.04)
- Asia > India > Rajasthan (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.68)
Action-Driven Processes for Continuous-Time Control
Modeling systems that exhibit both continuous and discontinuous state changes presents a significant challenge in machine learning. For instance, biological spiking networks feature the continuous decay of neuron potentials alongside discontinuous spikes, which cause abrupt increases in the potentials of neighboring downstream neurons. Designing appropriate objective functions and applying gradient methods that work with these discontinuities are among the difficulties of working with such systems. Traditionally, ordinary and partial differential equations (ODEs and PDEs) are used to model continuous state changes, while Markov decision processes (MDPs) are employed to capture discrete actions that drive environmental transitions. In this paper, we study Action-Driven Processes (ADPs), also known as generalized semi-Markov processes [12, 5, 16], which unify both types of dynamics within a single framework. With continuous-time states and actions at the core of ADPs, a natural question is whether it is possible to learn optimal policies for action selection using traditional reinforcement learning methods. The control-as-inference tutorial [9] elegantly demonstrated that maximum entropy reinforcement learning can be formulated as minimizing the Kullback-Leibler (KL) divergence between (a) a true trajectory distribution generated by action-state transitions and the policy, and (b) a model trajectory distribution that depends on the reward function.
- Asia > Singapore (0.14)
- Europe > Czechia > Prague (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
Near-Optimal Regret-Queue Length Tradeoff in Online Learning for Two-Sided Markets
Yang, Zixian, Varma, Sushil Mahavir, Ying, Lei
We study a two-sided market, wherein, price-sensitive heterogeneous customers and servers arrive and join their respective queues. A compatible customer-server pair can then be matched by the platform, at which point, they leave the system. Our objective is to design pricing and matching algorithms that maximize the platform's profit, while maintaining reasonable queue lengths. As the demand and supply curves governing the price-dependent arrival rates may not be known in practice, we design a novel online-learning-based pricing policy and establish its near-optimality. In particular, we prove a tradeoff among three performance metrics: $\tilde{O}(T^{1-γ})$ regret, $\tilde{O}(T^{γ/2})$ average queue length, and $\tilde{O}(T^γ)$ maximum queue length for $γ\in (0, 1/6]$, significantly improving over existing results [1]. Moreover, barring the permissible range of $γ$, we show that this trade-off between regret and average queue length is optimal up to logarithmic factors under a class of policies, matching the optimal one as in [2] which assumes the demand and supply curves to be known. Our proposed policy has two noteworthy features: a dynamic component that optimizes the tradeoff between low regret and small queue lengths; and a probabilistic component that resolves the tension between obtaining useful samples for fast learning and maintaining small queue lengths.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Massachusetts (0.04)
- Asia > India > Rajasthan (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.68)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > New Jersey > Essex County > Newark (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Massachusetts (0.04)
- Asia > India > Rajasthan (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Modeling & Simulation (0.67)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > Illinois > Cook County > Evanston (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (2 more...)
Demonstration of effective UCB-based routing in skill-based queues on real-world data
van Kempen, Sanne, Sanders, Jaron, Sloothaak, Fiona, Wolf, Maarten G.
This paper is about optimally controlling skill-based queueing systems such as data centers, cloud computing networks, and service systems. By means of a case study using a real-world data set, we investigate the practical implementation of a recently developed reinforcement learning algorithm for optimal customer routing. Our experiments show that the algorithm efficiently learns and adapts to changing environments and outperforms static benchmark policies, indicating its potential for live implementation. We also augment the real-world applicability of this algorithm by introducing a new heuristic routing rule to reduce delays. Moreover, we show that the algorithm can optimize for multiple objectives: next to payoff maximization, secondary objectives such as server load fairness and customer waiting time reduction can be incorporated. Tuning parameters are used for balancing inherent performance trade--offs. Lastly, we investigate the sensitivity to estimation errors and parameter tuning, providing valuable insights for implementing adaptive routing algorithms in complex real-world queueing systems.
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (2 more...)