Admission Control
Prioritizing Latency with Profit: A DRL-Based Admission Control for 5G Network Slices
Chakraborty, Proggya, Asrar, Aaquib, Sengupta, Jayasree, Bit, Sipra Das
5G networks enable diverse services such as eMBB, URLLC, and mMTC through network slicing, necessitating intelligent admission control and resource allocation to meet stringent QoS requirements while maximizing Network Service Provider (NSP) profits. However, existing Deep Reinforcement Learning (DRL) frameworks focus primarily on profit optimization without explicitly accounting for service delay, potentially leading to QoS violations for latency-sensitive slices. Moreover, the epsilon-greedy exploration commonly used in DRL often results in unstable convergence and suboptimal policy learning. To address these gaps, we propose DePSAC -- a Delay and Profit-aware Slice Admission Control scheme. Our DRL-based approach incorporates a delay-aware reward function, where penalties due to service delay incentivize the prioritization of latency-critical slices such as URLLC. Additionally, we employ Boltzmann exploration to achieve smoother and faster convergence. We implement and evaluate DePSAC on a simulated 5G core network substrate with realistic Network Slice Request (NSLR) arrival patterns. Experimental results demonstrate that our method outperforms the DSARA baseline in terms of overall profit, reduced URLLC slice delays, improved acceptance rates, and more efficient resource consumption. These findings validate the effectiveness of the proposed DePSAC in achieving better QoS-profit trade-offs for practical 5G network slicing scenarios.
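The two ingredients the abstract highlights can be sketched in a few lines. This is a hedged toy, not the authors' implementation: the function names, the softmax form of Boltzmann exploration, and the linear deadline-violation penalty are our assumptions.

```python
import math
import random

def boltzmann_action(q_values, temperature=1.0):
    """Sample an action from a softmax (Boltzmann) distribution over Q-values.

    Higher temperature -> more exploration; as temperature -> 0 the
    policy approaches greedy action selection, which tends to give a
    smoother exploration schedule than epsilon-greedy.
    """
    # Subtract the max for numerical stability before exponentiating.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(range(len(q_values)), weights=probs, k=1)[0]

def delay_aware_reward(profit, delay, deadline, penalty_weight=2.0):
    """Profit reward minus a penalty that grows with the deadline violation,
    so latency-critical (e.g. URLLC) slices are prioritized."""
    violation = max(0.0, delay - deadline)
    return profit - penalty_weight * violation
```

At near-zero temperature the sampler collapses to the greedy action, which is how a temperature-annealing schedule recovers exploitation late in training.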
SLOs-Serve: Optimized Serving of Multi-SLO LLMs
Chen, Siyuan, Jia, Zhipeng, Khan, Samira, Krishnamurthy, Arvind, Gibbons, Phillip B.
This paper introduces SLOs-Serve, a system designed for serving multi-stage large language model (LLM) requests with application- and stage-specific service level objectives (SLOs). The key idea behind SLOs-Serve is to customize the allocation of tokens to meet these SLO requirements. SLOs-Serve uses a multi-SLO dynamic programming-based algorithm to continuously optimize token allocations under SLO constraints by exploring the full design space of chunked prefill and (optional) speculative decoding. Leveraging this resource planning algorithm, SLOs-Serve effectively supports multi-SLOs and multi-replica serving with dynamic request routing while being resilient to bursty arrivals. Our evaluation across 6 LLM application scenarios (including summarization, coding, chatbot, tool calling, and reasoning) demonstrates that SLOs-Serve improves per-GPU serving capacity by 2.2x on average compared to prior state-of-the-art systems.
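The paper's planner is a multi-SLO dynamic program over chunked prefill and speculative decoding; as a much simpler stand-in, the per-step token budgeting idea can be illustrated with an earliest-deadline-first split of the budget. All names here are hypothetical and EDF does not explore the chunking/speculation trade-offs the real system optimizes.

```python
from dataclasses import dataclass

@dataclass
class Request:
    name: str
    tokens_left: int   # prefill tokens still to process
    deadline: int      # scheduling steps until the SLO deadline

def allocate_tokens(requests, budget):
    """Split a per-step token budget across requests, earliest deadline first.

    Returns {request name: tokens granted this step}. EDF is a reasonable
    feasibility heuristic on a single shared resource, but only a toy
    proxy for the paper's DP-based token planner.
    """
    grants = {}
    remaining = budget
    for req in sorted(requests, key=lambda r: r.deadline):
        grant = min(req.tokens_left, remaining)
        if grant > 0:
            grants[req.name] = grant
        remaining -= grant
        if remaining == 0:
            break
    return grants
```

With a budget of 60 tokens and two pending requests, the tighter-deadline request is served in full first and the remainder goes to the other.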
Reinforcement Learning and Regret Bounds for Admission Control
Weber, Lucas, Bušić, Ana, Zhu, Jiamin
The expected regret of any reinforcement learning algorithm is lower bounded by $\Omega\left(\sqrt{DXAT}\right)$ for undiscounted returns, where $D$ is the diameter of the Markov decision process, $X$ the size of the state space, $A$ the size of the action space, and $T$ the number of time steps. However, this lower bound is general; a smaller regret can be obtained by taking into account specific knowledge of the problem structure. In this article, we consider an admission control problem for an $M/M/c/S$ queue with $m$ job classes and class-dependent rewards and holding costs. Queuing systems often have a diameter that is exponential in the buffer size $S$, making the previous lower bound prohibitive for any practical use. We propose an algorithm inspired by UCRL2, and use the structure of the problem to upper bound the expected total regret by $O(S\log T + \sqrt{mT \log T})$ in the finite server case. In the infinite server case, we prove that the dependence of the regret on $S$ disappears.
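For reference, the $M/M/c/S$ queue underlying this problem has a closed-form stationary distribution (it is a birth-death chain), which is the kind of structural knowledge a tailored learner can exploit. A minimal sketch, with function names of our choosing:

```python
def mmcs_stationary(lam, mu, c, S):
    """Stationary distribution of an M/M/c/S birth-death queue.

    States n = 0..S; arrivals at rate lam, departures at rate min(n, c)*mu.
    p_n is proportional to the product of lam / (min(k, c)*mu) for k = 1..n.
    """
    weights = [1.0]
    for n in range(1, S + 1):
        weights.append(weights[-1] * lam / (min(n, c) * mu))
    total = sum(weights)
    return [w / total for w in weights]

def blocking_probability(lam, mu, c, S):
    """Probability an arriving job finds the buffer full (by PASTA)."""
    return mmcs_stationary(lam, mu, c, S)[-1]
```

For an $M/M/1/1$ queue with equal arrival and service rates, half the probability mass sits in the full state, so half of arrivals are blocked.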
Reliability-Optimized User Admission Control for URLLC Traffic: A Neural Contextual Bandit Approach
Semiari, Omid, Nikopour, Hosein, Talwar, Shilpa
Ultra-reliable low-latency communication (URLLC) is the cornerstone for a broad range of emerging services in next-generation wireless networks. URLLC fundamentally relies on the network's ability to proactively determine whether sufficient resources are available to support the URLLC traffic, and thus, prevent so-called cell overloads. Nonetheless, achieving accurate quality-of-service (QoS) predictions for URLLC user equipment (UEs) and preventing cell overloads are very challenging tasks. This is due to the dependency of the QoS metrics (latency and reliability) on traffic and channel statistics, users' mobility, and interdependent performance across UEs. In this paper, a new QoS-aware UE admission control approach is developed to proactively estimate QoS for URLLC UEs, prior to associating them with a cell, and accordingly, admit only a subset of UEs that do not lead to a cell overload. To this end, an optimization problem is formulated to find an efficient UE admission control policy, cognizant of UEs' QoS requirements and cell-level load dynamics. To solve this problem, a new machine learning based method is proposed that builds on (deep) neural contextual bandits, a suitable framework for dealing with nonlinear bandit problems. In fact, the UE admission controller is treated as a bandit agent that observes a set of network measurements (context) and makes admission control decisions based on context-dependent QoS (reward) predictions. The simulation results show that the proposed scheme can achieve near-optimal performance and yield substantial gains in terms of cell-level service reliability and efficient resource utilization.
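The contextual-bandit view of admission (context = network measurements, arms = admit/reject, reward = realized QoS) can be illustrated with a linear-UCB stand-in for the paper's neural bandit. This is a sketch under our own assumptions, not the authors' method; a neural version would replace the per-arm ridge regression with a network plus an exploration bonus.

```python
import numpy as np

class LinUCBAdmission:
    """Linear-UCB stand-in for a neural contextual bandit admission agent.

    Keeps per-arm ridge-regression statistics and picks the arm (0 = reject,
    1 = admit) with the highest optimistic reward estimate.
    """
    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(2)]    # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(2)]  # per-arm reward sums

    def select(self, context):
        scores = []
        for a in range(2):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]               # ridge estimate
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(theta @ context + bonus)  # optimistic estimate
        return int(np.argmax(scores))

    def update(self, context, arm, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```

After repeatedly observing that admitting in a given context yields reward 1 while rejecting yields 0, the optimistic score of the admit arm dominates and the agent admits.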
Digital Twin Assisted Deep Reinforcement Learning for Online Admission Control in Sliced Network
Tao, Zhenyu, Xu, Wei, You, Xiaohu
The proliferation of diverse wireless services in 5G and beyond has led to the emergence of network slicing technologies. Among these, admission control plays a crucial role in achieving service-oriented optimization goals through the selective acceptance of service requests. Although deep reinforcement learning (DRL) forms the foundation of many admission control approaches thanks to its effectiveness and flexibility, the initial instability and excessive convergence delay of DRL models hinder their deployment in real-world networks. We propose a digital twin (DT) accelerated DRL solution to address this issue. Specifically, we first formulate the admission decision-making process as a semi-Markov decision process, which is subsequently simplified into an equivalent discrete-time Markov decision process to facilitate the implementation of DRL methods. A neural network-based DT is established with a customized output layer for queuing systems, trained through supervised learning, and then employed to assist the training phase of the DRL model. Extensive simulations show that the DT-accelerated DRL improves resource utilization by over 40% compared to the directly trained state-of-the-art dueling deep Q-learning model. This improvement is achieved while preserving the model's capability to optimize the long-term rewards of the admission process.
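One standard way to turn a continuous-time chain into an equivalent discrete-time one, as the abstract's semi-MDP-to-MDP step requires, is uniformization: $P = I + Q/\Lambda$ with $\Lambda \geq \max_i |Q_{ii}|$. Whether the authors use exactly this construction is not stated; the sketch below shows the generic trick.

```python
def uniformize(Q, Lambda=None):
    """Convert a CTMC generator matrix Q into a DTMC transition matrix.

    P = I + Q / Lambda with Lambda >= max_i |Q[i][i]|. The discrete chain
    has the same stationary distribution as the continuous one, which is
    the standard device for casting continuous-time decision processes
    as discrete-time MDPs.
    """
    n = len(Q)
    if Lambda is None:
        Lambda = max(-Q[i][i] for i in range(n))
    return [[(1.0 if i == j else 0.0) + Q[i][j] / Lambda for j in range(n)]
            for i in range(n)]
```

Each row of the result sums to one, so the output is a valid stochastic matrix that a standard discrete-time DRL method can consume.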
Neural Network Implementation of Admission Control
A feedforward layered network implements a mapping required to control an unknown stochastic nonlinear dynamical system. Training is based on a novel approach that combines stochastic approximation ideas with backpropagation. The method is applied to control admission into a queueing system operating in a time-varying environment.
Decomposition of Reinforcement Learning for Admission Control of Self-Similar Call Arrival Processes
This paper presents predictive gain scheduling, a technique for simplifying reinforcement learning problems by decomposition. Link admission control of self-similar call traffic is used to demonstrate the technique. The control problem is decomposed into on-line prediction of near-future call arrival rates, and precomputation of policies for Poisson call arrival processes. At decision time, the predictions are used to select among the policies. Simulations show that this technique results in significantly faster learning without any performance loss, compared to a reinforcement learning controller that does not decompose the problem.
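The decomposition above (predict the near-future arrival rate, then look up a precomputed policy for the nearest Poisson rate) can be sketched as follows; the moving-average predictor and the nearest-rate lookup are our simplifications, not the paper's exact estimators.

```python
def predict_rate(recent_counts, window=3):
    """Naive near-future arrival-rate predictor: trailing moving average
    over the last `window` observation intervals."""
    tail = recent_counts[-window:]
    return sum(tail) / len(tail)

def select_policy(policies, predicted_rate):
    """Pick the precomputed policy whose Poisson rate is closest to the
    prediction -- the 'gain scheduling' step of the decomposition.

    `policies` maps a Poisson arrival rate to a precomputed policy.
    """
    best = min(policies, key=lambda rate: abs(rate - predicted_rate))
    return policies[best]
```

Because the policies are computed offline for a small grid of Poisson rates, the online work reduces to prediction plus a table lookup, which is where the reported speed-up comes from.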
Measurement-based Admission Control in Sliced Networks: A Best Arm Identification Approach
Lindståhl, Simon, Proutiere, Alexandre, Johnsson, Andreas
In sliced networks, the shared tenancy of slices requires adaptive admission control of data flows, based on measurements of network resources. In this paper, we investigate the design of measurement-based admission control schemes, deciding whether a new data flow can be admitted and, in this case, on which slice. The objective is to devise a joint measurement and decision strategy that returns a correct decision (e.g., the least loaded slice) with a certain level of confidence while minimizing the measurement cost (the number of measurements made before committing to the decision). We study the design of such strategies for several natural admission criteria specifying what a correct decision is. For each of these criteria, using tools from best arm identification in bandits, we first derive an explicit information-theoretical lower bound on the cost of any algorithm returning the correct decision with fixed confidence. We then devise a joint measurement and decision strategy achieving this theoretical limit. We empirically compare the measurement costs of these strategies, both to the lower bounds and to a naive measurement scheme. We find that our algorithm significantly outperforms the naive scheme (by a factor $2-8$).
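A classic fixed-confidence best-arm-identification routine for the "least loaded slice" criterion is successive elimination, sketched below. This is a generic textbook algorithm, not the paper's (asymptotically optimal) strategy; the confidence radius assumes sigma-sub-Gaussian measurement noise.

```python
import math

def least_loaded_slice(measure, n_slices, delta=0.05, sigma=1.0,
                       max_rounds=10000):
    """Successive elimination: identify the least-loaded slice with
    probability at least 1 - delta.

    `measure(i)` returns one noisy load sample for slice i. Slices whose
    empirical load is provably higher than the current best are dropped,
    so measurement effort concentrates on the near-optimal slices.
    """
    active = list(range(n_slices))
    sums = [0.0] * n_slices
    counts = [0] * n_slices
    means = {}
    for t in range(1, max_rounds + 1):
        for i in active:
            sums[i] += measure(i)
            counts[i] += 1
        means = {i: sums[i] / counts[i] for i in active}
        # Anytime confidence radius for sigma-sub-Gaussian noise.
        radius = sigma * math.sqrt(2 * math.log(4 * n_slices * t * t / delta) / t)
        best = min(means, key=means.get)
        # Keep only slices whose confidence interval overlaps the best's.
        active = [i for i in active if means[i] - radius <= means[best] + radius]
        if len(active) == 1:
            return active[0]
    return min(means, key=means.get)
```

With well-separated loads the heavily loaded slices are eliminated after a handful of samples, which is the source of the measurement savings over a naive uniform-sampling scheme.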
Average Reward Adjusted Discounted Reinforcement Learning: Near-Blackwell-Optimal Policies for Real-World Applications
Although reinforcement learning has become very popular in recent years, the number of successful applications to different kinds of operations research problems remains rather small. Reinforcement learning is based on the well-studied dynamic programming technique and thus also aims at finding the best stationary policy for a given Markov Decision Process, but in contrast does not require any model knowledge. The policy is assessed solely on consecutive states (or state-action pairs), which are observed while an agent explores the solution space. The contributions of this paper are manifold. First, we provide deep theoretical insights into the widely applied standard discounted reinforcement learning framework, which explain why these algorithms are inappropriate when permanently provided with non-zero rewards, such as costs or profit. Second, we establish a novel near-Blackwell-optimal reinforcement learning algorithm. In contrast to former methods, it assesses the average reward per step separately and thus prevents the incautious combination of different types of state values. The Laurent series expansion of the discounted state values forms the foundation for this development and also provides the connection between the two approaches. Finally, we prove the viability of our algorithm on a challenging problem set, which includes a well-studied M/M/1 admission control queuing system. In contrast to standard discounted reinforcement learning, our algorithm infers the optimal policy on all tested problems. The insight is that, in the operations research domain, machine learning techniques have to be adapted and advanced to be applied successfully in these settings.
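The core idea of tracking the average reward per step separately appears in average-reward (differential) Q-learning, whose one-step update is sketched below. This is the generic differential update, not necessarily the paper's near-Blackwell-optimal algorithm; names and learning rates are our assumptions.

```python
def differential_q_update(Q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """One step of average-reward (differential) Q-learning.

    Unlike discounted Q-learning, the average reward per step `rho` is
    learned separately and subtracted from the immediate reward, so the
    differential state values stay bounded even under permanently
    non-zero rewards such as running costs or profit.
    """
    td = r - rho + max(Q[s_next].values()) - Q[s][a]
    Q[s][a] += alpha * td   # update the differential action value
    rho += beta * td        # update the average-reward estimate
    return Q, rho
```

Subtracting `rho` removes the unbounded "reward per step times horizon" component that makes naive discounted values blow up as the discount factor approaches one.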
Adaptive Measurement-Based Policy-Driven QoS Management with Fuzzy-Rule-based Resource Allocation
Yerima, Suleiman Y., Parr, Gerard P., McClean, Sally I., Morrow, Philip J.
Fixed and wireless networks are increasingly converging towards common connectivity with IP-based core networks. Providing effective end-to-end resource and QoS management in such complex heterogeneous converged network scenarios requires unified, adaptive and scalable solutions to integrate and co-ordinate diverse QoS mechanisms of different access technologies with IP-based QoS. Policy-Based Network Management (PBNM) is one approach that could be employed to address this challenge. Hence, a policy-based framework for end-to-end QoS management in converged networks, CNQF (Converged Networks QoS Management Framework) has been proposed within our project. In this paper, the CNQF architecture, a Java implementation of its prototype and experimental validation of key elements are discussed. We then present a fuzzy-based CNQF resource management approach and study the performance of our implementation with real traffic flows on an experimental testbed. The results demonstrate the efficacy of our resource-adaptive approach for practical PBNM systems.
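The fuzzy-rule-based resource allocation the abstract refers to can be illustrated with a two-rule toy controller (triangular membership functions, weighted-average defuzzification). The rules, shapes, and output levels below are our own illustrative choices, not CNQF's actual rule base.

```python
def triangular(x, a, b, c):
    """Triangular membership function peaking at b on support (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_bandwidth_share(load):
    """Map a measured link load in [0, 1] to a bandwidth share for
    best-effort traffic using two fuzzy rules:

      IF load is LOW  THEN share is HIGH (0.8)
      IF load is HIGH THEN share is LOW  (0.2)

    Defuzzified as the membership-weighted average of the rule outputs.
    """
    low = triangular(load, -0.5, 0.0, 1.0)   # membership in LOW
    high = triangular(load, 0.0, 1.0, 1.5)   # membership in HIGH
    total = low + high
    return (low * 0.8 + high * 0.2) / total if total else 0.5
```

The controller interpolates smoothly between the rule outputs, which is what makes fuzzy allocation adaptive to gradual load changes rather than switching between hard thresholds.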