scheduling
A Universal Load Balancing Principle and Its Application to Large Language Model Serving
Chen, Zixi, Bu, Tianci, Song, Chendong, Lu, Xin, Ye, Yinyu, Zhou, Zijie
Load balancing-the allocation of work across parallel resources to reduce delay, energy and cost-is a pervasive challenge in science and engineering, from large-scale simulation and data processing to cloud and manufacturing operations. Motivated by the emerging bottleneck in large language model (LLM) serving, we study a particularly stringent regime of load balancing that arises in barrier-synchronized, stateful systems: work cannot be freely migrated and progress is gated by the slowest participant at each step, so heterogeneity and temporal drift in workloads create persistent stragglers and substantial idle time. LLM serving under data-parallel decoding provides a prominent modern instance: in production traces, barrier-induced idle can exceed 40% of compute time per decode step. Here we develop a universal load-balancing principle, which admits a step-wise finite-horizon integer-optimization formulation and yields worst-case guarantees: across LLM decode models and a broader class of non-decreasing workload drift processes, it reduces long-run imbalance by a factor that grows with batch size and system scale. Extensive experiments corroborate the theory, showing substantial improvements in throughput and latency together with reductions in energy consumption. These results provide a general, theoretically grounded framework for load balancing, with immediate implications for sustainable LLM serving and broad relevance to other synchronization-gated resource-allocation problems.
- North America > United States > California > San Francisco County > San Francisco (0.27)
- North America > United States > New York (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- (6 more...)
The Parameterized Complexity of Cascading Portfolio Scheduling
Cascading portfolio scheduling is a static algorithm selection strategy which uses a sample of test instances to compute an optimal ordering (a cascading schedule) of a portfolio of available algorithms. The algorithms are then applied to each future instance according to this cascading schedule, until some algorithm in the schedule succeeds. Cascading algorithm scheduling has proven to be effective in several applications, including QBF solving and the generation of ImageNet classification models. It is known that the computation of an optimal cascading schedule in the offline phase is NP-hard. In this paper we study the parameterized complexity of this problem and establish its fixed-parameter tractability by utilizing structural properties of the success relation between algorithms and test instances. Our findings are significant as they reveal that in spite of the intractability of the problem in its general form, one can indeed exploit sparseness or density of the success relation to obtain non-trivial runtime guarantees for finding an optimal cascading schedule.
Recurrent Submodular Welfare and Matroid Blocking Semi-Bandits
A recent line of research focuses on the study of stochastic multi-armed bandits (MAB), in the case where temporal correlations of specific structure are imposed between the player's actions and the reward distributions of the arms. These correlations lead to (sub-)optimal solutions that exhibit interesting dynamical patterns -- a phenomenon that yields new challenges both from an algorithmic as well as a learning perspective. In this work, we extend the above direction to a combinatorial semi-bandit setting and study a variant of stochastic MAB, where arms are subject to matroid constraints and each arm becomes unavailable (blocked) for a fixed number of rounds after each play. A natural common generalization of the state-of-the-art for blocking bandits, and that for matroid bandits, only guarantees a $\frac{1}{2}$-approximation for general matroids. In this paper we develop the novel technique of correlated (interleaved) scheduling, which allows us to obtain a polynomial-time $(1 - \frac{1}{e})$-approximation algorithm (asymptotically and in expectation) for any matroid. Along the way, we discover an interesting connection to a variant of Submodular Welfare Maximization, for which we provide (asymptotically) matching upper and lower approximability bounds. In the case where the mean arm rewards are unknown, our technique naturally decouples the scheduling from the learning problem, and thus allows to control the $(1-\frac{1}{e})$-approximate regret of a UCB-based adaptation of our online algorithm.
LLM-Upgraded Graph Reinforcement Learning for Carbon-Aware Job Scheduling in Smart Manufacturing
Yang, Zhiying, Liu, Fang, Zhang, Wei, Lou, Xin, Low, Malcolm Yoke Hean, Gan, Boon Ping
This paper presents \textsc{Luca}, a \underline{l}arge language model (LLM)-\underline{u}pgraded graph reinforcement learning framework for \underline{c}arbon-\underline{a}ware flexible job shop scheduling. \textsc{Luca} addresses the challenges of dynamic and sustainable scheduling in smart manufacturing systems by integrating a graph neural network and an LLM, guided by a carefully designed in-house prompting strategy, to produce a fused embedding that captures both structural characteristics and contextual semantics of the latest scheduling state. This expressive embedding is then processed by a deep reinforcement learning policy network, which generates real-time scheduling decisions optimized for both makespan and carbon emission objectives. To support sustainability goals, \textsc{Luca} incorporates a dual-objective reward function that encourages both energy efficiency and scheduling timeliness. Experimental results on both synthetic and public datasets demonstrate that \textsc{Luca} consistently outperforms comparison algorithms. For instance, on the synthetic dataset, it achieves an average of 4.1\% and up to 12.2\% lower makespan compared to the best-performing comparison algorithm while maintaining the same emission level. On public datasets, additional gains are observed for both makespan and emission. These results demonstrate that \textsc{Luca} is effective and practical for carbon-aware scheduling in smart manufacturing.
- Asia > Singapore (0.06)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China (0.04)
Multi-Agent Reinforcement Learning for Intraday Operating Rooms Scheduling under Uncertainty
Liu, Kailiang, Chen, Ying, Borndörfer, Ralf, Koch, Thorsten
Intraday surgical scheduling is a multi-objective decision problem under uncertainty-balancing elective throughput, urgent and emergency demand, delays, sequence-dependent setups, and overtime. We formulate the problem as a cooperative Markov game and propose a multi-agent reinforcement learning (MARL) framework in which each operating room (OR) is an agent trained with centralized training and decentralized execution. All agents share a policy trained via Proximal Policy Optimization (PPO), which maps rich system states to actions, while a within-epoch sequential assignment protocol constructs conflict-free joint schedules across ORs. A mixed-integer pre-schedule provides reference starting times for electives; we impose type-specific quadratic delay penalties relative to these references and a terminal overtime penalty, yielding a single reward that captures throughput, timeliness, and staff workload. In simulations reflecting a realistic hospital mix (six ORs, eight surgery types, random urgent and emergency arrivals), the learned policy outperforms six rule-based heuristics across seven metrics and three evaluation subsets, and, relative to an ex post MIP oracle, quantifies optimality gaps. Policy analytics reveal interpretable behavior-prioritizing emergencies, batching similar cases to reduce setups, and deferring lower-value electives. We also derive a suboptimality bound for the sequential decomposition under simplifying assumptions. We discuss limitations-including OR homogeneity and the omission of explicit staffing constraints-and outline extensions. Overall, the approach offers a practical, interpretable, and tunable data-driven complement to optimization for real-time OR scheduling.
MemOS: A Memory OS for AI System
Li, Zhiyu, Xi, Chenyang, Li, Chunyu, Chen, Ding, Chen, Boyu, Song, Shichao, Niu, Simin, Wang, Hanyu, Yang, Jiawei, Tang, Chen, Yu, Qingchen, Zhao, Jihao, Wang, Yezhaohui, Liu, Peng, Lin, Zehao, Wang, Pengyuan, Huo, Jiahao, Chen, Tianyi, Chen, Kai, Li, Kehang, Tao, Zhen, Lai, Huayi, Wu, Hao, Tang, Bo, Wang, Zhengren, Fan, Zhaoxin, Zhang, Ningyu, Zhang, Linfeng, Yan, Junchi, Yang, Mingchuan, Xu, Tong, Xu, Wei, Chen, Huajun, Wang, Haofen, Yang, Hongkang, Zhang, Wentao, Xu, Zhi-Qin John, Chen, Siheng, Xiong, Feiyu
Large Language Models (LLMs) have become an essential infrastructure for Artificial General Intelligence (AGI), yet their lack of well-defined memory management systems hinders the development of long-context reasoning, continual personalization, and knowledge consistency.Existing models mainly rely on static parameters and short-lived contextual states, limiting their ability to track user preferences or update knowledge over extended periods.While Retrieval-Augmented Generation (RAG) introduces external knowledge in plain text, it remains a stateless workaround without lifecycle control or integration with persistent representations.Recent work has modeled the training and inference cost of LLMs from a memory hierarchy perspective, showing that introducing an explicit memory layer between parameter memory and external retrieval can substantially reduce these costs by externalizing specific knowledge. Beyond computational efficiency, LLMs face broader challenges arising from how information is distributed over time and context, requiring systems capable of managing heterogeneous knowledge spanning different temporal scales and sources. To address this challenge, we propose MemOS, a memory operating system that treats memory as a manageable system resource. It unifies the representation, scheduling, and evolution of plaintext, activation-based, and parameter-level memories, enabling cost-efficient storage and retrieval. As the basic unit, a MemCube encapsulates both memory content and metadata such as provenance and versioning. MemCubes can be composed, migrated, and fused over time, enabling flexible transitions between memory types and bridging retrieval with parameter-based learning. MemOS establishes a memory-centric system framework that brings controllability, plasticity, and evolvability to LLMs, laying the foundation for continual learning and personalized modeling.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- (11 more...)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (0.93)
- Law (0.93)
- Government (0.67)
Finite-Time Minimax Bounds and an Optimal Lyapunov Policy in Queueing Control
Liu, Yujie, Tan, Vincent Y. F., Xu, Yunbei
We introduce an original minimax framework for finite-time performance analysis in queueing control and propose a surprisingly simple Lyapunov-based scheduling policy with superior finite-time performance. The framework quantitatively characterizes how the expected total queue length scales with key system parameters, including the capacity of the scheduling set and the variability of arrivals and departures across queues. To our knowledge, this provides the first firm foundation for evaluating and comparing scheduling policies in the finite-time regime, including nonstationary settings, and shows that the proposed policy can provably and empirically outperform classical MaxWeight in finite time. Within this framework, we establish three main sets of results. First, we derive minimax lower bounds on the expected total queue length for parallel-queue scheduling via a novel Brownian coupling argument. Second, we propose a new policy, LyapOpt, which minimizes the full quadratic Lyapunov drift-capturing both first- and second-order terms-and achieves optimal finite-time performance in heavy traffic while retaining classical stability guarantees. Third, we identify a key limitation of the classical MaxWeight policy, which optimizes only the first-order drift: its finite-time performance depends suboptimally on system parameters, leading to substantially larger backlogs in explicitly characterized settings. Together, these results delineate the scope and limitations of classical drift-based scheduling and motivate new queueing-control methods with rigorous finite-time guarantees.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Kansas > Russell County (0.04)
- Asia > Singapore (0.04)
Optimized scheduling of electricity-heat cooperative system considering wind energy consumption and peak shaving and valley filling
Ye, Jin, Wang, Lingmei, Zhang, Shujian, Wu, Haihang
With the global energy transition and rapid development of renewable energy, the scheduling optimization challenge for combined power-heat systems under new energy integration and multiple uncertainties has become increasingly prominent. Addressing this challenge, this study proposes an intelligent scheduling method based on the improved Dual-Delay Deep Deterministic Policy Gradient (PVTD3) algorithm. System optimization is achieved by introducing a penalty term for grid power purchase variations. Simulation results demonstrate that under three typical scenarios (10%, 20%, and 30% renewable penetration), the PVTD3 algorithm reduces the system's comprehensive cost by 6.93%, 12.68%, and 13.59% respectively compared to the traditional TD3 algorithm. Concurrently, it reduces the average fluctuation amplitude of grid power purchases by 12.8%. Regarding energy storage management, the PVTD3 algorithm reduces the end-time state values of low-temperature thermal storage tanks by 7.67-17.67 units while maintaining high-temperature tanks within the 3.59-4.25 safety operating range. Multi-scenario comparative validation demonstrates that the proposed algorithm not only excels in economic efficiency and grid stability but also exhibits superior sustainable scheduling capabilities in energy storage device management.
- Asia > China > Shanxi Province > Taiyuan (0.05)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Energy > Power Industry (1.00)
- Energy > Renewable > Wind (0.85)
- Energy > Renewable > Solar (0.70)
- Energy > Renewable > Geothermal > Geothermal Energy Systems and Facilities (0.36)
Adam Simplified: Bias Correction Debunked
The Adam optimizer is a cornerstone of modern deep learning, yet the empirical necessity of each of its individual components is often taken for granted. This paper presents a focused investigation into the role of bias-correction, a feature whose contribution remains poorly understood. Through a series of systematic ablations on vision and language modelling tasks, we demonstrate that the conventional wisdom surrounding bias correction is misleading. In particular, we demonstrate that in the optimal hyper-parameter configuration, the inclusion of bias correction leads to no improvement in final test performance. Moreover, unless appropriate learning rate scheduling is implemented, the inclusion of bias correction can sometimes be detrimental to performance. We further reinterpret bias correction as a form of implicit learning rate scheduling whose behaviour is strongly dependent on the choice of smoothing hyper-parameters $β_1, β_2 \in [0,1)$. Our findings challenge the universal inclusion of this component.