AITopics | allocator

allocator

STAlloc: Enhancing Memory Efficiency in Large-Scale Model Training with Spatio-Temporal Planning

Huang, Zixiao, Hu, Junhao, Lin, Hao, Zhu, Chunyang, Tang, Yueran, Zhang, Quanlu, Guo, Zhen, Li, Zhenhua, Yan, Shengen, Zhu, Zhenhua, Dai, Guohao, Wang, Yu

arXiv.org Artificial Intelligence, Nov-26-2025

The rapid scaling of large language models (LLMs) has significantly increased GPU memory pressure, which is further aggravated by training optimization techniques such as virtual pipeline and recomputation that disrupt tensor lifespans and introduce considerable memory fragmentation. Such fragmentation stems from the use of online GPU memory allocators in popular deep learning frameworks like PyTorch, which disregard tensor lifespans. This inefficiency can waste as much as 43% of memory and trigger out-of-memory errors, undermining the effectiveness of optimization methods. To address this, we introduce STAlloc, a GPU memory allocator for deep learning frameworks that reduces fragmentation by exploiting the spatial and temporal regularity in the memory allocation behavior of training workloads. STAlloc introduces a novel paradigm that combines offline planning with online allocation. The offline planning leverages spatio-temporal regularities to generate a near-optimal allocation plan, while the online allocation handles complex and dynamic models such as Mixture-of-Experts (MoE). Built as a pluggable PyTorch memory allocator, STAlloc reduces the fragmentation ratio by 85.1% on average (up to 100%) across both dense and MoE models, with negligible overhead. This enables more efficient, high-throughput training configurations and improves throughput by up to 32.5%.
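To illustrate what lifespan-aware offline planning can look like, here is a minimal sketch. It is not STAlloc's algorithm, which the abstract does not spell out: it substitutes a simple greedy-by-size placement heuristic. The `Tensor` record, the `plan_offsets` and `peak_live_bytes` helpers, the toy trace, and the fragmentation formula `(reserved - peak live) / reserved` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tensor:
    size: int   # bytes requested
    start: int  # step at which the tensor is allocated (inclusive)
    end: int    # step at which the tensor is freed (exclusive)


def overlaps_in_time(a, b):
    """True if the two tensors are ever live at the same step."""
    return a.start < b.end and b.start < a.end


def plan_offsets(tensors):
    """Greedy-by-size offline placement (hypothetical stand-in planner).

    Returns {tensor_index: byte_offset}. Tensors whose lifespans overlap
    never share addresses; tensors that are never live together may.
    """
    placed = {}
    order = sorted(range(len(tensors)), key=lambda i: -tensors[i].size)
    for i in order:
        t = tensors[i]
        # Address ranges already taken by tensors live at the same time.
        busy = sorted(
            (off, off + tensors[j].size)
            for j, off in placed.items()
            if overlaps_in_time(t, tensors[j])
        )
        offset = 0
        for lo, hi in busy:
            if offset + t.size <= lo:
                break  # the gap before this range is large enough
            offset = max(offset, hi)
        placed[i] = offset
    return placed


def peak_live_bytes(tensors):
    """Largest total of simultaneously live bytes (a lower bound on memory)."""
    steps = {t.start for t in tensors}
    return max(sum(t.size for t in tensors if t.start <= s < t.end) for s in steps)


# Toy allocation trace with known lifespans (made-up numbers).
trace = [Tensor(4, 0, 3), Tensor(2, 1, 2), Tensor(2, 2, 5), Tensor(4, 3, 6)]
plan = plan_offsets(trace)
reserved = max(plan[i] + trace[i].size for i in plan)
live = peak_live_bytes(trace)
fragmentation = (reserved - live) / reserved
print(plan, reserved, live, fragmentation)
```

Because the planner sees every lifespan up front, it can reuse an address range as soon as its previous occupant is freed, which an online allocator that ignores lifespans cannot guarantee.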

large language model, machine learning, natural language, (20 more...)

doi: 10.1145/3767295.3769335

arXiv: 2507.16274

Country:

Asia > China (0.28)
Europe > United Kingdom > Scotland (0.16)

Genre:

Workflow (1.00)
Research Report (1.00)

Industry: Health & Medicine > Consumer Health (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Harli: SLO-Aware Co-location of LLM Inference and PEFT-based Finetuning on Model-as-a-Service Platforms

Xu, Ao, Zhao, Han, Cui, Weihao, Chen, Quan, Chen, Yukang, Zhang, Shulai, Chen, Shuang, Jiang, Jiemin, Yu, Zhibin, Guo, Minyi

arXiv.org Artificial Intelligence, Nov-20-2025

Large language models (LLMs) are increasingly deployed under the Model-as-a-Service (MaaS) paradigm. To meet stringent quality-of-service (QoS) requirements, existing LLM serving systems disaggregate the prefill and decode phases of inference. However, decode instances often experience low GPU utilization due to their memory-bound nature and insufficient batching in dynamic workloads, leaving compute resources underutilized. We introduce Harli, a serving system that improves GPU utilization by co-locating parameter-efficient finetuning (PEFT) tasks with LLM decode instances. PEFT tasks are compute-bound and memory-efficient, making them ideal candidates for safe co-location. Specifically, Harli addresses two key challenges, limited memory and unpredictable interference, using three components: a unified memory allocator for runtime memory reuse, a two-stage latency predictor for decode latency modeling, and a QoS-guaranteed scheduler that maximizes finetuning throughput. Experimental results show that Harli improves finetuning throughput by 46.2% on average (up to 92.0%) over state-of-the-art serving systems, while maintaining strict QoS guarantees for inference decode.

large language model, machine learning, natural language, (21 more...)

arXiv: 2511.11729

Country:

Europe (1.00)
North America > United States (0.46)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Multi-Agent Regime-Conditioned Diffusion (MARCD) for CVaR-Constrained Portfolio Decisions

Alzahrani, Ali Atiah

arXiv.org Artificial Intelligence, Nov-4-2025

We examine whether regime-conditioned generative scenarios combined with a convex CVaR allocator improve portfolio decisions under regime shifts. We present MARCD, a generative-to-decision framework with: (i) a Gaussian HMM to infer latent regimes; (ii) a diffusion generator that produces regime-conditioned scenarios; (iii) signal extraction via blended, shrunk moments; and (iv) a governed CVaR epigraph quadratic program. Contributions: Within the Scenario stage we introduce a tail-weighted diffusion objective that up-weights low-quantile outcomes relevant for drawdowns and a regime-expert (MoE) denoiser whose gate increases with crisis posteriors; both are evaluated end-to-end through the allocator. Under strict walk-forward on liquid multi-asset ETFs (2005-2025), MARCD exhibits stronger scenario calibration and materially smaller drawdowns: MaxDD 9.3% versus 14.1% for BL (a 34% reduction) over 2020-2025 out-of-sample. The framework provides an auditable pipeline with explicit budget, box, and turnover constraints, demonstrating the value of decision-aware generative modeling in finance.
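For readers unfamiliar with CVaR, the sketch below evaluates it on a finite set of scenario losses using the standard Rockafellar-Uryasev formula, which is also what makes an epigraph formulation of a CVaR constraint possible. This is only an illustration under stated assumptions: the abstract does not give MARCD's exact program, and the function names `empirical_cvar` and `portfolio_losses` and the toy scenarios are hypothetical.

```python
def empirical_cvar(losses, alpha):
    """CVaR at confidence level alpha for equally weighted scenario losses.

    Rockafellar-Uryasev:
        CVaR_alpha = min over t of  t + sum_i max(L_i - t, 0) / ((1 - alpha) * N)
    The objective is convex and piecewise linear in t with kinks at the
    sample losses, so it is enough to evaluate it at each sample loss.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    losses = list(losses)
    if not losses:
        raise ValueError("need at least one scenario")
    tail_mass = (1.0 - alpha) * len(losses)
    return min(
        t + sum(max(loss - t, 0.0) for loss in losses) / tail_mass
        for t in losses
    )


def portfolio_losses(weights, scenario_returns):
    """Loss of a fixed-weight portfolio in each scenario (loss = -return)."""
    return [
        -sum(w * r for w, r in zip(weights, row))
        for row in scenario_returns
    ]


# Toy example: two assets, four made-up return scenarios.
scenarios = [[0.02, -0.01], [-0.10, -0.06], [0.01, 0.03], [0.00, 0.01]]
losses = portfolio_losses([0.5, 0.5], scenarios)
print(empirical_cvar(losses, alpha=0.75))
```

In an optimization setting, `t` and one slack variable per scenario become decision variables, which turns the CVaR bound into linear constraints that sit alongside budget, box, and turnover constraints.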

artificial intelligence, constraint, machine learning, (17 more...)

arXiv: 2510.10807

Genre: Research Report (0.82)

Industry: Banking & Finance (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

xMem: A CPU-Based Approach for Accurate Estimation of GPU Memory in Deep Learning Training Workloads

Shi, Jiabo, Pezaros, Dimitrios, Elkhatib, Yehia

arXiv.org Artificial Intelligence, Oct-27-2025

The global scarcity of GPUs necessitates more sophisticated strategies for deep learning jobs in shared cluster environments. Accurate estimation of how much GPU memory a job will require is fundamental to enabling advanced scheduling and GPU sharing, which helps prevent out-of-memory (OOM) errors and resource underutilization. However, existing estimation methods have limitations. Approaches relying on static analysis or historical data with machine learning often fail to accurately capture runtime dynamics. Furthermore, direct GPU analysis consumes scarce resources, and some techniques require intrusive code modifications. Thus, the key challenge lies in precisely estimating dynamic memory requirements, including memory allocator nuances, without consuming GPU resources or requiring intrusive code changes. To address this challenge, we propose xMem, a novel framework that leverages CPU-only dynamic analysis to accurately estimate peak GPU memory requirements a priori. We conducted a thorough evaluation of xMem against state-of-the-art solutions using workloads from 25 different models, including architectures like Convolutional Neural Networks and Transformers. The analysis of 5209 runs, which includes ANOVA and Monte Carlo results, highlights xMem's benefits: it decreases the median relative error by 91% and, when estimates are used as safe OOM thresholds, reduces the probability of estimation failure by 75%, meaning that the estimated value can often be used directly without causing OOM. Ultimately, these improvements lead to a 368% increase in memory conservation potential over current solutions.

artificial intelligence, machine learning, xmem, (18 more...)

doi: 10.1145/3721462.3770773

arXiv: 2510.21048

Country: North America > United States (0.95)

Genre: Research Report > Promising Solution (0.48)

Industry: Information Technology (1.00)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Morphlux: Transforming Torus Fabrics for Efficient Multi-tenant ML

Kumar, Abhishek Vijaya, Ding, Eric, Devraj, Arjun, Bunandar, Darius, Singh, Rachee

arXiv.org Artificial Intelligence, Oct-6-2025

We develop Morphlux, a server-scale programmable photonic fabric to interconnect accelerators within servers. We show that augmenting state-of-the-art torus-based ML datacenters with Morphlux can improve the bandwidth of tenant compute allocations by up to 66%, reduce compute fragmentation by up to 70%, and minimize the blast radius of chip failures. We develop a novel end-to-end hardware prototype of Morphlux to demonstrate these performance benefits, which translate to a 1.72x improvement in the training throughput of ML models. By rapidly programming the server-scale fabric in our hardware testbed, Morphlux can replace a failed accelerator chip with a healthy one in 1.2 seconds.

large language model, machine learning, natural language, (21 more...)

arXiv: 2508.03674

Country: North America > United States > New York (0.15)

Genre: Research Report (0.64)

Industry: Information Technology > Services (0.67)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Social Welfare Function Leaderboard: When LLM Agents Allocate Social Welfare

Shi, Zhengliang, Ma, Ruotian, Huang, Jen-tse, Ma, Xinbei, Chen, Xingyu, Wang, Mengru, Yang, Qu, Wang, Yue, Ye, Fanghua, Chen, Ziyang, Wang, Shanyi, Li, Cixing, Wang, Wenxuan, Tu, Zhaopeng, Li, Xiaolong, Ren, Zhaochun, Linus

arXiv.org Artificial Intelligence, Oct-2-2025

Large language models (LLMs) are increasingly entrusted with high-stakes decisions that affect human welfare. However, the principles and values that guide these models when distributing scarce societal resources remain largely unexamined. To address this, we introduce the Social Welfare Function (SWF) Benchmark, a dynamic simulation environment where an LLM acts as a sovereign allocator, distributing tasks to a heterogeneous community of recipients. The benchmark is designed to create a persistent trade-off between maximizing collective efficiency (measured by Return on Investment) and ensuring distributive fairness (measured by the Gini coefficient). We evaluate 20 state-of-the-art LLMs and present the first leaderboard for social welfare allocation. Our findings reveal three key insights: (i) A model's general conversational ability, as measured by popular leaderboards, is a poor predictor of its allocation skill. (ii) Most LLMs exhibit a strong default utilitarian orientation, prioritizing group productivity at the expense of severe inequality. (iii) Allocation strategies are highly vulnerable, easily perturbed by output-length constraints and social-influence framing. These results highlight the risks of deploying current LLMs as societal decision-makers and underscore the need for specialized benchmarks and targeted alignment for AI governance.
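The fairness side of this trade-off is measured by the Gini coefficient, which is easy to compute directly. The sketch below uses the standard mean-absolute-difference definition; the function name `gini` and the example allocations are illustrative assumptions, not taken from the benchmark.

```python
def gini(values):
    """Gini coefficient of non-negative amounts.

    G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean) = sum_ij |x_i - x_j| / (2 * n * total)
    0 means perfect equality; values near 1 mean one recipient holds almost everything.
    """
    values = list(values)
    if any(v < 0 for v in values):
        raise ValueError("amounts must be non-negative")
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # convention: nothing to distribute is treated as equal
    diff_sum = sum(abs(a - b) for a in values for b in values)
    return diff_sum / (2 * n * total)


# Hypothetical allocations of the same total across four recipients.
print(gini([5, 5, 5, 5]))   # equal shares
print(gini([0, 0, 0, 20]))  # everything to one recipient
```

This quadratic-time form is fine for small communities; a sort-based formula would be preferable for large ones.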

large language model, machine learning, natural language, (15 more...)

arXiv: 2510.01164

Country: Europe (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala

Neural Information Processing Systems, Aug-20-2025, 00:53:12 GMT

Deep learning frameworks have often focused on either usability or speed, but not both.

execution, library, pytorch, (17 more...)

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Massachusetts > Middlesex County > Natick (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

FLOPS: Forward Learning with OPtimal Sampling

Ren, Tao, Zhang, Zishi, Jiang, Jinyang, Li, Guanghao, Zhang, Zeliang, Feng, Mingqian, Peng, Yijie

arXiv.org Artificial Intelligence, Oct-17-2024

Given the limitations of backpropagation, perturbation-based gradient computation methods have recently gained attention for learning with only forward passes, also referred to as queries. Conventional forward learning consumes an enormous number of queries on each data point for accurate gradient estimation through Monte Carlo sampling, which hinders the scalability of those algorithms. However, not all data points deserve an equal number of queries for gradient estimation. In this paper, we study the problem of improving forward learning efficiency from a novel perspective: how to reduce the gradient estimation variance at minimum cost? To this end, we propose to allocate the optimal number of queries to each data point in a batch during training to achieve a good balance between estimation accuracy and computational efficiency. Specifically, with a simplified proxy objective and a reparameterization technique, we derive a novel plug-and-play query allocator with minimal parameters. We provide theoretical results to verify its optimality. We conduct extensive experiments on fine-tuning Vision Transformers on various datasets and further deploy the allocator in two black-box applications: prompt tuning and multimodal alignment for foundation models. All findings demonstrate that our proposed allocator significantly enhances the scalability of forward-learning algorithms, paving the way for real-world applications.
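The intuition that noisier data points deserve more queries can be sketched with classical Neyman allocation, which minimizes the summed variance `sum(sigma_i**2 / n_i)` under a fixed total budget by making `n_i` proportional to `sigma_i`. This is a swapped-in textbook technique, not the allocator derived in the paper; `allocate_queries`, its parameters, and the example numbers are hypothetical, and the per-point standard deviations are assumed to be known.

```python
import math


def allocate_queries(stds, budget, min_queries=1):
    """Split an integer query budget across data points.

    Every point first receives `min_queries`; the remaining budget is shared
    in proportion to each point's standard deviation (Neyman allocation),
    with largest-remainder rounding so the counts sum exactly to `budget`.
    """
    n = len(stds)
    if n == 0 or budget < n * min_queries:
        raise ValueError("budget too small for the minimum allocation")
    spare = budget - n * min_queries
    total_std = sum(stds)
    if total_std == 0:
        shares = [spare / n] * n  # no variance information: split evenly
    else:
        shares = [spare * s / total_std for s in stds]
    counts = [min_queries + math.floor(x) for x in shares]
    leftover = budget - sum(counts)
    # Hand the leftover queries to the largest fractional remainders.
    by_remainder = sorted(
        range(n), key=lambda i: shares[i] - math.floor(shares[i]), reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def summed_variance(stds, counts):
    """Total variance of the per-point Monte Carlo estimates."""
    return sum(s * s / c for s, c in zip(stds, counts))


stds = [1.0, 1.0, 2.0]          # made-up per-point gradient noise levels
counts = allocate_queries(stds, budget=11)
print(counts, summed_variance(stds, counts))
```

On this toy batch the proportional split gives a lower summed variance than a near-uniform split such as `[4, 4, 3]` for the same 11 queries.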

allocator, gradient, query, (13 more...)

arXiv: 2410.05966

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Nonlinear Model Predictive Control of Tiltrotor Quadrotors with Feasible Control Allocation

Shayan, Zeinab, Cristobal, Jann, Izadi, Mohammadreza, Yazdanshenas, Amin, Naderi, Mehdi, Faieghi, Reza

arXiv.org Artificial Intelligence, Jun-21-2024

This paper presents a new flight control framework for tiltrotor multirotor uncrewed aerial vehicles (MRUAVs). Tiltrotor designs offer full actuation but introduce complexity in control allocation due to actuator redundancy. We propose a new approach where the allocator is tightly coupled with the controller, ensuring that the control signals generated by the controller are feasible within the vehicle's actuation space. We leverage nonlinear model predictive control (NMPC) to implement this framework, providing feasible control signals and optimizing performance. This unified control structure simultaneously manages both position and attitude, which eliminates the need for cascaded position and attitude control loops. Extensive numerical experiments demonstrate that our approach significantly outperforms conventional techniques based on linear quadratic regulator (LQR) and sliding mode control (SMC), especially in high-acceleration trajectories and disturbance rejection scenarios, making the proposed approach a viable option for enhanced control precision and robustness, particularly in challenging missions.

artificial intelligence, optimization problem, trajectory, (14 more...)

2406.0613

Country:

North America > Canada (0.28)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Air (1.00)
Aerospace & Defense > Aircraft (1.00)
Energy > Oil & Gas > Upstream (0.61)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
