AITopics | budget allocation

Collaborating Authors

budget allocation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Optimal Perturbation Budget Allocation for Data Poisoning in Offline Reinforcement Learning

Qiu, Junnan, Zhao, Yuanjie, Li, Jie

arXiv.org Artificial IntelligenceDec-11-2025

Offline Reinforcement Learning (RL) enables policy optimization from static datasets but is inherently vulnerable to data poisoning attacks. Existing attack strategies typically rely on locally uniform perturbations, which treat all samples indiscriminately. This approach is inefficient, as it wastes the perturbation budget on low-impact samples, and lacks stealthiness due to significant statistical deviations. In this paper, we propose a novel Global Budget Allocation attack strategy. Leveraging the theoretical insight that a sample's influence on value function convergence is proportional to its Temporal Difference (TD) error, we formulate the attack as a global resource allocation problem. We derive a closed-form solution where perturbation magnitudes are assigned proportional to the TD-error sensitivity under a global L2 constraint. Empirical results on D4RL benchmarks demonstrate that our method significantly outperforms baseline strategies, achieving up to 80% performance degradation with minimal perturbations that evade detection by state-of-the-art statistical and spectral defenses.

artificial intelligence, machine learning, perturbation, (11 more...)

arXiv.org Artificial Intelligence

2512.08485

Country: Asia > China (0.16)

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (0.88)
Government > Military (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks

Wang, Fali, Liu, Hui, Dai, Zhenwei, Zeng, Jingying, Zhang, Zhiwei, Wu, Zongyu, Luo, Chen, Li, Zhen, Tang, Xianfeng, He, Qi, Wang, Suhang

arXiv.org Artificial IntelligenceOct-23-2025

Test-time scaling (TTS) enhances the performance of large language models (LLMs) by allocating additional compute resources during inference. However, existing research primarily investigates TTS in single-stage tasks; while many real-world problems are multi-stage complex tasks, composed of a sequence of heterogeneous subtasks with each subtask requires LLM of specific capability. Therefore, we study a novel problem: the test-time compute-optimal scaling in multi-stage complex tasks, aiming to select suitable models and allocate budgets per subtask to maximize overall performance. TTS in multi-stage tasks introduces two fundamental challenges: (i) The combinatorial search space of model and budget allocations, combined with the high cost of inference, makes brute-force search impractical. (ii) The optimal model and budget allocations across subtasks are interdependent, increasing the complexity of the compute-optimal search. To address this gap, we conduct extensive pilot experiments on four tasks across six datasets, deriving three empirical insights characterizing the behavior of LLMs in multi-stage complex tasks. Informed by these insights, we propose AgentTTS, an LLM-agent-based framework that autonomously searches for compute-optimal allocations through iterative feedback-driven interactions with the execution environment. Experimental results demonstrate that AgentTTS significantly outperforms traditional and other LLM-based baselines in search efficiency, and shows improved robustness to varying training set sizes and enhanced interpretability.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2508.0089

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.87)

Industry:

Information Technology (0.46)
Energy (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Adaptive Budget Allocation for Orthogonal-Subspace Adapter Tuning in LLMs Continual Learning

Wan, Zhiyi, Du, Wanrou, Li, Liang, Pan, Miao, Qin, Xiaoqi

arXiv.org Artificial IntelligenceOct-17-2025

Large language models (LLMs) often suffer from catastrophic forgetting in continual learning (CL) scenarios, where performance on previously learned tasks degrades severely while training on sequentially arriving tasks. Although pioneering CL approaches using orthogonal subspaces can mitigate task interference, they typically employ fixed budget allocation, neglecting the varying complexity across tasks and layers. Besides, recent budget-adaptive tuning methods for LLMs often adopt multi-stage paradigms that decouple optimization and budget allocation. Such decoupling results in potential misalignment, which hinders those approaches' practical application in CL scenarios. To address these limitations, we propose OA-Adapter, a novel parameter-efficient approach for continual learning in LLMs that unifies dynamic budget adaptation with orthogonal subspace learning in an end-to-end training stage. Specifically, OA-Adapter introduces a dynamic bottleneck dimension adaptation mechanism that simultaneously allocates an efficient parameter budget and optimizes task objectives without misalignment.To effectively preserve previously acquired knowledge while coordinating with the dynamic budget allocation, orthogonal constraints are applied specifically between the parameter subspace of the current task and the dynamically allocated parameter subspaces of historical tasks. Experimental results on continual learning benchmarks demonstrate that OA-Adapter outperforms state-of-the-art methods in both accuracy and parameter efficiency. OA-Adapter achieves higher average accuracy while using 58.5% fewer parameters on the standard CL benchmark, and maintains its advantages on two larger benchmarks comprising 15 tasks.

continual learning, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2505.22358

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Mask Tokens as Prophet: Fine-Grained Cache Eviction for Efficient dLLM Inference

Huang, Jianuo, Zhang, Yaojie, Yang, Yicun, Huang, Benhao, Qi, Biqing, Liu, Dongrui, Zhang, Linfeng

arXiv.org Artificial IntelligenceOct-13-2025

Diffusion large language models (dLLMs) present a promising alternative to dominant autoregressive models (ARMs) by the ability of parallel decoding at the expense of substantial computation and memory costs. Specifically, the cache mechanism for bidirectional attention in dLLMs demands large memory footprint, restricting their ability to handle long contexts under resource-limited settings. Existing cache eviction strategies are designed for ARMs and ignore the unique characteristics of dLLMs, thus leading to unsatisfactory performance. To address these challenges, we introduce MaskKV, a training-free cache eviction framework tailored to dLLMs, focusing on the effect of mask tokens in dLLMs. MaskKV is built on two key innovations: (1) a mask-query guided scoring mechanism that leverages attention weights to identify and evict less critical prompt tokens for each head; (2) an adaptive cache budgeting strategy that improves efficiency by reducing allocation in intermediate layers and concentrating resources on prompt-preferring heads. On LLaDA with MaskKV, compressing the KV cache to only 256 pairs (less than 5% of tokens) retains 94% of the full-cache performance on LongBench and achieves up to 31x acceleration at 32k prompt length. The code is publicly available at: https://github.com/jianuo-huang/MaskKV

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.09309

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.36)

Add feedback

Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

Li, Ziniu, Chen, Congliang, Yang, Tianyun, Ding, Tian, Sun, Ruoyu, Zhang, Ge, Huang, Wenhao, Luo, Zhi-Quan

arXiv.org Artificial IntelligenceOct-1-2025

Large Language Models (LLMs) can self-improve through reinforcement learning, where they generate trajectories to explore and discover better solutions. However, this exploration process is computationally expensive, often forcing current methods to assign limited exploration budgets to each task. This uniform allocation creates problematic edge cases: easy tasks consistently succeed while difficult tasks consistently fail, both producing zero gradients during training updates for the widely used Group Relative Policy Optimization (GRPO). We address this problem from the lens of exploration budget allocation. Viewing each task's exploration as an "item" with a distinct "value" and "cost", we establish a connection to the classical knapsack problem. This formulation allows us to derive an optimal assignment rule that adaptively distributes resources based on the model's current learning status. When applied to GRPO, our method increases the effective ratio of non-zero policy gradients by 20-40% during training. Acting as a computational "free lunch", our approach could reallocate exploration budgets from tasks where learning is saturated to those where it is most impactful. This enables significantly larger budgets (e.g., 93 rollouts) for especially challenging problems, which would be computationally prohibitive under a uniform allocation. These improvements translate to meaningful gains on mathematical reasoning benchmarks, with average improvements of 2-4 points and peak gains of 9 points on specific tasks. Notably, achieving comparable performance with traditional homogeneous allocation would require about 2x the computational resources.

large language model, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2509.25849

Country: Asia (0.28)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Adaptive Test-Time Reasoning via Reward-Guided Dual-Phase Search

Cui, Yingqian, Dai, Zhenwei, He, Pengfei, He, Bing, Liu, Hui, Tang, Xianfeng, Zeng, Jingying, Wang, Suhang, Xing, Yue, Tang, Jiliang, Dumoulin, Benoit

arXiv.org Artificial IntelligenceOct-1-2025

Large Language Models (LLMs) have achieved significant advances in reasoning tasks. A key approach is tree-based search with verifiers, which expand candidate reasoning paths and use reward models to guide pruning and selection. Although effective in improving accuracy, these methods are not optimal in terms of efficiency: they perform simple decomposition on the reasoning process, but ignore the planning-execution nature of tasks such as math reasoning or code generation. This results in inefficient exploration of reasoning process. To address this, we propose a dual-phase test-time scaling framework that explicitly separates reasoning into planning and execution, and performs search over the two phases individually. Specifically, we decompose reasoning trajectories and develop reward models for each phase, enabling the search to explore and prune plans and executions separately. We further introduce a dynamic budget allocation mechanism that adaptively redistributes sampling effort based on reward feedback, allowing early stopping on confident steps and reallocation of computation to more challenging parts of the reasoning process. Experiments on both mathematical reasoning and code generation benchmarks demonstrate that our approach consistently improves accuracy while reducing redundant computation.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.2542

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Contextual Budget Bandit for Food Rescue Volunteer Engagement

Tang, Ariana, Raman, Naveen, Fang, Fei, Shi, Zheyuan Ryan

arXiv.org Artificial IntelligenceSep-16-2025

Volunteer-based food rescue platforms tackle food waste by matching surplus food to communities in need. These platforms face the dual problem of maintaining volunteer engagement and maximizing the food rescued. Existing algorithms to improve volunteer engagement exacerbate geographical disparities, leaving some communities systematically disadvantaged. We address this issue by proposing Contextual Budget Bandit. Contextual Budget Bandit incorporates context-dependent budget allocation in restless multi-armed bandits, a model of decision-making which allows for stateful arms. By doing so, we can allocate higher budgets to communities with lower match rates, thereby alleviating geographical disparities. To tackle this problem, we develop an empirically fast heuristic algorithm. Because the heuristic algorithm can achieve a poor approximation when active volunteers are scarce, we design the Mitosis algorithm, which is guaranteed to compute the optimal budget allocation. Empirically, we demonstrate that our algorithms outperform baselines on both synthetic and real-world food rescue datasets, and show how our algorithm achieves geographical fairness in food rescue.

artificial intelligence, contextual budget bandit, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2509.10777

Country: North America > United States (0.46)

Genre: Research Report (0.81)

Industry: Food & Agriculture > Agriculture (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.95)

Add feedback

LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation

Shen, Yiqun, Yuan, Song, Zhang, Zhengze, Wang, Xiaoliang, Jiang, Daxin, Cam-Tu, Nguyen

arXiv.org Artificial IntelligenceSep-15-2025

KV Cache is commonly used to accelerate LLM inference with long contexts, yet its high memory demand drives the need for cache compression. Existing compression methods, however, are largely heuristic and lack dynamic budget allocation. To address this limitation, we introduce a unified framework for cache compression by minimizing information loss in Transformer residual streams. Building on it, we analyze the layer attention output loss and derive a new metric to compare cache entries across heads, enabling layer-wise compression with dynamic head budgets. Additionally, by contrasting cross-layer information, we also achieve dynamic layer budgets. LAVa is the first unified strategy for cache eviction and dynamic budget allocation that, unlike prior methods, does not rely on training or the combination of multiple strategies. Experiments with benchmarks (LongBench, Needle-In-A-Haystack, Ruler, and InfiniteBench) demonstrate its superiority. Moreover, our experiments reveal a new insight: dynamic layer budgets are crucial for generation tasks (e.g., code completion), while dynamic head budgets play a key role in extraction tasks (e.g., extractive QA). As a fully dynamic compression method, LAVa consistently maintains top performance across task types. Our code is available at https://github.com/MGDDestiny/Lava.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2509.09754

Country: North America > United States (0.28)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

EvolKV: Evolutionary KV Cache Compression for LLM Inference

Yu, Bohan, Chai, Yekun

arXiv.org Artificial IntelligenceSep-11-2025

Existing key-value (KV) cache compression methods typically rely on heuristics, such as uniform cache allocation across layers or static eviction policies, however, they ignore the critical interplays among layer-specific feature patterns and task performance, which can lead to degraded generalization. In this paper, we propose EvolKV, an adaptive framework for layer-wise, task-driven KV cache compression that jointly optimizes the memory efficiency and task performance. By reformulating cache allocation as a multi-objective optimization problem, EvolKV leverages evolutionary search to dynamically configure layer budgets while directly maximizing downstream performance. Extensive experiments on 11 tasks demonstrate that our approach outperforms all baseline methods across a wide range of KV cache budgets on long-context tasks and surpasses heuristic baselines by up to 7 percentage points on GSM8K. Notably, EvolKV achieves superior performance over the full KV cache setting on code completion while utilizing only 1.5% of the original budget, suggesting the untapped potential in learned compression strategies for KV cache budget allocation.

evolutionary algorithm, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2509.08315

Country:

Asia (0.67)
Europe (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

Additive Distributionally Robust Ranking and Selection

Li, Zaile, Wan, Yuchen, Hong, L. Jeff

arXiv.org Machine LearningSep-9-2025

Ranking and selection (R&S) aims to identify the alternative with the best mean performance among $k$ simulated alternatives. The practical value of R&S depends on accurate simulation input modeling, which often suffers from the curse of input uncertainty due to limited data. Distributionally robust ranking and selection (DRR&S) addresses this challenge by modeling input uncertainty via an ambiguity set of $m > 1$ plausible input distributions, resulting in $km$ scenarios in total. Recent DRR&S studies suggest a key structural insight: additivity in budget allocation is essential for efficiency. However, existing justifications are heuristic, and fundamental properties such as consistency and the precise allocation pattern induced by additivity remain poorly understood. In this paper, we propose a simple additive allocation (AA) procedure that aims to exclusively sample the $k + m - 1$ previously hypothesized critical scenarios. Leveraging boundary-crossing arguments, we establish a lower bound on the probability of correct selection and characterize the procedure's budget allocation behavior. We then prove that AA is consistent and, surprisingly, achieves additivity in the strongest sense: as the total budget increases, only $k + m - 1$ scenarios are sampled infinitely often. Notably, the worst-case scenarios of non-best alternatives may not be among them, challenging prior beliefs about their criticality. These results offer new and counterintuitive insights into the additive structure of DRR&S. To improve practical performance while preserving this structure, we introduce a general additive allocation (GAA) framework that flexibly incorporates sampling rules from traditional R&S procedures in a modular fashion. Numerical experiments support our theoretical findings and demonstrate the competitive performance of the proposed GAA procedures.

aa procedure, procedure, scenario, (13 more...)

arXiv.org Machine Learning

2509.06147

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.27)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
Europe > France (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.45)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Add feedback