AITopics | cost budget

Collaborating Authors

cost budget

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs

Neural Information Processing SystemsJun-15-2026, 10:41:21 GMT

constraint, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Industry: Education > Curriculum (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)

Add feedback

Let the LLM Stick to Its Strengths: Learning to Route Economical LLM

Neural Information Processing SystemsJun-11-2026, 11:22:11 GMT

Recently, test-time scaling of Large Language Models (LLMs) has emerged as a practical alternative to parameter and data scaling. Reasoning tasks often require large-scale, RLVR-based LLMs, while more economical LLMs can handle simpler tasks. Routing an LLM tailored to (, capability and cost) ensures usability and efficiency. We introduce LLMRec, which routes the most suitable LLM to the user query without pre-inference on the candidate LLM zoo.

artificial intelligence, large language model, natural language, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs

Tzannetos, Georgios, Kamalaruban, Parameswaran, Singla, Adish

arXiv.org Artificial IntelligenceNov-5-2025

Training agents to operate under strict constraints during deployment, such as limited resource budgets or stringent safety requirements, presents significant challenges, especially when these constraints render the task complex. In this work, we propose a curriculum learning strategy that gradually tightens constraints during training, enabling the agent to incrementally master the deployment requirements. Inspired by self-paced learning techniques in unconstrained reinforcement learning (RL), our approach facilitates a smoother transition to challenging environments by initially training on simplified versions of the constraints and progressively introducing the full deployment conditions. We provide a theoretical analysis using an RL agent in a binary-tree Markov Decision Process (MDP) to demonstrate that our curriculum strategy can accelerate training relative to a baseline approach that imposes the trajectory constraints from the outset. Moreover, we empirically validate the effectiveness and generality of our method across both RL and large language model (LLM) agents in diverse settings, including a binary-tree MDP, a multi-task navigation domain, and a math reasoning task with two benchmarks. These results highlight the potential of curriculum design in enhancing the efficiency and performance of agents operating under complex trajectory constraints during deployment. Moreover, when applied to LLMs, our strategy enables compression of output chain-of-thought tokens, achieving a substantial inference speedup on consumer hardware, demonstrating its effectiveness for resource-constrained deployment.

constraint, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2511.0269

Genre: Research Report > New Finding (0.46)

Industry: Education > Curriculum (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Add feedback

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing SystemsOct-2-2025, 18:54:01 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The paper formalizes the problem of feature cross-substitution in adversarial classification and presents some approaches to solving it (one heuristic and one exact based on mixed-integer linear programming). Previous work in adversarial classification have considered simpler models for how an adversary would try to fool a classifier by replacing values of features (basically, simple distance functions) and assume that the adversary would try to minimize the distance to the real features why fooling the classifier. The paper points out that this approach has a serious shortcoming when feature selection is done. The authors then suggest that it makes more sense to consider equivalent classes of features as basically being the same features and treating them transparently in feature selection and classifier building. They show results of their two approaches (heuristic and exact) on three datasets, outperforming previous work.

adversary, cost budget, substitution, (13 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Industry: Information Technology > Security & Privacy (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.55)

Add feedback

Multi-Fidelity Hybrid Reinforcement Learning via Information Gain Maximization

Sifaou, Houssem, Simeone, Osvaldo

arXiv.org Artificial IntelligenceSep-19-2025

Optimizing a reinforcement learning (RL) policy typically requires extensive interactions with a high-fidelity simulator of the environment, which are often costly or impractical. Offline RL addresses this problem by allowing training from pre-collected data, but its effectiveness is strongly constrained by the size and quality of the dataset. Hybrid offline-online RL leverages both offline data and interactions with a single simulator of the environment. In many real-world scenarios, however, multiple simulators with varying levels of fidelity and computational cost are available. In this work, we study multi-fidelity hybrid RL for policy optimization under a fixed cost budget. We introduce multi-fidelity hybrid RL via information gain maximization (MF-HRL-IGM), a hybrid offline-online RL algorithm that implements fidelity selection based on information gain maximization through a bootstrapping approach. Theoretical analysis establishes the no-regret property of MF-HRL-IGM, while empirical evaluations demonstrate its superior performance compared to existing benchmarks.

machine learning, reinforcement learning, simulator, (14 more...)

arXiv.org Artificial Intelligence

2509.14848

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Optimal sensor deception in stochastic environments with partial observability to mislead a robot to a decoy goal

Rahmani, Hazhar, Ghosh, Mukulika, Hasnayeen, Syed Md

arXiv.org Artificial IntelligenceMar-7-2025

Deception is a common strategy adapted by autonomous systems in adversarial settings. Existing deception methods primarily focus on increasing opacity or misdirecting agents away from their goal or itinerary. In this work, we propose a deception problem aiming to mislead the robot towards a decoy goal through altering sensor events under a constrained budget of alteration. The environment along with the robot's interaction with it is modeled as a Partially Observable Markov Decision Process (POMDP), and the robot's action selection is governed by a Finite State Controller (FSC). Given a constrained budget for sensor event modifications, the objective is to compute a sensor alteration that maximizes the probability of the robot reaching a decoy goal. We establish the computational hardness of the problem by a reduction from the $0/1$ Knapsack problem and propose a Mixed Integer Linear Programming (MILP) formulation to compute optimal deception strategies. We show the efficacy of our MILP formulation via a sequence of experiments.

probability, robot, sensor alteration, (15 more...)

arXiv.org Artificial Intelligence

2503.05972

Country:

Europe > Switzerland (0.04)
North America > United States > Missouri > Greene County > Springfield (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (0.94)
Government > Regional Government > North America Government > United States Government (0.35)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Bandits with Anytime Knapsacks

Elumar, Eray Can, Tekin, Cem, Yagan, Osman

arXiv.org Artificial IntelligenceJan-30-2025

We consider bandits with anytime knapsacks (BwAK), a novel version of the BwK problem where there is an \textit{anytime} cost constraint instead of a total cost budget. This problem setting introduces additional complexities as it mandates adherence to the constraint throughout the decision-making process. We propose SUAK, an algorithm that utilizes upper confidence bounds to identify the optimal mixture of arms while maintaining a balance between exploration and exploitation. SUAK is an adaptive algorithm that strategically utilizes the available budget in each round in the decision-making process and skips a round when it is possible to violate the anytime cost constraint. In particular, SUAK slightly under-utilizes the available cost budget to reduce the need for skipping rounds. We show that SUAK attains the same problem-dependent regret upper bound of $ O(K \log T)$ established in prior work under the simpler BwK framework. Finally, we provide simulations to verify the utility of SUAK in practical settings.

artificial intelligence, constraint, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2501.1856

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.46)

Add feedback

Online Prompt and Solver Selection for Program Synthesis

Li, Yixuan, Frampton, Lewis, Mora, Federico, Polgreen, Elizabeth

arXiv.org Artificial IntelligenceJan-9-2025

Large Language Models (LLMs) demonstrate impressive capabilities in the domain of program synthesis. This level of performance is not, however, universal across all tasks, all LLMs and all prompting styles. There are many areas where one LLM dominates, one prompting style dominates, or where calling a symbolic solver is a better choice than an LLM. A key challenge for the user then, is to identify not only when an LLM is the right choice of solver, and the appropriate LLM to call for a given synthesis task, but also the right way to call it. A non-expert user who makes the wrong choice, incurs a cost both in terms of results (number of tasks solved, and the time it takes to solve them) and financial cost, if using a closed-source language model via a commercial API. We frame this choice as an online learning problem. We use a multi-armed bandit algorithm to select which symbolic solver, or LLM and prompt combination to deploy in order to maximize a given reward function (which may prioritize solving time, number of synthesis tasks solved, or financial cost of solving). We implement an instance of this approach, called CYANEA, and evaluate it on synthesis queries from the literature in ranking function synthesis, from the syntax-guided synthesis competition, and fresh, unseen queries generated from SMT problems. CYANEA solves 37.2\% more queries than the best single solver and achieves results within 4\% of the virtual best solver.

llm, query, solver, (12 more...)

arXiv.org Artificial Intelligence

2501.05247

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report (0.50)

Industry: Education (0.54)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Adversarial Constrained Policy Optimization: Improving Constrained Reinforcement Learning by Adapting Budgets

Ma, Jianmina, Ji, Jingtian, Gao, Yue

arXiv.org Artificial IntelligenceOct-28-2024

Constrained reinforcement learning has achieved promising progress in safety-critical fields where both rewards and constraints are considered. However, constrained reinforcement learning methods face challenges in striking the right balance between task performance and constraint satisfaction and it is prone for them to get stuck in over-conservative or constraint violating local minima. In this paper, we propose Adversarial Constrained Policy Optimization (ACPO), which enables simultaneous optimization of reward and the adaptation of cost budgets during training. Our approach divides original constrained problem into two adversarial stages that are solved alternately, and the policy update performance of our algorithm can be theoretically guaranteed. We validate our method through experiments conducted on Safety Gymnasium and quadruped locomotion tasks. Results demonstrate that our algorithm achieves better performances compared to commonly used baselines.

budget, machine learning, reinforcement learning, (11 more...)

arXiv.org Artificial Intelligence

2410.20786

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Handling Long-Term Safety and Uncertainty in Safe Reinforcement Learning

Günster, Jonas, Liu, Puze, Peters, Jan, Tateo, Davide

arXiv.org Artificial IntelligenceSep-18-2024

Safety is one of the key issues preventing the deployment of reinforcement learning techniques in real-world robots. While most approaches in the Safe Reinforcement Learning area do not require prior knowledge of constraints and robot kinematics and rely solely on data, it is often difficult to deploy them in complex real-world settings. Instead, model-based approaches that incorporate prior knowledge of the constraints and dynamics into the learning framework have proven capable of deploying the learning algorithm directly on the real robot. Unfortunately, while an approximated model of the robot dynamics is often available, the safety constraints are task-specific and hard to obtain: they may be too complicated to encode analytically, too expensive to compute, or it may be difficult to envision a priori the long-term safety requirements. In this paper, we bridge this gap by extending the safe exploration method, ATACOM, with learnable constraints, with a particular focus on ensuring long-term safety and handling of uncertainty. Our approach is competitive or superior to state-of-the-art methods in final performance while maintaining safer behavior during training.

constraint, learning, psq, (12 more...)

arXiv.org Artificial Intelligence

2409.12045

Country: Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback