slater
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- North America > United States > Pennsylvania (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints
We address a generalization of the bandits with knapsacks problem, where a learner aims to maximize rewards while satisfying an arbitrary set of long-term constraints. Our goal is to design best-of-both-worlds algorithms that perform optimally under both stochastic and adversarial constraints. Previous works address this problem via primal-dual methods and require stringent assumptions, namely Slater's condition; in adversarial settings, they additionally either assume knowledge of a lower bound on the Slater parameter or impose strong requirements on the primal and dual regret minimizers, such as weak adaptivity. We propose an alternative and more natural approach based on optimistic estimation of the constraints. Surprisingly, we show that estimating the constraints with a UCB-like approach guarantees optimal performance. Our algorithm consists of two main components: (i) a regret minimizer working on moving strategy sets and (ii) an estimate of the feasible set as an optimistic weighted empirical mean of previous samples. The key challenge in this approach is designing adaptive weights that meet the different requirements of stochastic and adversarial constraints. Our algorithm is significantly simpler than previous approaches and has a cleaner analysis. Moreover, ours is the first best-of-both-worlds algorithm providing bounds logarithmic in the number of constraints. Additionally, in stochastic settings, it provides $\widetilde O(\sqrt{T})$ regret without Slater's condition.
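The optimistic (UCB-like) constraint estimation described above can be illustrated with a minimal sketch. The function name, the Hoeffding-style confidence radius, and the clipping at zero are illustrative assumptions, not the paper's exact estimator, which uses adaptive weights to handle both regimes:

```python
import math

def optimistic_constraint_estimates(cost_sums, counts, t, delta=0.01):
    """Optimistically estimate each arm's expected constraint cost as the
    empirical mean minus a UCB-style confidence radius, so that with high
    probability no feasible arm is ruled out."""
    estimates = []
    for total, n in zip(cost_sums, counts):
        if n == 0:
            estimates.append(0.0)  # unexplored arms look fully feasible
            continue
        mean = total / n
        radius = math.sqrt(2.0 * math.log(t / delta) / n)
        estimates.append(max(0.0, mean - radius))
    return estimates
```

An arm is then kept in the moving strategy set whenever its optimistic cost estimate is within the budget; as counts grow, the radius shrinks and the estimate approaches the empirical mean.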
Global Solutions to Non-Convex Functional Constrained Problems with Hidden Convexity
Fatkhullin, Ilyas, He, Niao, Lan, Guanghui, Wolf, Florian
Constrained non-convex optimization is fundamentally challenging, as global solutions are generally intractable and constraint qualifications may not hold. However, in many applications, including safe policy optimization in control and reinforcement learning, such problems possess hidden convexity, meaning they can be reformulated as convex programs via a nonlinear invertible transformation. Typically, such transformations are implicit or unknown, making a direct link with the convex program impossible. On the other hand, (sub-)gradients with respect to the original variables are often accessible or can be easily estimated, which motivates algorithms that operate directly in the original (non-convex) problem space using standard (sub-)gradient oracles. In this work, we develop the first algorithms to provably solve such non-convex problems to global minima. First, using a modified inexact proximal point method, we establish global last-iterate convergence guarantees with $\widetilde{\mathcal{O}}(\varepsilon^{-3})$ oracle complexity in the non-smooth setting. For smooth problems, we propose a new bundle-level type method based on linearly constrained quadratic subproblems, improving the oracle complexity to $\widetilde{\mathcal{O}}(\varepsilon^{-1})$. Surprisingly, despite non-convexity, our methodology does not require any constraint qualifications, can handle hidden convex equality constraints, and achieves complexities matching those for solving unconstrained hidden convex optimization.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > New York > Monroe County > Rochester (0.04)
- (4 more...)
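The proximal point idea behind the first result can be sketched on a toy unconstrained problem with hidden convexity. The objective, step sizes, and iteration counts below are illustrative assumptions; the paper's method is inexact, uses subgradient oracles, and handles constraints:

```python
def prox_point(grad, x0, lam=1.0, outer=50, inner=200, lr=0.01):
    """Sketch of a proximal point outer loop: each step approximately
    minimizes f(y) + (lam/2)*(y - x_k)**2 by plain gradient descent
    (the approximate inner solve is what makes the method 'inexact')."""
    x = x0
    for _ in range(outer):
        y, anchor = x, x
        for _ in range(inner):
            y -= lr * (grad(y) + lam * (y - anchor))
        x = y
    return x

def f_grad(x):
    # f(x) = (x**3 - 1)**2 is non-convex in x but convex in the hidden
    # variable u = x**3 (an invertible transformation); global minimum x = 1.
    return 6.0 * x**2 * (x**3 - 1.0)
```

Despite the non-convexity in the original variable, the iterates converge to the global minimizer, which is the behavior the hidden-convexity analysis certifies.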
A Self-Improving Architecture for Dynamic Safety in Large Language Models
Context: The integration of Large Language Models (LLMs) into core software systems is accelerating. However, existing software architecture patterns are static, while current safety assurance methods are not scalable, leaving systems vulnerable to novel adversarial threats. Objective: To design, implement, and evaluate a novel software architecture that enables an AI-driven system to autonomously and continuously adapt its own safety protocols at runtime. Method: We propose the Self-Improving Safety Framework (SISF), a runtime architecture that couples an unprotected, unaligned base LLM (mistralai/Mistral-7B-v0.1) with a dynamic feedback loop. This loop consists of an AI Adjudicator (GPT-4o) for breach detection and a Policy Synthesis Module (GPT-4 Turbo) that autonomously generates new, generalized safety policies (both heuristic and semantic) in response to failures. Results: We conducted a dynamic learning evaluation using the 520-prompt AdvBench dataset. The unprotected model was 100% vulnerable. Our SISF, starting from zero policies, demonstrated a clear learning curve: it detected 237 breaches, autonomously synthesized 234 new policies, and reduced the overall Attack Success Rate (ASR) to 45.58%. In a subsequent test on 520 benign prompts, the SISF achieved a 0.00% False Positive Rate (FPR), proving its ability to adapt without compromising user utility. Conclusion: An architectural approach to AI safety, based on the principles of self-adaptation, is a viable and effective strategy. Our framework demonstrates a practical path towards building more robust, resilient, and scalable AI-driven systems, shifting safety assurance from a static, pre-deployment activity to an automated, runtime process.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
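The runtime feedback loop described in the abstract can be sketched with stand-in stubs. Every name below is hypothetical; the actual SISF couples Mistral-7B with GPT-4o adjudication and GPT-4 Turbo policy synthesis, whereas this toy uses plain callables:

```python
class SelfImprovingSafetyLoop:
    """Toy sketch of a SISF-style runtime loop: every breach the
    adjudicator flags is turned into a new policy that blocks similar
    prompts in the future. All components are stand-in stubs."""

    def __init__(self, generate, adjudicate, synthesize_policy):
        self.generate = generate          # stand-in for the base LLM
        self.adjudicate = adjudicate      # stand-in for the AI Adjudicator
        self.synthesize = synthesize_policy
        self.policies = []                # starts from zero policies

    def handle(self, prompt):
        # Screen against all policies synthesized so far.
        if any(policy(prompt) for policy in self.policies):
            return "[blocked by policy]"
        response = self.generate(prompt)
        if self.adjudicate(prompt, response):          # breach detected
            self.policies.append(self.synthesize(prompt, response))
            return "[blocked after breach detection]"
        return response
```

The learning-curve behavior reported in the abstract corresponds to the first branch firing more often over time: early attacks reach the adjudicator, later similar attacks are caught by synthesized policies before generation.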
Beyond Slater's Condition in Online CMDPs with Stochastic and Adversarial Constraints
Stradi, Francesco Emanuele, Chiefari, Eleonora Fidelia, Castiglioni, Matteo, Marchesi, Alberto, Gatti, Nicola
We study \emph{online episodic Constrained Markov Decision Processes} (CMDPs) under both stochastic and adversarial constraints. We provide a novel algorithm whose guarantees greatly improve those of the state-of-the-art best-of-both-worlds algorithm introduced by Stradi et al. (2025). In the stochastic regime, \emph{i.e.}, when the constraints are sampled from fixed but unknown distributions, our method achieves $\widetilde{\mathcal{O}}(\sqrt{T})$ regret and constraint violation without relying on Slater's condition, thereby handling settings where no strictly feasible solution exists. Moreover, we provide guarantees on the stronger notion of \emph{positive} constraint violation, which does not allow the learner to recover from large violations in the early episodes by playing strictly safe policies. In the adversarial regime, \emph{i.e.}, when the constraints may change arbitrarily between episodes, our algorithm ensures sublinear constraint violation without Slater's condition, and achieves sublinear $\alpha$-regret with respect to the \emph{unconstrained} optimum, where $\alpha$ is a suitably defined multiplicative approximation factor. We further validate our results through synthetic experiments, showing the practical effectiveness of our algorithm.
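The distinction between standard and positive constraint violation can be made concrete with a toy snippet. Per-episode scalar costs and a fixed threshold are illustrative simplifications of the CMDP setting:

```python
def cumulative_violation(episode_costs, threshold):
    """Standard notion: negative slack in later episodes can offset
    earlier overshoot, so early violations can be 'recovered'."""
    return max(0.0, sum(c - threshold for c in episode_costs))

def positive_violation(episode_costs, threshold):
    """Positive notion: overshoot is clipped at zero episode by episode,
    so playing strictly safe policies later cannot erase it."""
    return sum(max(0.0, c - threshold) for c in episode_costs)
```

With costs [2, 0] and threshold 1, the standard violation is 0 while the positive violation is 1: the safe second episode cancels the first overshoot only under the weaker notion.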
A Pseudo-Code for Algorithms

Algorithm 2: Value Iteration (with Min-Max Oracle)
Inputs: S, X, Y, r, g, p, γ, T
Outputs: v^(T)
1: Initialize v^(0) arbitrarily, e.g. v^(0) ≡ 0

X and Y are compact sets. Under Assumption 1.1, C is a contraction mapping with respect to the sup norm ||·||_∞; combining Theorem 2.3 with the Banach fixed-point theorem [57] yields existence and uniqueness of its fixed point. By Theorem 2.1 and Theorem 2.3, the proof proceeds identically, albeit with the optimality conditions on the inner player's policy. By Theorem 3.1, the necessary optimality conditions for the stochastic Stackelberg game must hold at every state. This reduced the market to a deterministic repeated-market setting in which the amount of budget saved by the buyers differentiates the states of the market; as a result, we had to use a fitted variant of value iteration, this time with stochastic transitions.
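Algorithm 2 can be sketched in runnable form for finite S, X, Y, with a brute-force max-min as a stand-in for the min-max oracle over compact sets; γ is assumed to be a discount factor in (0, 1):

```python
def value_iteration_minmax(S, X, Y, r, p, gamma, T):
    """Value iteration where the Bellman backup is a max-min over finite
    pure action sets, solved by brute-force enumeration."""
    v = {s: 0.0 for s in S}  # v^(0) initialized arbitrarily, e.g. 0
    for _ in range(T):
        # Synchronous backup: the comprehension reads the old v throughout.
        v = {
            s: max(
                min(
                    r(s, x, y) + gamma * sum(p(s, x, y, s2) * v[s2] for s2 in S)
                    for y in Y
                )
                for x in X
            )
            for s in S
        }
    return v
```

Since the backup operator is a γ-contraction in the sup norm, the iterates converge geometrically to the unique fixed point, which is what the Banach fixed-point argument above formalizes.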
Reviews: Constrained Reinforcement Learning Has Zero Duality Gap
The paper studies a form of constrained reinforcement learning in which the constraints are bounds on the value functions for auxiliary rewards. This allows a more expressive formulation than the common approach of defining the reward as a linear combination of multiple objectives. The authors show that, under certain conditions, the constrained optimization problem has zero duality gap, implying that a solution can be found by solving the dual problem, which is convex. The authors also extend this analysis to the case in which the policy is parameterized. Theorem 1 assumes that Slater's condition holds, which is problematic for two reasons. Slater's condition is usually defined for convex constraints, but the authors specifically state that the constraints in PI are non-convex.
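Why a zero duality gap is remarkable without convexity can be seen in a toy example, unrelated to the paper's CMDP setting: minimize -x over the non-convex two-point domain {0, 1} subject to x ≤ 0.5. The only feasible point is x = 0 (primal value 0), yet the best Lagrangian dual bound is -0.5, leaving a gap of 0.5:

```python
def dual_value(lam, xs=(0.0, 1.0)):
    """Lagrangian dual function of: min -x  s.t.  x <= 0.5,
    over the non-convex (two-point) domain {0, 1}."""
    return min(-x + lam * (x - 0.5) for x in xs)

primal_opt = 0.0  # only feasible point in {0, 1} is x = 0, value -0 = 0
# Maximize the dual over a grid of multipliers lam >= 0.
dual_opt = max(dual_value(l / 100.0) for l in range(0, 301))
gap = primal_opt - dual_opt  # strictly positive: no strong duality here
```

The paper's contribution is precisely that, for its class of constrained RL problems, this gap provably vanishes, so the convex dual can be solved instead.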