AITopics | regret and constraint violation

Taming Adversarial Constraints in CMDPs

Neural Information Processing Systems | Jun-14-2026, 17:46:29 GMT

In constrained MDPs (CMDPs) with adversarial rewards and constraints, a known impossibility result prevents any algorithm from attaining sublinear regret and constraint violation, when competing against a best-in-hindsight policy that satisfies the constraints on average. In this paper, we show how to ease such a negative result, by considering settings that generalize both stochastic CMDPs and adversarial ones. We provide algorithms whose performances smoothly degrade as the level of environment adverseness increases. Specifically, they attain Õ(√T + C) regret and positive constraint violation under bandit feedback, where C measures the adverseness of rewards and constraints. This is C = Θ(T) in the worst case, coherently with the impossibility result for adversarial CMDPs. First, we design an algorithm with the desired guarantees when C is known. Then, in the case C is unknown, we obtain the same results by embedding multiple instances of such an algorithm in a general meta-procedure, which suitably selects them so as to balance the trade-off between regret and constraint violation.

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry:

Education (0.47)
Information Technology (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Taming Adversarial Constraints in CMDPs

Neural Information Processing Systems | Jun-10-2026, 07:26:25 GMT

In constrained MDPs (CMDPs) with adversarial rewards and constraints, a known impossibility result prevents any algorithm from attaining sublinear regret and constraint violation, when competing against a best-in-hindsight policy that satisfies the constraints on average. In this paper, we show how to ease such a negative result, by considering settings that generalize both stochastic CMDPs and adversarial ones. Concretely, the negative result can be eased in CMDPs with non-stationary rewards and constraints, by providing algorithms whose performance degrades smoothly as non-stationarity increases. Specifically, they attain $\widetilde{\mathcal{O}} (\sqrt{T} + C)$ regret and positive constraint violation under bandit feedback, where $C$ measures the adverseness of rewards and constraints. This is $C = \Theta(T)$ in the worst case, coherently with the impossibility result for adversarial CMDPs. First, we design an algorithm with the desired guarantees when $C$ is known. Then, in the case $C$ is unknown, we obtain the same results by embedding multiple instances of such an algorithm in a general meta-procedure, which suitably selects them so as to balance the trade-off between regret and constraint violation.

algorithm, artificial intelligence, proceedings, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.86)

00295cede6e1600d344b5cd6d9fd4640-Paper-Conference.pdf

Neural Information Processing Systems | Apr-24-2026, 07:15:16 GMT

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report (0.46)

Industry: Energy (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Online Convex Optimization with Stochastic Constraints

Neural Information Processing Systems | Mar-17-2026, 17:49:03 GMT

This paper considers online convex optimization (OCO) with stochastic constraints, which generalizes Zinkevich's OCO over a known simple fixed set by introducing multiple stochastic functional constraints that are i.i.d.

artificial intelligence, constraint-based reasoning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.49)

Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient Algorithm

Neural Information Processing Systems | Feb-18-2026, 00:40:33 GMT

This paper explores the realm of infinite horizon average reward Constrained Markov Decision Processes (CMDPs).

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

ca460332316d6da84b08b9bcf39b687b-Paper.pdf

Neural Information Processing Systems | Feb-11-2026, 04:11:42 GMT

artificial intelligence, constraint, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)

ae95296e27d7f695f891cd26b4f37078-Paper.pdf

Neural Information Processing Systems | Feb-9-2026, 20:14:59 GMT

arxiv preprint arxiv, constraint, probability, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Provably Efficient Model-Free Constrained RL with Linear Function Approximation

Neural Information Processing Systems | Feb-9-2026, 02:31:23 GMT

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based approaches or model-free methods accompanied with a 'simulator', we aim to develop the first model-free, simulator-free algorithm that achieves a sublinear regret and a sublinear constraint violation even in large-scale systems.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Michigan > Wayne County > Detroit (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Provably Efficient Model-Free Constrained RL with Linear Function Approximation

Neural Information Processing Systems | Feb-9-2026, 02:31:19 GMT

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based approaches or model-free methods accompanied with a 'simulator', we aim to develop the first model-free, simulator-free algorithm that achieves a sublinear regret and a sublinear constraint violation even in large-scale systems.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: