OSIL: Learning Offline Safe Imitation Policies with Safety Inferred from Non-preferred Trajectories

Burnwal, Returaj, Bhatt, Nirav Pravinbhai, Ravindran, Balaraman

arXiv.org Machine Learning

This work addresses the problem of offline safe imitation learning (IL), where the goal is to learn safe, reward-maximizing policies from demonstrations that carry no per-timestep safety cost or reward information. In many real-world domains, online learning in the environment can be risky, and specifying accurate safety costs can be difficult. However, it is often feasible to collect trajectories that reflect undesirable or unsafe behavior, implicitly conveying what the agent should avoid. We refer to these as non-preferred trajectories. We propose a novel offline safe IL algorithm, OSIL, that infers safety from non-preferred demonstrations. We formulate safe policy learning as a Constrained Markov Decision Process (CMDP). Instead of relying on explicit safety cost and reward annotations, OSIL reformulates the CMDP problem by deriving a lower bound on the reward-maximizing objective and learning a cost model that estimates the likelihood of non-preferred behavior. Our approach allows agents to learn safe, reward-maximizing behavior entirely from offline demonstrations. We empirically demonstrate that our approach learns safer policies that satisfy cost constraints without degrading reward performance, thus outperforming several baselines.
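To make the CMDP framing concrete, here is a minimal sketch of the standard Lagrangian relaxation of a constrained objective, with a learned per-step cost stood in by precomputed estimates. The function name, arguments, and toy numbers are illustrative assumptions; OSIL's actual lower-bound derivation and learned cost model are more involved than this.

```python
import numpy as np

def lagrangian_objective(rewards, cost_estimates, lam, cost_limit):
    """Penalized CMDP objective: trajectory return minus a
    multiplier-weighted constraint violation.

    cost_estimates is a hypothetical stand-in for a learned cost
    model's per-step likelihood that behavior is non-preferred.
    """
    j_r = float(np.sum(rewards))         # return along the trajectory
    j_c = float(np.sum(cost_estimates))  # accumulated estimated cost
    return j_r - lam * (j_c - cost_limit)

# Toy trajectory: 5 steps of reward 1.0, low estimated cost per step.
obj = lagrangian_objective(np.ones(5), np.full(5, 0.1),
                           lam=2.0, cost_limit=1.0)
```

When accumulated estimated cost stays under the limit (here 0.5 < 1.0), the penalty term rewards the policy; exceeding the limit subtracts from the objective in proportion to the multiplier.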


178b6cd141e003fa5ff808c7d7d8e2cc-Paper-Conference.pdf

Neural Information Processing Systems

A Stackelberg game [30, 31] is a strategic interaction between two utility-maximizing players in which one player (the leader) is able to commit to a (possibly mixed) strategy before the other player (the follower) takes an action.
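The leader-commitment idea can be illustrated with pure strategies in a small matrix game: enumerate each leader commitment, let the follower best-respond to it, and keep the commitment that maximizes the leader's payoff. This toy sketch (function name and payoff matrices are illustrative) ignores mixed strategies, which the paper's setting allows.

```python
import numpy as np

def stackelberg_pure(leader_u, follower_u):
    """Pure-strategy Stackelberg solution by enumeration.

    leader_u[a, b] / follower_u[a, b]: payoffs when the leader
    commits to action a and the follower responds with b.
    Returns (leader payoff, leader action, follower response).
    """
    best = None
    for a in range(leader_u.shape[0]):
        b = int(np.argmax(follower_u[a]))  # follower's best response
        if best is None or leader_u[a, b] > best[0]:
            best = (leader_u[a, b], a, b)
    return best

# Leader commits first; the follower observes the commitment.
res = stackelberg_pure(np.array([[2.0, 4.0], [1.0, 3.0]]),
                       np.array([[1.0, 0.0], [0.0, 1.0]]))
```

Note the order of optimization: the follower's argmax is taken per leader action, so commitment can steer the follower toward outcomes the leader prefers, which is the defining feature of the Stackelberg interaction.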


ACM SIGAI Autonomous Agents Award 2026 open for nominations

AIHub

Nominations are solicited for the 2026 ACM SIGAI Autonomous Agents Research Award. This award is made for excellence in research in the area of autonomous agents. It is intended to recognize researchers in autonomous agents whose current work is an important influence on the field. The award is an official ACM award, funded by an endowment created by ACM SIGAI from the proceeds of previous Autonomous Agents conferences. The recipient of the award will receive a monetary prize and a certificate, and will be invited to present a plenary talk at the AAMAS 2026 conference.


Enhancing the development of Cherenkov Telescope Array control software with Large Language Models

Kostunin, Dmitriy, Jones, Elisa, Sotnikov, Vladimir, Sotnikov, Valery, Golovachev, Sergo, Strube, Alexandre

arXiv.org Artificial Intelligence

We develop AI agents based on instruction-finetuned large language models (LLMs) to assist in the engineering and operation of the Cherenkov Telescope Array Observatory (CTAO) Control and Data Acquisition Software (ACADA). These agents align with project-specific documentation and codebases, understand contextual information, interact with external APIs, and communicate with users in natural language.


FLAD: Federated Learning for LLM-based Autonomous Driving in Vehicle-Edge-Cloud Networks

Xiang, Tianao, Zhi, Mingjian, Bi, Yuanguo, Cai, Lin, Chen, Yuhao

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have impressive data fusion and reasoning capabilities for autonomous driving (AD). However, training LLMs for AD faces significant challenges, including high computation and transmission costs and privacy concerns associated with sensitive driving data. Federated Learning (FL) is promising for enabling autonomous vehicles (AVs) to collaboratively train models without sharing raw data. We present Federated LLM-based Autonomous Driving (FLAD), an FL framework that leverages distributed multimodal sensory data across AVs in heterogeneous environments. FLAD has three key innovations: (1) a cloud-edge-vehicle collaborative architecture that reduces communication delay and preserves data privacy; (2) intelligent parallelized collaborative training with a communication scheduling mechanism that optimizes training efficiency, leveraging end devices that would otherwise have insufficient resources for model training; and (3) a knowledge distillation method that personalizes LLMs according to heterogeneous edge data. In addition, we prototype FLAD in a testbed with NVIDIA Jetsons, overcoming practical implementation challenges including CPU/GPU memory sharing in resource-constrained devices, dynamic model partitioning, and fault-tolerant training. Extensive experimental evaluation demonstrates that FLAD achieves superior end-to-end AD performance while efficiently utilizing distributed vehicular resources, opening up new possibilities for future collaborative AD model training and knowledge sharing.
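As a rough sketch of the server-side step in such a federated loop, here is plain FedAvg-style aggregation: client parameters averaged with weights proportional to local dataset size. This is a generic stand-in under stated assumptions, not FLAD's method; the paper's communication scheduling, dynamic model partitioning, and knowledge distillation are not modeled.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: average client parameter vectors
    weighted by local dataset size. client_weights is a list of
    equally shaped 1-D parameter arrays, one per vehicle."""
    total = float(sum(client_sizes))
    stacked = np.stack(client_weights)                    # (n_clients, n_params)
    coeffs = np.array(client_sizes, dtype=float) / total  # mixing weights
    return coeffs @ stacked                               # weighted average

# Three vehicles with different local data volumes.
w = federated_average([np.array([1.0, 0.0]),
                       np.array([0.0, 1.0]),
                       np.array([1.0, 1.0])],
                      client_sizes=[2, 1, 1])
```

Weighting by dataset size keeps the aggregate unbiased toward data-rich clients, which matters in vehicular settings where per-vehicle data volumes vary widely.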