AITopics | Reinforcement Learning

This work addresses the problem of offline safe imitation learning (IL), where the goal is to learn safe and reward-maximizing policies from demonstrations that do not have per-timestep safety cost or reward information. In many real-world domains, online learning in the environment can be risky, and specifying accurate safety costs can be difficult. However, it is often feasible to collect trajectories that reflect undesirable or unsafe behavior, implicitly conveying what the agent should avoid. We refer to these as non-preferred trajectories. We propose a novel offline safe IL algorithm, OSIL, that infers safety from non-preferred demonstrations. We formulate safe policy learning as a Constrained Markov Decision Process (CMDP). Instead of relying on explicit safety cost and reward annotations, OSIL reformulates the CMDP problem by deriving a lower bound on the reward-maximizing objective and learning a cost model that estimates the likelihood of non-preferred behavior. Our approach allows agents to learn safe and reward-maximizing behavior entirely from offline demonstrations. We empirically demonstrate that our approach can learn safer policies that satisfy cost constraints without degrading reward performance, thus outperforming several baselines.
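The abstract describes learning a cost model that estimates the likelihood of non-preferred behavior and then enforcing a CMDP cost constraint. As a rough illustration only, and not the OSIL algorithm itself, the sketch below assumes such a model can be approximated by a logistic-regression discriminator over state-action feature vectors, trained to separate non-preferred pairs (label 1) from other demonstration pairs (label 0). Its predicted probability is used as a per-step cost, and a trajectory's discounted cost is compared against a budget. The function names (`train_cost_model`, `cost`, `discounted_cost`, `satisfies_constraint`), the feature encoding, and the choice of logistic regression are all hypothetical.

```python
import numpy as np

def _with_bias(X):
    # Append a constant column so the model learns an intercept.
    return np.hstack([X, np.ones((len(X), 1))])

def train_cost_model(non_pref, other, lr=0.1, epochs=500):
    """Hypothetical discriminator: fits logistic regression by batch gradient
    descent, labelling non-preferred state-action pairs 1 and other pairs 0."""
    X = _with_bias(np.vstack([non_pref, other]))
    y = np.concatenate([np.ones(len(non_pref)), np.zeros(len(other))])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted P(non-preferred)
        w -= lr * X.T @ (p - y) / len(y)   # gradient of mean log-loss
    return w

def cost(w, sa):
    """Per-step cost = estimated probability that (s, a) is non-preferred."""
    X = _with_bias(np.atleast_2d(np.asarray(sa, dtype=float)))
    return 1.0 / (1.0 + np.exp(-X @ w))

def discounted_cost(w, trajectory, gamma=0.99):
    """Discounted sum of learned per-step costs along one trajectory."""
    c = cost(w, trajectory)
    return float(np.sum(gamma ** np.arange(len(c)) * c))

def satisfies_constraint(w, trajectory, budget, gamma=0.99):
    """CMDP-style check: discounted cost must not exceed the cost budget."""
    return discounted_cost(w, trajectory, gamma) <= budget
```

For example, with synthetic 2-D features where non-preferred pairs cluster around `[2, 0]` and the rest around `[-2, 0]`, a trajectory that stays near `[-2, 0]` accumulates a small discounted cost, while one near `[2, 0]` exceeds a budget of 1.0. A real method would need richer models and a policy-optimization step; this only shows how a learned likelihood can stand in for an unannotated cost.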

machine learning, reinforcement learning, trajectory, (17 more...)

arXiv.org Machine Learning

2602.11018

Country:

Asia > India > Tamil Nadu > Chennai (0.04)
Europe > Middle East > Cyprus > Pafos > Paphos (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

f7e2b2b75b04175610e5a00c1e221ebb-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 23:47:44 GMT

agent, international conference, reinforcement, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Pennsylvania (0.04)
(2 more...)

Genre: Research Report (0.94)

Industry:

Government (0.68)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (0.68)
Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.33)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.30)

Modeling Human Exploration Through Resource-Rational Reinforcement Learning

Neural Information Processing Systems, Feb-11-2026, 23:46:18 GMT

Knowing how to efficiently balance exploring unfamiliar parts of an environment against exploiting currently available knowledge is an essential ingredient of any intelligent organism.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.05)

Genre: Research Report > New Finding (0.94)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

Learner-aware Teaching: Inverse Reinforcement Learning with Preferences and Constraints

Sebastian Tschiatschek, Ahana Ghosh, Luis Haug, Rati Devidze, Adish Singla

Neural Information Processing Systems, Feb-11-2026, 23:37:59 GMT

In this paper, we consider the setting where the learner has its own preferences that it additionally takes into consideration.

learner, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle

Simon S. Du, Yuping Luo, Ruosong Wang, Hanrui Zhang

Neural Information Processing Systems, Feb-11-2026, 23:36:43 GMT

The [24], which Q-learning exploration Q-function Q-function asymptotically [39] derived drawback of example, Zou [39] require lower-bounded properties.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)

f6f154417c4665861583f9b9c4afafa2-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 23:21:34 GMT

Our approach, Exploration through Learned Language Abstraction (ELLA), provides intermediate rewards to an agent for completing relevant low-level behaviors as it tries to solve a complex, sparse-reward task.
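The snippet describes giving intermediate rewards for completing relevant low-level behaviors in a sparse-reward task. A minimal reward-shaping sketch of that idea follows, assuming the set of relevant low-level behaviors is already known and that the environment reports which behaviors were completed at each step. It is not ELLA's implementation: how ELLA decides relevance and detects completion (via learned language abstractions) is not modelled here, and the class name `SubtaskBonusShaper`, the `bonus` value, and the behavior names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SubtaskBonusShaper:
    """Hypothetical shaper: adds a fixed bonus the first time each relevant
    low-level behavior is completed within an episode."""
    relevant: set[str]                      # behaviors assumed relevant to the task
    bonus: float = 0.1                      # intermediate reward per behavior
    achieved: set[str] = field(default_factory=set)

    def reset(self) -> None:
        # Call at the start of every episode so bonuses can be earned again.
        self.achieved.clear()

    def shape(self, env_reward: float, completed: list[str]) -> float:
        """Return the sparse environment reward plus any new bonuses."""
        shaped = env_reward
        for name in completed:
            if name in self.relevant and name not in self.achieved:
                self.achieved.add(name)
                shaped += self.bonus
        return shaped
```

Paying each bonus at most once per episode keeps the agent from farming a single behavior repeatedly; irrelevant behaviors (e.g. a hypothetical `"go_to_wall"`) earn nothing.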

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States > California > Santa Clara County > Palo Alto (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
