AITopics | safedice

Collaborating Authors

safedice

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations

Neural Information Processing SystemsFeb-17-2026, 19:41:33 GMT

In this paper, we present a hyperparameter-free offline safe IL algorithm, SafeDICE, that learns safe policy by leveraging the non-preferred demonstrations in the space of stationary distributions. Our algorithm directly estimates the stationary distribution corrections of the policy that imitate the demonstrations excluding the non-preferred behavior.

demonstration, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Add feedback

SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations

Neural Information Processing SystemsDec-27-2025, 03:25:34 GMT

non-preferred demonstration, offline safe imitation learning, safedice, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

Add feedback

SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations

Neural Information Processing SystemsOct-9-2025, 10:57:12 GMT

demonstration, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Add feedback

SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations

Neural Information Processing SystemsJan-20-2025, 01:53:19 GMT

We consider offline safe imitation learning (IL), where the agent aims to learn the safe policy that mimics preferred behavior while avoiding non-preferred behavior from non-preferred demonstrations and unlabeled demonstrations. This problem setting corresponds to various real-world scenarios, where satisfying safety constraints is more important than maximizing the expected return. However, it is very challenging to learn the policy to avoid constraint-violating (i.e. In this paper, we present a hyperparameter-free offline safe IL algorithm, SafeDICE, that learns safe policy by leveraging the non-preferred demonstrations in the space of stationary distributions. Our algorithm directly estimates the stationary distribution corrections of the policy that imitate the demonstrations excluding the non-preferred behavior.

non-preferred demonstration, offline safe imitation learning, safedice, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.85)
Information Technology > Artificial Intelligence > Robots (0.70)

Add feedback

UNIQ: Offline Inverse Q-learning for Avoiding Undesirable Demonstrations

Hoang, Huy, Mai, Tien, Varakantham, Pradeep

arXiv.org Artificial IntelligenceOct-10-2024

We address the problem of offline learning a policy that avoids undesirable demonstrations. Unlike conventional offline imitation learning approaches that aim to imitate expert or near-optimal demonstrations, our setting involves avoiding undesirable behavior (specified using undesirable demonstrations). To tackle this problem, unlike standard imitation learning where the aim is to minimize the distance between learning policy and expert demonstrations, we formulate the learning task as maximizing a statistical distance, in the space of state-action stationary distributions, between the learning policy and the undesirable policy. This significantly different approach results in a novel training objective that necessitates a new algorithm to address it. Our algorithm, UNIQ, tackles these challenges by building on the inverse Q-learning framework, framing the learning problem as a cooperative (non-adversarial) task. We then demonstrate how to efficiently leverage unlabeled data for practical training. Our method is evaluated on standard benchmark environments, where it consistently outperforms state-of-the-art baselines. The code implementation can be accessed at: https://github.com/hmhuy0/UNIQ. Reinforcement learning (RL) is a powerful framework for learning to maximize expected returns and has achieved remarkable success across various domains. However, applying reinforcement learning to real-world problems is challenging due to difficulties in designing reward functions and the requirement for extensive online interactions with the environment. While some approaches have addressed these challenges, they often rely on costly datasets, requiring either accurate labeling or clean, consistent data, which is often impractical. Imitation learning (Abbeel & Ng, 2004; Ziebart et al., 2008; Kelly et al., 2019) offers a more feasible alternative, enabling agents to learn directly from expert demonstrations without the need for explicit reward signals.

demonstration, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2410.08307

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Singapore (0.04)

Genre:

Research Report > New Finding (0.67)
Instructional Material > Course Syllabus & Notes (0.48)

Industry: Education (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback