AITopics | inverse rl

a97da629b098b75c294dffdc3e463904-Supplemental.pdf

Neural Information Processing Systems, Feb-9-2026, 18:05:30 GMT

agent, experiment, reward function, (14 more...)

Neural Information Processing Systems

Genre: Research Report (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

a97da629b098b75c294dffdc3e463904-Paper.pdf

Neural Information Processing Systems, Feb-9-2026, 18:05:23 GMT

Relabeling methods typically pose the question: if, in hindsight, we assume that our experience was optimal for some task, for what task was it optimal?

inverse rl, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)
Information Technology > Artificial Intelligence > Robots (0.69)

Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement

Neural Information Processing Systems, Dec-24-2025, 10:18:01 GMT

Multi-task reinforcement learning (RL) aims to simultaneously learn policies for solving many tasks. Several prior works have found that relabeling past experience with different reward functions can improve sample efficiency. Relabeling methods typically pose the question: if, in hindsight, we assume that our experience was optimal for some task, for what task was it optimal?

hindsight inference, name change, rewriting history, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

a97da629b098b75c294dffdc3e463904-Supplemental.pdf

Neural Information Processing Systems, Aug-15-2025, 16:31:25 GMT

agent, experiment, reward function, (14 more...)

Neural Information Processing Systems

Genre: Research Report (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement

Benjamin Eysenbach

Neural Information Processing Systems, Aug-15-2025, 16:31:18 GMT

Several prior works have found that relabeling past experience with different reward functions can improve sample efficiency.

inverse rl, reward function, trajectory, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement

Neural Information Processing Systems, May-27-2025, 08:31:48 GMT

Multi-task reinforcement learning (RL) aims to simultaneously learn policies for solving many tasks. Several prior works have found that relabeling past experience with different reward functions can improve sample efficiency. Relabeling methods typically pose the question: if, in hindsight, we assume that our experience was optimal for some task, for what task was it optimal? In this paper we show that inverse RL is a principled mechanism for reusing experience across tasks. We use this idea to generalize goal-relabeling techniques from prior work to arbitrary types of reward functions.

hindsight inference, machine learning, reinforcement learning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.30)

Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement

Neural Information Processing Systems, Oct-11-2024, 01:23:21 GMT

Multi-task reinforcement learning (RL) aims to simultaneously learn policies for solving many tasks. Several prior works have found that relabeling past experience with different reward functions can improve sample efficiency. Relabeling methods typically pose the question: if, in hindsight, we assume that our experience was optimal for some task, for what task was it optimal? In this paper we show that inverse RL is a principled mechanism for reusing experience across tasks. We use this idea to generalize goal-relabeling techniques from prior work to arbitrary types of reward functions.

hindsight inference, policy improvement, rewriting history, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.30)

How Private Is Your RL Policy? An Inverse RL Based Analysis Framework

Prakash, Kritika, Husain, Fiza, Paruchuri, Praveen, Gujar, Sujit P.

arXiv.org Artificial Intelligence, Dec-10-2021

Reinforcement Learning (RL) enables agents to learn how to perform various tasks from scratch. In domains such as autonomous driving and recommendation systems, learned optimal RL policies could cause a privacy breach if the policies memorize any part of the private reward. We study the set of existing differentially private RL policies derived from various RL algorithms such as Value Iteration, Deep Q Networks, and Vanilla Proximal Policy Optimization. We propose a new Privacy-Aware Inverse RL (PRIL) analysis framework that performs reward reconstruction as an adversarial attack on private policies that the agents may deploy. For this, we introduce the reward reconstruction attack, wherein we seek to reconstruct the original reward from a privacy-preserving policy using an Inverse RL algorithm. An adversary must do poorly at reconstructing the original reward function if the agent uses a tightly private policy. Using this framework, we empirically test the effectiveness of the privacy guarantee offered by the private algorithms on multiple instances of the FrozenLake domain of varying complexities. Based on the analysis performed, we infer a gap between the current standard of privacy offered and the standard of privacy needed to protect reward functions in RL. We do so by quantifying the extent to which each private policy protects the reward function by measuring distances between the original and reconstructed rewards.

algorithm, privacy, reward function, (13 more...)

arXiv.org Artificial Intelligence

2112.05495

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement

Eysenbach, Benjamin, Geng, Xinyang, Levine, Sergey, Salakhutdinov, Ruslan

arXiv.org Artificial Intelligence, Feb-25-2020

Multi-task reinforcement learning (RL) aims to simultaneously learn policies for solving many tasks. Several prior works have found that relabeling past experience with different reward functions can improve sample efficiency. Relabeling methods typically ask: if, in hindsight, we assume that our experience was optimal for some task, for what task was it optimal? In this paper, we show that hindsight relabeling is inverse RL, an observation that suggests that we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks. We use this idea to generalize goal-relabeling techniques from prior work to arbitrary classes of tasks. Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings, including goal-reaching, domains with discrete sets of rewards, and those with linear reward functions.
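
The relabeling question in the abstract above can be illustrated with a minimal sketch for a discrete set of candidate tasks. This is not the authors' implementation: the function names, the `returns` matrix, and the batch-based log-partition estimate are all assumptions made for illustration. Each trajectory is scored under every candidate reward, each task's scores are normalized across the batch so that a task with uniformly high rewards does not absorb every trajectory, and a softmax over tasks gives a relabeling distribution.

```python
import numpy as np


def relabel_posterior(returns):
    """Soft relabeling distribution over tasks for each trajectory.

    returns[i, j] is the total reward of trajectory i under candidate
    task j (a hypothetical input for this sketch).
    """
    returns = np.asarray(returns, dtype=float)

    # Assumption: estimate each task's log-partition function with a
    # log-sum-exp over the trajectories in the batch (computed stably).
    col_max = returns.max(axis=0, keepdims=True)
    log_z = col_max + np.log(np.exp(returns - col_max).sum(axis=0, keepdims=True))

    # Unnormalized log-posterior over tasks under a uniform task prior.
    logits = returns - log_z

    # Softmax over tasks, one row per trajectory.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def relabel(returns):
    """Pick the most likely task for each trajectory."""
    return relabel_posterior(returns).argmax(axis=1)


# Task 0 gives high reward to both trajectories; task 1 only to the first.
example_returns = [[10.0, 9.0],
                   [8.0, 0.0]]
print(relabel(example_returns))  # → [1 0]
```

A plain argmax over raw returns would assign both example trajectories to task 0. Subtracting the per-task normalizer is what moves the first trajectory to task 1, where it stands out most relative to the rest of the batch.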

inverse rl, reward function, trajectory, (14 more...)

arXiv.org Artificial Intelligence

2002.11089

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (1.00)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Adversarial recovery of agent rewards from latent spaces of the limit order book

Roa-Vicens, Jacobo, Wang, Yuanbo, Mison, Virgile, Gal, Yarin, Silva, Ricardo

arXiv.org Machine Learning, Dec-9-2019

Inverse reinforcement learning has proved its ability to explain state-action trajectories of expert agents by recovering their underlying reward functions in increasingly challenging environments. Recent advances in adversarial learning have allowed extending inverse RL to applications with non-stationary environment dynamics unknown to the agents, arbitrary structures of reward functions, and improved handling of the ambiguities inherent to the ill-posed nature of inverse RL. This is particularly relevant in real-time applications in stochastic environments involving risk, like volatile financial markets. Moreover, recent work on simulation of complex environments enables learning algorithms to engage with real market data through simulations of its latent space representations, avoiding a costly exploration of the original environment. In this paper, we explore whether adversarial inverse RL algorithms can be adapted and trained within such latent space simulations from real market data, while maintaining their ability to recover agent rewards robust to variations in the underlying dynamics, and transfer them to new regimes of the original environment.

agent, representation, reward function, (14 more...)

arXiv.org Machine Learning

1912.04242

Country: