Reinforcement Learning


Offline Reinforcement Learning with Differential Privacy

Neural Information Processing Systems

Since offline RL does not require access to the environment, it can be applied to problems where interaction with the environment is infeasible, e.g., when collecting new data is costly (trade or finance [Zhang et al., 2020]), risky (autonomous driving [Sallab et al., 2017]), or illegal / unethical (healthcare [Raghu et al., 2017]).






Dynamic Regret of Adversarial Linear Mixture MDPs

Neural Information Processing Systems

We study reinforcement learning in episodic inhomogeneous MDPs with adversarial full-information rewards and an unknown transition kernel. We consider linear mixture MDPs, whose transition kernel is a linear mixture model, and choose dynamic regret as the performance measure.



The Value of Reward Lookahead in Reinforcement Learning

Neural Information Processing Systems

In reinforcement learning (RL), agents sequentially interact with changing environments while aiming to maximize the obtained rewards.