Reinforcement Learning




benchmarks (Freeman et al., 2021) show that T A

Neural Information Processing Systems

However, many systems are inherently continuous in time, which makes discrete-time MDPs an inexact modeling choice. In many applications, such as greenhouse control or medical treatment, each interaction (taking a measurement or switching the action) involves manual intervention and is therefore inherently costly.






Periodic agent-state based Q-learning for POMDPs

Neural Information Processing Systems

The standard approach for Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP. However, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings. A widely used alternative is an agent state, which is a model-free, recursively updatable function of the observation history. Examples include frame stacking and recurrent neural networks. Because the agent state is model-free, it can be used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms such as Q-learning learn a stationary policy.
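As an illustration of a "model-free, recursively updatable" agent state, the following minimal sketch implements frame stacking: the agent state is the tuple of the last k observations, and the new state is computed only from the previous state and the newest observation. The function names, the padding value, and the window length k are assumptions made for this example and are not taken from the paper.

```python
from typing import Hashable, Tuple

AgentState = Tuple[Hashable, ...]


def init_agent_state(k: int, pad: Hashable = None) -> AgentState:
    """Return an initial agent state of k padding entries (hypothetical helper)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return tuple([pad] * k)


def update_agent_state(state: AgentState, observation: Hashable) -> AgentState:
    """Recursive update: drop the oldest observation and append the newest one.

    The update uses only the previous agent state and the new observation,
    so it needs no knowledge of the system model.
    """
    return state[1:] + (observation,)


# Feed a short observation sequence through the recursion.
state = init_agent_state(k=3)
for obs in [1, 2, 3, 4]:
    state = update_agent_state(state, obs)
```

Because the resulting state is hashable, it could serve directly as a key in a tabular Q-function; a recurrent network would play the same role with a learned update in place of the fixed sliding window.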


Opponent Modeling with In-context Search

Neural Information Processing Systems

Opponent modeling is a longstanding research topic aimed at enhancing decision-making by modeling information about opponents in multi-agent environments. However, existing approaches often struggle to generalize to unknown opponent policies and exhibit unstable performance.


Worst-Case Offline Reinforcement Learning with Arbitrary Data Support

Neural Information Processing Systems

We propose an offline reinforcement learning (RL) method with a performance guarantee that requires no assumptions on the data support.