MetaCURL: Non-stationary Concave Utility Reinforcement Learning
–Neural Information Processing Systems
We study online learning in episodic loop-free Markov decision processes in non-stationary environments, where both the losses and the transition probabilities change over time. Our focus is on the Concave Utility Reinforcement Learning (CURL) problem, an extension of classical RL that handles convex performance criteria on the state-action distributions induced by agent policies. While various machine learning problems can be written as CURL, its non-linearity invalidates traditional Bellman equations.
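To make the last point concrete, here is a minimal sketch, not taken from the paper, of how a state-action distribution is induced by a policy in a tiny episodic MDP, and how a convex criterion differs from a classical linear loss. All names (`occupancy`, `linear_loss`, `neg_entropy`), the two-state toy MDP, and the choice of negative entropy as the convex criterion are illustrative assumptions.

```python
import math


def occupancy(mu0, P, pi):
    """Forward recursion for the state-action distributions of an episodic MDP.

    mu0[s]        : initial state distribution
    P[n][s][a][t] : probability of moving from s to t under action a at step n
    pi[n][s][a]   : probability that the policy picks a in state s at step n
    Returns mu with mu[n][s][a] = Pr(state s and action a at step n).
    """
    n_states = len(mu0)
    state_dist = list(mu0)
    mu = []
    for n in range(len(pi)):
        step = [[state_dist[s] * pi[n][s][a] for a in range(len(pi[n][s]))]
                for s in range(n_states)]
        mu.append(step)
        if n < len(P):  # propagate to the next step while transitions remain
            state_dist = [
                sum(step[s][a] * P[n][s][a][t]
                    for s in range(n_states) for a in range(len(step[s])))
                for t in range(n_states)
            ]
    return mu


def linear_loss(mu, loss):
    """Classical RL objective: inner product <loss, mu>, linear in mu."""
    return sum(loss[n][s][a] * mu[n][s][a]
               for n in range(len(mu))
               for s in range(len(mu[n]))
               for a in range(len(mu[n][s])))


def neg_entropy(mu):
    """An illustrative convex criterion: negative entropy of state marginals."""
    total = 0.0
    for step in mu:
        for row in step:
            d = sum(row)  # marginal probability of the state at this step
            if d > 0.0:
                total += d * math.log(d)
    return total


def mix(mu_a, mu_b):
    """Equal-weight mixture of two state-action distributions."""
    return [[[0.5 * (x + y) for x, y in zip(ra, rb)]
             for ra, rb in zip(sa, sb)]
            for sa, sb in zip(mu_a, mu_b)]


# Hypothetical toy MDP: 2 states, 2 actions, 2 steps.
# Action 0 leads to state 0 and action 1 leads to state 1, deterministically.
mu0 = [1.0, 0.0]
P = [[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]]
pi_a = [[[1.0, 0.0], [1.0, 0.0]]] * 2  # always action 0
pi_b = [[[0.0, 1.0], [0.0, 1.0]]] * 2  # always action 1
loss = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 5.0], [6.0, 7.0]]]

mu_a, mu_b = occupancy(mu0, P, pi_a), occupancy(mu0, P, pi_b)
mu_mix = mix(mu_a, mu_b)

# The linear loss of the mixture equals the average of the two losses.
lin_gap = linear_loss(mu_mix, loss) - 0.5 * (
    linear_loss(mu_a, loss) + linear_loss(mu_b, loss))
# The convex criterion of the mixture is strictly below the average.
cvx_gap = neg_entropy(mu_mix) - 0.5 * (neg_entropy(mu_a) + neg_entropy(mu_b))
```

In this toy case `lin_gap` is zero while `cvx_gap` is negative. The objective is no longer an expectation of per-step losses under the policy, which is the property that the usual Bellman recursion relies on.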
Dec-27-2025, 10:37:17 GMT