Planning with General Objective Functions: Going Beyond Total Rewards

Ruosong Wang

Neural Information Processing Systems 

Standard sequential decision-making paradigms aim to maximize the cumulative reward when interacting with an unknown environment, i.e., maximize P