Planning with General Objective Functions: Going Beyond Total Rewards
Ruosong Wang
Neural Information Processing Systems
Standard sequential decision-making paradigms aim to maximize the cumulative reward when interacting with the unknown environment, i.e., maximize $\sum_{h=1}^{H} r_h$, where $r_h$ is the reward received at step $h$ and $H$ is the planning horizon.
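To make the contrast between the total-reward objective and a general objective concrete, here is a minimal sketch that treats the objective as an arbitrary function of the reward sequence $r_1, \dots, r_H$. It is not the paper's method. It assumes a known, deterministic toy environment (the unknown-environment setting is not modeled) and plans by brute-force enumeration of all action sequences. The names `toy_step`, `plan`, and `min_reward` are hypothetical, chosen only for illustration, and the minimum per-step reward is just one example of a non-additive objective.

```python
from itertools import product
from typing import Callable, Sequence

# An objective maps the reward sequence r_1, ..., r_H to a single score.
Objective = Callable[[Sequence[float]], float]


def total_reward(rewards: Sequence[float]) -> float:
    """Standard objective: the cumulative reward sum_{h=1}^{H} r_h."""
    return sum(rewards)


def min_reward(rewards: Sequence[float]) -> float:
    """Hypothetical non-additive objective: the worst per-step reward."""
    return min(rewards)


def toy_step(state: str, action: str):
    """Hypothetical 2-state deterministic environment (illustration only).

    Returns (next_state, reward).
    """
    if state == "ok":
        # "risky" pays nothing now but moves to a state with a large payoff.
        return ("boost", 0.0) if action == "risky" else ("ok", 1.0)
    # state == "boost": collect the payoff for having taken the risk.
    return ("ok", 5.0) if action == "safe" else ("boost", 4.0)


def plan(step, start, actions, horizon: int, objective: Objective):
    """Open-loop brute-force planning under a known deterministic model.

    Enumerates all |A|^H action sequences, so it only scales to tiny problems.
    Ties are broken in favor of the sequence enumerated first.
    """
    best_seq, best_val = None, float("-inf")
    for seq in product(actions, repeat=horizon):
        state, rewards = start, []
        for a in seq:
            state, r = step(state, a)
            rewards.append(r)
        val = objective(rewards)
        if val > best_val:
            best_seq, best_val = seq, val
    return best_seq, best_val


if __name__ == "__main__":
    acts = ("safe", "risky")
    print(plan(toy_step, "ok", acts, 2, total_reward))  # (('risky', 'safe'), 5.0)
    print(plan(toy_step, "ok", acts, 2, min_reward))    # (('safe', 'safe'), 1.0)
```

With horizon 2, the total-reward planner accepts a zero-reward step to reach the payoff of 5, while the min-reward planner prefers two steady rewards of 1. The optimal behavior therefore depends on which function of the reward sequence is being maximized, not only on the rewards themselves.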
May-31-2025, 04:47:42 GMT