Reward is Enough for Convex MDPs
Neural Information Processing Systems
Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many
Feb-11-2026, 09:55:54 GMT