Reward is Enough for Convex MDPs
Neural Information Processing Systems
Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many
Aug-17-2025, 17:46:44 GMT