A Unifying View of Optimism in Episodic Reinforcement Learning
–Neural Information Processing Systems
A finite episodic Markov decision process (MDP) is a tuple (S, A, H, α, P, r) where S and A are the finite sets of states and actions with S = |S|, A = |A|, H is the (fixed) episode length and α is the initial state distribution.
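As a rough illustration of the tuple above, the following Python sketch stores the six components in a small container and samples one episode. The excerpt does not define P and r, so the code assumes P is a stage-dependent transition kernel (P[h][s][a] is a distribution over next states) and r is a deterministic reward table (r[h][s][a]). The names EpisodicMDP, validate, rollout and make_toy_mdp are hypothetical and chosen only for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class EpisodicMDP:
    S: int        # number of states, S = |S|
    A: int        # number of actions, A = |A|
    H: int        # fixed episode length
    alpha: list   # initial state distribution, length S
    P: list       # ASSUMPTION: P[h][s][a] is a distribution over next states
    r: list       # ASSUMPTION: r[h][s][a] is a deterministic reward

    def validate(self):
        """Check shapes and that every distribution sums to one."""
        assert len(self.alpha) == self.S
        assert abs(sum(self.alpha) - 1.0) < 1e-9
        assert len(self.P) == self.H and len(self.r) == self.H
        for h in range(self.H):
            for s in range(self.S):
                for a in range(self.A):
                    row = self.P[h][s][a]
                    assert len(row) == self.S
                    assert abs(sum(row) - 1.0) < 1e-9

    def rollout(self, policy, rng):
        """Sample one episode of length H; policy(h, s) returns an action index."""
        s = rng.choices(range(self.S), weights=self.alpha)[0]
        trajectory = []
        for h in range(self.H):
            a = policy(h, s)
            trajectory.append((s, a, self.r[h][s][a]))
            s = rng.choices(range(self.S), weights=self.P[h][s][a])[0]
        return trajectory


def make_toy_mdp():
    """Hypothetical 2-state, 2-action, 3-stage MDP used only for illustration."""
    S, A, H = 2, 2, 3
    alpha = [1.0, 0.0]  # always start in state 0

    def next_state(s, a):
        # Action 0 stays in place, action 1 switches to the other state.
        return s if a == 0 else 1 - s

    P = [[[[1.0 if s2 == next_state(s, a) else 0.0 for s2 in range(S)]
           for a in range(A)] for s in range(S)] for _ in range(H)]
    # Reward 1 for acting in state 1, reward 0 in state 0.
    r = [[[1.0 if s == 1 else 0.0 for _ in range(A)]
          for s in range(S)] for _ in range(H)]
    return EpisodicMDP(S, A, H, alpha, P, r)


if __name__ == "__main__":
    mdp = make_toy_mdp()
    mdp.validate()
    # Policy that always plays action 1 (switch state).
    print(mdp.rollout(lambda h, s: 1, random.Random(0)))
```

The nested-list layout keeps the sketch free of third-party dependencies. An array library would be a natural substitute for larger state and action spaces.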
Feb-7-2026, 11:57:10 GMT